summaryrefslogtreecommitdiffstats
path: root/net/sched
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2023-04-264-27/+31
|\ | | | | | | | | | | No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * net/sched: sch_qfq: refactor parsing of netlink parametersPedro Tammela2023-04-231-14/+11
| | | | | | | | | | | | | | | | | | | | Two parameters can be transformed into netlink policies and validated while parsing the netlink message. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: sch_qfq: use extack on errors messagesPedro Tammela2023-04-231-4/+5
| | | | | | | | | | | | | | | | | | | | Some error messages are still being printed to dmesg. Since extack is available, provide error messages there. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: sch_htb: use extack on errors messagesPedro Tammela2023-04-231-8/+9
| | | | | | | | | | | | | | | | | | | | Some error messages are still being printed to dmesg. Since extack is available, provide error messages there. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: cls_api: Initialize miss_cookie_node when action miss is not usedIvan Vecera2023-04-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function tcf_exts_init_ex() sets exts->miss_cookie_node ptr only when use_action_miss is true so it assumes in other case that the field is set to NULL by the caller. If not then the field contains garbage and subsequent tcf_exts_destroy() call results in a crash. Ensure that the field .miss_cookie_node pointer is NULL when use_action_miss parameter is false to avoid this potential scenario. Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230420183634.1139391-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: sch_fq: fix integer overflow of "credit"Davide Caratti2023-04-211-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if sch_fq is configured with "initial quantum" having values greater than INT_MAX, the first assignment of "credit" does signed integer overflow to a very negative value. In this situation, the syzkaller script provided by Cristoph triggers the CPU soft-lockup warning even with few sockets. It's not an infinite loop, but "credit" wasn't probably meant to be minus 2Gb for each new flow. Capping "initial quantum" to INT_MAX proved to fix the issue. v2: validation of "initial quantum" is done in fq_policy, instead of open coding in fq_change() _ suggested by Jakub Kicinski Reported-by: Christoph Paasch <cpaasch@apple.com> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/377 Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7b3a3c7e36d03068707a021760a194a8eb5ad41a.1682002300.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: act_pedit: rate limit datapath messagesPedro Tammela2023-04-231-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | Unbounded info messages in the pedit datapath can flood the printk ring buffer quite easily depending on the action created. As these messages are informational, usually printing some, not all, is enough to bring attention to the real issue. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: act_pedit: remove extra check for key typePedro Tammela2023-04-231-22/+7
| | | | | | | | | | | | | | | | | | | | The netlink parsing already validates the key 'htype'. Remove the datapath check as it's redundant. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: act_pedit: check static offsets a prioriPedro Tammela2023-04-231-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Static key offsets should always be on 32 bit boundaries. Validate them on create/update time for static offsets and move the datapath validation for runtime offsets only. iproute2 already errors out if a given offset and data size cannot be packed to a 32 bit boundary. This change will make sure users which create/update pedit instances directly via netlink also error out, instead of finding out when packets are traversing. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: act_pedit: use extack in 'ex' parsing errorsPedro Tammela2023-04-231-4/+13
| | | | | | | | | | | | | | | | | | We have extack available when parsing 'ex' keys, so pass it to tcf_pedit_keys_ex_parse and add more detailed error messages. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: act_pedit: use NLA_POLICY for parsing 'ex' keysPedro Tammela2023-04-231-8/+3
| | | | | | | | | | | | | | | | | | Transform two checks in the 'ex' key parsing into netlink policies removing extra if checks. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: Print msecs when transmit queue time outYajun Deng2023-04-231-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel will print several warnings in a short period of time when it stalls. Like this: First warning: [ 7100.097547] ------------[ cut here ]------------ [ 7100.097550] NETDEV WATCHDOG: eno2 (xxx): transmit queue 8 timed out [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270 ... Second warning: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x400000000000000 softirq=367 3137/3673146 fqs=13844 [ 7147.756960] (t=60001 jiffies g=4322709 q=133381) [ 7147.756962] NMI backtrace for cpu 24 ... We calculate that the transmit queue start stall should occur before 7095s according to watchdog_timeo, the rcu start stall at 7087s. These two times are close together, it is difficult to confirm which happened first. To let users know the exact time the stall started, print msecs when the transmit queue time out. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-04-202-6/+10
|\| | | | | | | | | | | | | | | | | | | | | Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: clear actions pointer in miss cookie init failPedro Tammela2023-04-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Palash reports a UAF when using a modified version of syzkaller[1]. When 'tcf_exts_miss_cookie_base_alloc()' fails in 'tcf_exts_init_ex()' a call to 'tcf_exts_destroy()' is made to free up the tcf_exts resources. In flower, a call to '__fl_put()' when 'tcf_exts_init_ex()' fails is made; Then calling 'tcf_exts_destroy()', which triggers an UAF since the already freed tcf_exts action pointer is lingering in the struct. Before the offending patch, this was not an issue since there was no case where the tcf_exts action pointer could linger. Therefore, restore the old semantic by clearing the action pointer in case of a failure to initialize the miss_cookie. [1] https://github.com/cmu-pasta/linux-kernel-enriched-corpus v1->v2: Fix compilation on configs without tc actions (kernel test robot) Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Reported-by: Palash Oswal <oswalpalash@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_aggGwangun Jung2023-04-141-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. The MTU of the loopback device can be set up to 2^31-1. As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors. The following reports a oob access: [ 84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301 [ 84.583686] [ 84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1 [ 84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 84.584644] Call Trace: [ 84.584787] <TASK> [ 84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) [ 84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) [ 84.585570] kasan_report (mm/kasan/report.c:538) [ 84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255) [ 84.587607] dev_qdisc_enqueue (net/core/dev.c:3776) [ 84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212) [ 84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228) [ 84.589460] ip_output (net/ipv4/ip_output.c:430) [ 84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606) [ 84.590285] raw_sendmsg (net/ipv4/raw.c:649) [ 84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.592084] __sys_sendto (net/socket.c:2142) [ 84.593306] __x64_sys_sendto (net/socket.c:2150) [ 84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.594070] RIP: 0033:0x7fe568032066 [ 84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c Code starting with the faulting instruction =========================================== [ 84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066 [ 84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003 [ 84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010 [ 84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001 [ 84.596218] </TASK> [ 84.596295] [ 84.596351] Allocated by task 291: [ 84.596467] kasan_save_stack (mm/kasan/common.c:46) [ 84.596597] kasan_set_track (mm/kasan/common.c:52) [ 84.596725] __kasan_kmalloc (mm/kasan/common.c:384) [ 84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974) [ 84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938) [ 84.597100] qdisc_create (net/sched/sch_api.c:1244) [ 84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680) [ 84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174) [ 84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574) [ 84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365) [ 84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942) [ 84.597891] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.598016] ____sys_sendmsg (net/socket.c:2501) [ 84.598147] ___sys_sendmsg (net/socket.c:2557) [ 84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586) [ 84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.598688] [ 84.598744] The buggy address belongs to the object at ffff88810f674000 [ 84.598744] which belongs to the cache kmalloc-8k of size 8192 [ 84.599135] The buggy address is located 2664 bytes to the right of [ 84.599135] allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0) [ 84.599544] [ 84.599598] The buggy address belongs to the physical page: [ 84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670 [ 84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2) [ 84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000 [ 84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000 [ 84.601009] page dumped because: kasan: bad access detected [ 84.601187] [ 84.601241] Memory state around the buggy address: [ 84.601396] ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601620] ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602069] ^ [ 84.602243] ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602468] ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602693] ================================================================== [ 84.602924] Disabling lock debugging due to kernel taint Fixes: 3015f3d2a3cd ("pkt_sched: enable QFQ to support TSO/GSO") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Gwangun Jung <exsociety@gmail.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: skbuff: hide csum_not_inet when CONFIG_IP_SCTP not setJakub Kicinski2023-04-191-2/+1
| | | | | | | | | | | | | | | | | | | | SCTP is not universally deployed, allow hiding its bit from the skb. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: taprio: allow per-TC user input of FP adminStatusVladimir Oltean2023-04-131-13/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a duplication of the FP adminStatus logic introduced for tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload structure embedded within tc_taprio_qopt_offload. So practically, if a device driver is written to treat the mqprio portion of taprio just like standalone mqprio, it gets unified handling of frame preemption. I would have reused more code with taprio, but this is mostly netlink attribute parsing, which is hard to transform into generic code without having something that stinks as a result. We have the same variables with the same semantics, just different nlattr type values (TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12; TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and consequently, different policies for the nest. Every time nla_parse_nested() is called, an on-stack table "tb" of nlattr pointers is allocated statically, up to the maximum understood nlattr type. That array size is hardcoded as a constant, but when transforming this into a common parsing function, it would become either a VLA (which the Linux kernel rightfully doesn't like) or a call to the allocator. Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE" and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE" use cases. HOLD and RELEASE events are emitted towards the underlying MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a Set-And-Release-MAC gate operation. So within the tc-taprio UAPI space, one can distinguish between the 2 use cases by choosing whether to use the TC_TAPRIO_CMD_SET_AND_HOLD and TC_TAPRIO_CMD_SET_AND_RELEASE gate operations within the schedule, or just TC_TAPRIO_CMD_SET_GATES. A small part of the change is dedicated to refactoring the max_sdu nlattr parsing to put all logic under the "if" that tests for presence of that nlattr. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: mqprio: allow per-TC user input of FP adminStatusVladimir Oltean2023-04-133-1/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each packet priority can be assigned to a "frame preemption status" value of either "express" or "preemptible". Express priorities are transmitted by the local device through the eMAC, and preemptible priorities through the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge layer). The FP adminStatus is defined per packet priority, but 802.1Q clause 12.30.1.1.1 framePreemptionAdminStatus also says that: | Priorities that all map to the same traffic class should be | constrained to use the same value of preemption status. It is impossible to ignore the cognitive dissonance in the standard here, because it practically means that the FP adminStatus only takes distinct values per traffic class, even though it is defined per priority. I can see no valid use case which is prevented by having the kernel take the FP adminStatus as input per traffic class (what we do here). In addition, this also enforces the above constraint by construction. User space network managers which wish to expose FP adminStatus per priority are free to do so; they must only observe the prio_tc_map of the netdev (which presumably is also under their control, when constructing the mqprio netlink attributes). The reason for configuring frame preemption as a property of the Qdisc layer is that the information about "preemptible TCs" is closest to the place which handles the num_tc and prio_tc_map of the netdev. If the UAPI would have been any other layer, it would be unclear what to do with the FP information when num_tc collapses to 0. A key assumption is that only mqprio/taprio change the num_tc and prio_tc_map of the netdev. Not sure if that's a great assumption to make. Having FP in tc-mqprio can be seen as an implementation of the use case defined in 802.1Q Annex S.2 "Preemption used in isolation". There will be a separate implementation of FP in tc-taprio, for the other use cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: pass netlink extack to mqprio and taprio offloadVladimir Oltean2023-04-132-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the multiplexed ndo_setup_tc() model which lacks a first-class struct netlink_ext_ack * argument, the only way to pass the netlink extended ACK message down to the device driver is to embed it within the offload structure. Do this for struct tc_mqprio_qopt_offload and struct tc_taprio_qopt_offload. Since struct tc_taprio_qopt_offload also contains a tc_mqprio_qopt_offload structure, and since device drivers might effectively reuse their mqprio implementation for the mqprio portion of taprio, we make taprio set the extack in both offload structures to point at the same netlink extack message. In fact, the taprio handling is a bit more tricky, for 2 reasons. First is because the offload structure has a longer lifetime than the extack structure. The driver is supposed to populate the extack synchronously from ndo_setup_tc() and leave it alone afterwards. To not have any use-after-free surprises, we zero out the extack pointer when we leave taprio_enable_offload(). The second reason is because taprio does overwrite the extack message on ndo_setup_tc() error. We need to switch to the weak form of setting an extack message, which preserves a potential message set by the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: mqprio: add an extack message to mqprio_parse_opt()Vladimir Oltean2023-04-131-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ferenc reports that a combination of poor iproute2 defaults and obscure cases where the kernel returns -EINVAL make it difficult to understand what is wrong with this command: $ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name veth1 $ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 RTNETLINK answers: Invalid argument Hopefully with this patch, the cause is clearer: Error: Device does not support hardware offload. The kernel was (and still is) rejecting this because iproute2 defaults to "hw 1" if this command line option is not specified. Link: https://lore.kernel.org/netdev/ede5e9a2f27bf83bfb86d3e8c4ca7b34093b99e2.camel@inf.elte.hu/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: mqprio: add extack to mqprio_parse_nlattr()Vladimir Oltean2023-04-131-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Netlink attribute parsing in mqprio is a minesweeper game, with many options having the possibility of being passed incorrectly and the user being none the wiser. Try to make errors less sour by giving user space some information regarding what went wrong. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: mqprio: simplify handling of nlattr portion of TCA_OPTIONSVladimir Oltean2023-04-131-19/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and shaper in mqprio"), the TCA_OPTIONS format of mqprio was extended to contain a fixed portion (of size NLA_ALIGN(sizeof struct tc_mqprio_qopt)) and a variable portion of other nlattrs (in the TCA_MQPRIO_* type space) following immediately afterwards. In commit feb2cf3dcfb9 ("net/sched: mqprio: refactor nlattr parsing to a separate function"), we've moved the nlattr handling to a smaller function, but yet, a small parse_attr() still remains, and the larger mqprio_parse_nlattr() still does not have access to the beginning, and the length, of the TCA_OPTIONS region containing these other nlattrs. In a future change, the mqprio qdisc will need to iterate through this nlattr region to discover other attributes, so eliminate parse_attr() and add 2 variables in mqprio_parse_nlattr() which hold the beginning and the length of the nlattr range. We avoid the need to memset when nlattr_opt_len has insufficient length by pre-initializing the table "tb". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: sch_mqprio: use netlink payload helpersPedro Tammela2023-04-051-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the sake of readability, use the netlink payload helpers from the 'nla_get_*()' family to parse the attributes. tdc results: 1..5 ok 1 9903 - Add mqprio Qdisc to multi-queue device (8 queues) ok 2 453a - Delete nonexistent mqprio Qdisc ok 3 5292 - Delete mqprio Qdisc twice ok 4 45a9 - Add mqprio Qdisc to single-queue device ok 5 2ba9 - Show mqprio class Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230404203449.1627033-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: act_tunnel_key: add support for "don't fragment"Davide Caratti2023-03-301-0/+5
| | | | | | | | | | | | | | | | | | | | extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | fix typos in net/sched/* filesTaichi Nishimura2023-03-243-3/+3
| | | | | | | | | | | | | | | | This patch fixes typos in net/sched/* files. Signed-off-by: Taichi Nishimura <awkrail01@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: act_api: use the correct TCA_ACT attributes in dumpPedro Tammela2023-03-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | 4 places in the act api code are using 'TCA_' definitions where they should be using 'TCA_ACT_', which is confusing for the reader, although functionally they are equivalent. Cc: Hangbin Liu <haliu@redhat.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: remove two skb_mac_header() usesEric Dumazet2023-03-222-2/+2
| | | | | | | | | | | | | | | | | | tcf_mirred_act() and tcf_mpls_act() can use skb_network_offset() instead of relying on skb_mac_header(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sch_cake: do not use skb_mac_header() in cake_overhead()Eric Dumazet2023-03-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | We want to remove our use of skb_mac_header() in tx paths, eg remove skb_reset_mac_header() from __dev_queue_xmit(). Idea is that ndo_start_xmit() can get the mac header simply looking at skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-03-171-4/+4
|\| | | | | | | | | | | | | | | | | | | | | | | | | net/wireless/nl80211.c b27f07c50a73 ("wifi: nl80211: fix puncturing bitmap policy") cbbaf2bb829b ("wifi: nl80211: add a command to enable/disable HW timestamping") https://lore.kernel.org/all/20230314105421.3608efae@canb.auug.org.au tools/testing/selftests/net/Makefile 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 13715acf8ab5 ("selftest: Add test for bind() conflicts.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: act_api: add specific EXT_WARN_MSG for tc actionHangbin Liu2023-03-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In my previous commit 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") I didn't notice the tc action use different enum with filter. So we can't use TCA_EXT_WARN_MSG directly for tc action. Let's add a TCA_ROOT_EXT_WARN_MSG for tc action specifically and put this param before going to the TCA_ACT_TAB nest. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Revert "net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy"Hangbin Liu2023-03-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 923b2e30dc9cd05931da0f64e2e23d040865c035. This is not a correct fix as TCA_EXT_WARN_MSG is not a hierarchy to TCA_ACT_TAB. I didn't notice the TC actions use different enum when adding TCA_EXT_WARN_MSG. To fix the difference I will add a new WARN enum in TCA_ROOT_MAX as Jamal suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: sched: remove qdisc_watchdog->last_expiresEric Dumazet2023-03-091-2/+4
|/ | | | | | | | | This field mirrors hrtimer softexpires, we can instead use the existing helpers. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230308182648.1150762-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: flower: fix fl_change() error recovery pathEric Dumazet2023-03-011-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two "goto errout;" paths in fl_change() became wrong after cited commit. Indeed we only must not call __fl_put() until the net pointer has been set in tcf_exts_init_ex() This is a minimal fix. We might in the future validate TCA_FLOWER_FLAGS before we allocate @fnew. BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:72 [inline] BUG: KASAN: null-ptr-deref in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: null-ptr-deref in refcount_read include/linux/refcount.h:147 [inline] BUG: KASAN: null-ptr-deref in __refcount_add_not_zero include/linux/refcount.h:152 [inline] BUG: KASAN: null-ptr-deref in __refcount_inc_not_zero include/linux/refcount.h:227 [inline] BUG: KASAN: null-ptr-deref in refcount_inc_not_zero include/linux/refcount.h:245 [inline] BUG: KASAN: null-ptr-deref in maybe_get_net include/net/net_namespace.h:269 [inline] BUG: KASAN: null-ptr-deref in tcf_exts_get_net include/net/pkt_cls.h:260 [inline] BUG: KASAN: null-ptr-deref in __fl_put net/sched/cls_flower.c:513 [inline] BUG: KASAN: null-ptr-deref in __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 Read of size 4 at addr 000000000000014c by task syz-executor548/5082 CPU: 0 PID: 5082 Comm: syz-executor548 Not tainted 6.2.0-syzkaller-05251-g5b7c4cabbb65 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_report mm/kasan/report.c:420 [inline] kasan_report+0xec/0x130 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:72 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] refcount_read include/linux/refcount.h:147 [inline] __refcount_add_not_zero include/linux/refcount.h:152 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] maybe_get_net include/net/net_namespace.h:269 [inline] tcf_exts_get_net include/net/pkt_cls.h:260 [inline] __fl_put net/sched/cls_flower.c:513 [inline] __fl_put+0x13e/0x3b0 net/sched/cls_flower.c:508 fl_change+0x101b/0x4ab0 net/sched/cls_flower.c:2341 tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310 rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 ____sys_sendmsg+0x334/0x900 net/socket.c:2504 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2558 __sys_sendmmsg+0x18f/0x460 net/socket.c:2644 __do_sys_sendmmsg net/socket.c:2673 [inline] __se_sys_sendmmsg net/socket.c:2670 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2670 Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier") Reported-by: syzbot+baabf3efa7c1e57d28b2@syzkaller.appspotmail.com Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Blakey <paulb@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_connmark: handle errno on tcf_idr_check_allocPedro Tammela2023-03-011-0/+3
| | | | | | | | | | | | Smatch reports that 'ci' can be used uninitialized. The current code ignores errno coming from tcf_idr_check_alloc, which will lead to the incorrect usage of 'ci'. Handle the errno as it should. Fixes: 288864effe33 ("net/sched: act_connmark: transition to percpu stats and rcu") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchyPedro Tammela2023-02-271-2/+2
| | | | | | | | | | | TCA_EXT_WARN_MSG is currently sitting outside of the expected hierarchy for the tc actions code. It should sit within TCA_ACT_TAB. Fixes: 0349b8779cc9 ("sched: add new attr TCA_EXT_WARN_MSG to report tc extact message") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_sample: fix action bind logicPedro Tammela2023-02-261-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action sample ... index 1 tc filter add ... action pedit index 1 In the current code for act_sample this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..29 ok 1 9784 - Add valid sample action with mandatory arguments ok 2 5c91 - Add valid sample action with mandatory arguments and continue control action ok 3 334b - Add valid sample action with mandatory arguments and drop control action ok 4 da69 - Add valid sample action with mandatory arguments and reclassify control action ok 5 13ce - Add valid sample action with mandatory arguments and pipe control action ok 6 1886 - Add valid sample action with mandatory arguments and jump control action ok 7 7571 - Add sample action with invalid rate ok 8 b6d4 - Add sample action with mandatory arguments and invalid control action ok 9 a874 - Add invalid sample action without mandatory arguments ok 10 ac01 - Add invalid sample action without mandatory argument rate ok 11 4203 - Add invalid sample action without mandatory argument group ok 12 14a7 - Add invalid sample action without mandatory argument group ok 13 8f2e - Add valid sample action with trunc argument ok 14 45f8 - Add sample action with maximum rate argument ok 15 ad0c - Add sample action with maximum trunc argument ok 16 83a9 - Add sample action with maximum group argument ok 17 ed27 - Add sample action with invalid rate argument ok 18 2eae - Add sample action with invalid group argument ok 19 6ff3 - Add sample action with invalid trunc size ok 20 2b2a - Add sample action with invalid index ok 21 dee2 - Add sample action with maximum allowed index ok 22 560e - Add sample action with cookie ok 23 704a - Replace existing sample action with new rate argument ok 24 60eb - Replace existing sample action with new group argument ok 25 2cce - Replace existing sample action with new trunc argument ok 26 59d1 - Replace existing sample action with new control argument ok 27 0a6e - Replace sample action with invalid goto chain control ok 28 3872 - Delete sample action with valid index ok 29 a394 - Delete sample action with invalid index Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_mpls: fix action bind logicPedro Tammela2023-02-261-29/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action mpls ... index 1 tc filter add ... action mpls index 1 In the current code for act_mpls this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..53 ok 1 a933 - Add MPLS dec_ttl action with pipe opcode ok 2 08d1 - Add mpls dec_ttl action with pass opcode ok 3 d786 - Add mpls dec_ttl action with drop opcode ok 4 f334 - Add mpls dec_ttl action with reclassify opcode ok 5 29bd - Add mpls dec_ttl action with continue opcode ok 6 48df - Add mpls dec_ttl action with jump opcode ok 7 62eb - Add mpls dec_ttl action with trap opcode ok 8 09d2 - Add mpls dec_ttl action with opcode and cookie ok 9 c170 - Add mpls dec_ttl action with opcode and cookie of max length ok 10 9118 - Add mpls dec_ttl action with invalid opcode ok 11 6ce1 - Add mpls dec_ttl action with label (invalid) ok 12 352f - Add mpls dec_ttl action with tc (invalid) ok 13 fa1c - Add mpls dec_ttl action with ttl (invalid) ok 14 6b79 - Add mpls dec_ttl action with bos (invalid) ok 15 d4c4 - Add mpls pop action with ip proto ok 16 91fb - Add mpls pop action with ip proto and cookie ok 17 92fe - Add mpls pop action with mpls proto ok 18 7e23 - Add mpls pop action with no protocol (invalid) ok 19 6182 - Add mpls pop action with label (invalid) ok 20 6475 - Add mpls pop action with tc (invalid) ok 21 067b - Add mpls pop action with ttl (invalid) ok 22 7316 - Add mpls pop action with bos (invalid) ok 23 38cc - Add mpls push action with label ok 24 c281 - Add mpls push action with mpls_mc protocol ok 25 5db4 - Add mpls push action with label, tc and ttl ok 26 7c34 - Add mpls push action with label, tc ttl and cookie of max length ok 27 16eb - Add mpls push action with label and bos ok 28 d69d - Add mpls push action with no label (invalid) ok 29 e8e4 - Add mpls push action with ipv4 protocol (invalid) ok 30 ecd0 - Add mpls push action with out of range label (invalid) ok 31 d303 - Add mpls push action with out of range tc (invalid) ok 32 fd6e - Add mpls push action with ttl of 0 (invalid) ok 33 19e9 - Add mpls mod action with mpls label ok 34 1fde - Add mpls mod action with max mpls label ok 35 0c50 - Add mpls mod action with mpls label exceeding max (invalid) ok 36 10b6 - Add mpls mod action with mpls label of MPLS_LABEL_IMPLNULL (invalid) ok 37 57c9 - Add mpls mod action with mpls min tc ok 38 6872 - Add mpls mod action with mpls max tc ok 39 a70a - Add mpls mod action with mpls tc exceeding max (invalid) ok 40 6ed5 - Add mpls mod action with mpls ttl ok 41 77c1 - Add mpls mod action with mpls ttl and cookie ok 42 b80f - Add mpls mod action with mpls max ttl ok 43 8864 - Add mpls mod action with mpls min ttl ok 44 6c06 - Add mpls mod action with mpls ttl of 0 (invalid) ok 45 b5d8 - Add mpls mod action with mpls ttl exceeding max (invalid) ok 46 451f - Add mpls mod action with mpls max bos ok 47 a1ed - Add mpls mod action with mpls min bos ok 48 3dcf - Add mpls mod action with mpls bos exceeding max (invalid) ok 49 db7c - Add mpls mod action with protocol (invalid) ok 50 b070 - Replace existing mpls push action with new ID ok 51 95a9 - Replace existing mpls push action with new label, tc, ttl and cookie ok 52 6cce - Delete mpls pop action ok 53 d138 - Flush mpls actions Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_pedit: fix action bind logicPedro Tammela2023-02-261-27/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TC architecture allows filters and actions to be created independently. In filters the user can reference action objects using: tc action add action pedit ... index 1 tc filter add ... action pedit index 1 In the current code for act_pedit this is broken as it checks netlink attributes for create/update before actually checking if we are binding to an existing action. tdc results: 1..69 ok 1 319a - Add pedit action that mangles IP TTL ok 2 7e67 - Replace pedit action with invalid goto chain ok 3 377e - Add pedit action with RAW_OP offset u32 ok 4 a0ca - Add pedit action with RAW_OP offset u32 (INVALID) ok 5 dd8a - Add pedit action with RAW_OP offset u16 u16 ok 6 53db - Add pedit action with RAW_OP offset u16 (INVALID) ok 7 5c7e - Add pedit action with RAW_OP offset u8 add value ok 8 2893 - Add pedit action with RAW_OP offset u8 quad ok 9 3a07 - Add pedit action with RAW_OP offset u8-u16-u8 ok 10 ab0f - Add pedit action with RAW_OP offset u16-u8-u8 ok 11 9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert ok 12 ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID) ok 13 f512 - Add pedit action with RAW_OP offset u16 at offmask shift set ok 14 c2cb - Add pedit action with RAW_OP offset u32 retain value ok 15 1762 - Add pedit action with RAW_OP offset u8 clear value ok 16 bcee - Add pedit action with RAW_OP offset u8 retain value ok 17 e89f - Add pedit action with RAW_OP offset u16 retain value ok 18 c282 - Add pedit action with RAW_OP offset u32 clear value ok 19 c422 - Add pedit action with RAW_OP offset u16 invert value ok 20 d3d3 - Add pedit action with RAW_OP offset u32 invert value ok 21 57e5 - Add pedit action with RAW_OP offset u8 preserve value ok 22 99e0 - Add pedit action with RAW_OP offset u16 preserve value ok 23 1892 - Add pedit action with RAW_OP offset u32 preserve value ok 24 4b60 - Add pedit action with RAW_OP negative offset u16/u32 set value ok 25 a5a7 - Add pedit action with LAYERED_OP eth set src ok 26 86d4 - Add pedit action with LAYERED_OP eth set src & dst ok 27 f8a9 - Add pedit action with LAYERED_OP eth set dst ok 28 c715 - Add pedit action with LAYERED_OP eth set src (INVALID) ok 29 8131 - Add pedit action with LAYERED_OP eth set dst (INVALID) ok 30 ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence ok 31 dec4 - Add pedit action with LAYERED_OP eth set type (INVALID) ok 32 ab06 - Add pedit action with LAYERED_OP eth add type ok 33 918d - Add pedit action with LAYERED_OP eth invert src ok 34 a8d4 - Add pedit action with LAYERED_OP eth invert dst ok 35 ee13 - Add pedit action with LAYERED_OP eth invert type ok 36 7588 - Add pedit action with LAYERED_OP ip set src ok 37 0fa7 - Add pedit action with LAYERED_OP ip set dst ok 38 5810 - Add pedit action with LAYERED_OP ip set src & dst ok 39 1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield ok 40 02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol ok 41 3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID) ok 42 31ae - Add pedit action with LAYERED_OP ip ttl clear/set ok 43 486f - Add pedit action with LAYERED_OP ip set duplicate fields ok 44 e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields ok 45 cc8a - Add pedit action with LAYERED_OP ip set tos ok 46 7a17 - Add pedit action with LAYERED_OP ip set precedence ok 47 c3b6 - Add pedit action with LAYERED_OP ip add tos ok 48 43d3 - Add pedit action with LAYERED_OP ip add precedence ok 49 438e - Add pedit action with LAYERED_OP ip clear tos ok 50 6b1b - Add pedit action with LAYERED_OP ip clear precedence ok 51 824a - Add pedit action with LAYERED_OP ip invert tos ok 52 106f - Add pedit action with LAYERED_OP ip invert precedence ok 53 6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport ok 54 afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code ok 55 3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID) ok 56 815c - Add pedit action with LAYERED_OP ip6 set src ok 57 4dae - Add pedit action with LAYERED_OP ip6 set dst ok 58 fc1f - Add pedit action with LAYERED_OP ip6 set src & dst ok 59 6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID) ok 60 94bb - Add pedit action with LAYERED_OP ip6 traffic_class ok 61 6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl ok 62 6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit ok 63 1442 - Add pedit action with LAYERED_OP tcp set dport & sport ok 64 b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID) ok 65 cfcc - Add pedit action with LAYERED_OP tcp flags set ok 66 3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags fields ok 67 f1c8 - Add pedit action with LAYERED_OP udp set dport & sport ok 68 d784 - Add pedit action with mixed RAW/LAYERED_OP #1 ok 69 70ca - Add pedit action with mixed RAW/LAYERED_OP #2 Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Fixes: f67169fef8db ("net/sched: act_pedit: fix WARN() in the traffic path") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy()Nathan Chancellor2023-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_NET_CLS_ACT is disabled: ../net/sched/cls_api.c:141:13: warning: 'tcf_exts_miss_cookie_base_destroy' defined but not used [-Wunused-function] 141 | static void tcf_exts_miss_cookie_base_destroy(struct tcf_exts *exts) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Due to the way the code is structured, it is possible for a definition of tcf_exts_miss_cookie_base_destroy() to be present without actually being used. Its single callsite is in an '#ifdef CONFIG_NET_CLS_ACT' block but a definition will always be present in the file. The version of tcf_exts_miss_cookie_base_destroy() that actually does something depends on CONFIG_NET_TC_SKB_EXT, so the stub function is used in both CONFIG_NET_CLS_ACT=n and CONFIG_NET_CLS_ACT=y + CONFIG_NET_TC_SKB_EXT=n configurations. Move the call to tcf_exts_miss_cookie_base_destroy() in tcf_exts_destroy() out of the '#ifdef CONFIG_NET_CLS_ACT', so that it always appears used to the compiler, while not changing any behavior with any of the various configuration combinations. Fixes: 80cd22c35c90 ("net/sched: cls_api: Support hardware miss to tc action") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: flower: Support hardware miss to tc actionPaul Blakey2023-02-201-1/+12
| | | | | | | | | | | | | To support hardware miss to tc action in actions on the flower classifier, implement the required getting of filter actions, and setup filter exts (actions) miss by giving it the filter's handle and actions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: flower: Move filter handle initialization earlierPaul Blakey2023-02-201-27/+35
| | | | | | | | | | | | | | To support miss to action during hardware offload the filter's handle is needed when setting up the actions (tcf_exts_init()), and before offloading. Move filter handle initialization earlier. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: cls_api: Support hardware miss to tc actionPaul Blakey2023-02-202-11/+206
| | | | | | | | | | | | | | | | | | | | | | | For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: Rename user cookie and act cookiePaul Blakey2023-02-202-27/+27
| | | | | | | | | | | | | | struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimitedVladimir Oltean2023-02-201-0/+2
| | | | | | | | | | | | | | | It makes no sense to keep randomly large max_sdu values, especially if larger than the device's max_mtu. These are visible in "tc qdisc show". Such a max_sdu is practically unlimited and will cause no packets for that traffic class to be dropped on enqueue. Just set max_sdu_dynamic to U32_MAX, which in the logic below causes taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user space of 0 (unlimited). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: taprio: don't allow dynamic max_sdu to go negative after stab ↵Vladimir Oltean2023-02-201-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | adjustment The overhead specified in the size table comes from the user. With small time intervals (or gates always closed), the overhead can be larger than the max interval for that traffic class, and their difference is negative. What we want to happen is for max_sdu_dynamic to have the smallest non-zero value possible (1) which means that all packets on that traffic class are dropped on enqueue. However, since max_sdu_dynamic is u32, a negative is represented as a large value and oversized dropping never happens. Use max_t with int to force a truncation of max_frm_len to no smaller than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no smaller than 1. Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: taprio: fix calculation of maximum gate durationsVladimir Oltean2023-02-201-17/+17
| | | | | | | | | | | | | | | | | | | | | | | taprio_calculate_gate_durations() depends on netdev_get_num_tc() and this returns 0. So it calculates the maximum gate durations for no traffic class. I had tested the blamed commit only with another patch in my tree, one which in the end I decided isn't valuable enough to submit ("net/sched: taprio: mask off bits in gate mask that exceed number of TCs"). The problem is that having this patch threw off my testing. By moving the netdev_set_num_tc() call earlier, we implicitly gave to taprio_calculate_gate_durations() the information it needed. Extract only the portion from the unsubmitted change which applies the mqprio configuration to the netdev earlier. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/ Fixes: a306a90c8ffe ("net/sched: taprio: calculate tc gate durations") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller2023-02-171-3/+3
|\ | | | | | | | | | | Some of the devlink bits were tricky, but I think I got it right. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: tcindex: search key must be 16 bitsPedro Tammela2023-02-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Syzkaller found an issue where a handle greater than 16 bits would trigger a null-ptr-deref in the imperfect hash area update. general protection fault, probably for non-canonical address 0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af] CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted 6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023 RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509 Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00 RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8 RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8 R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00 FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572 tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155 rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x334/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmmsg+0x18f/0x460 net/socket.c:2616 __do_sys_sendmmsg net/socket.c:2645 [inline] __se_sys_sendmmsg net/socket.c:2642 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 Fixes: ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu") Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/sched: act_ctinfo: use percpu statsPedro Tammela2023-02-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tc action act_ctinfo was using shared stats, fix it to use percpu stats since bstats_update() must be called with locks or with a percpu pointer argument. tdc results: 1..12 ok 1 c826 - Add ctinfo action with default setting ok 2 0286 - Add ctinfo action with dscp ok 3 4938 - Add ctinfo action with valid cpmark and zone ok 4 7593 - Add ctinfo action with drop control ok 5 2961 - Replace ctinfo action zone and action control ok 6 e567 - Delete ctinfo action with valid index ok 7 6a91 - Delete ctinfo action with invalid index ok 8 5232 - List ctinfo actions ok 9 7702 - Flush ctinfo actions ok 10 3201 - Add ctinfo action with duplicate index ok 11 8295 - Add ctinfo action with invalid index ok 12 3964 - Replace ctinfo action with invalid goto_chain control Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: tcindex: update imperfect hash filters respecting rcuPedro Tammela2023-02-101-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext. CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF] Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed. Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis <sec@valis.email> Suggested-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>