summaryrefslogtreecommitdiffstats
path: root/net/sched/act_ct.c
Commit message (Collapse)AuthorAgeFilesLines
* sched: act_ct: avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva2024-08-121-3/+1
| | | | | | | | | | | | | | | -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unnecessary flex-array member `pad[]` and refactor the related code a bit. Fix the following warning: net/sched/act_ct.c:57:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/ZrY0JMVsImbDbx6r@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* sched: act_ct: take care of padding in struct zones_ht_keyEric Dumazet2024-07-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Blamed commit increased lookup key size from 2 bytes to 16 bytes, because zones_ht_key got a struct net pointer. Make sure rhashtable_lookup() is not using the padding bytes which are not initialized. BUG: KMSAN: uninit-value in rht_ptr_rcu include/linux/rhashtable.h:376 [inline] BUG: KMSAN: uninit-value in __rhashtable_lookup include/linux/rhashtable.h:607 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup include/linux/rhashtable.h:646 [inline] BUG: KMSAN: uninit-value in rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] BUG: KMSAN: uninit-value in tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 rht_ptr_rcu include/linux/rhashtable.h:376 [inline] __rhashtable_lookup include/linux/rhashtable.h:607 [inline] rhashtable_lookup include/linux/rhashtable.h:646 [inline] rhashtable_lookup_fast include/linux/rhashtable.h:672 [inline] tcf_ct_flow_table_get+0x611/0x2260 net/sched/act_ct.c:329 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 tcf_action_init_1+0x6cc/0xb30 net/sched/act_api.c:1425 tcf_action_init+0x458/0xf00 net/sched/act_api.c:1488 tcf_action_add net/sched/act_api.c:2061 [inline] tc_ctl_action+0x4be/0x19d0 net/sched/act_api.c:2118 rtnetlink_rcv_msg+0x12fc/0x1410 net/core/rtnetlink.c:6647 netlink_rcv_skb+0x375/0x650 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6665 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline] netlink_unicast+0xf52/0x1260 net/netlink/af_netlink.c:1357 netlink_sendmsg+0x10da/0x11e0 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2597 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2651 __sys_sendmsg net/socket.c:2680 [inline] __do_sys_sendmsg net/socket.c:2689 [inline] __se_sys_sendmsg net/socket.c:2687 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2687 x64_sys_call+0x2dd6/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable key created at: tcf_ct_flow_table_get+0x4a/0x2260 net/sched/act_ct.c:324 tcf_ct_init+0xa67/0x2890 net/sched/act_ct.c:1408 Fixes: 88c67aeb1407 ("sched: act_ct: add netns into the key of tcf_ct_flow_table") Reported-by: syzbot+1b5e4e187cc586d05ea0@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-07-111-0/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: Fix UAF when resolving a clashChengen Du2024-07-111-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KASAN reports the following UAF: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> <TASK> asm_common_interrupt+0x27/0x40 Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 The ct may be dropped if a clash has been resolved but is still passed to the tcf_ct_flow_table_process_conn function for further usage. This issue can be fixed by retrieving ct from skb again after confirming conntrack. Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Chengen Du <chengen.du@canonical.com> Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | act_ct: prepare for stolen verdict coming from conntrack and nat engineFlorian Westphal2024-07-081-6/+25
|/ | | | | | | | | | | | | | | | At this time, conntrack either returns NF_ACCEPT or NF_DROP. To improve debuging it would be nice to be able to replace NF_DROP verdict with NF_DROP_REASON() helper, This helper releases the skb instantly (so drop_monitor can pinpoint exact location) and returns NF_STOLEN. Prepare call sites to deal with this before introducing such changes in conntrack and nat core. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* sched: act_ct: add netns into the key of tcf_ct_flow_tableXin Long2024-06-181-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zones_ht is a global hashtable for flow_table with zone as key. However, it does not consider netns when getting a flow_table from zones_ht in tcf_ct_init(), and it means an act_ct action in netns A may get a flow_table that belongs to netns B if it has the same zone value. In Shuang's test with the TOPO: tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table, which caused their ct entries entering unexpected states and the TCP connection not able to end normally. This patch fixes the issue simply by adding netns into the key of tcf_ct_flow_table so that an act_ct action gets a flow_table that belongs to its own netns in tcf_ct_init(). Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net, as the ct_ft is initialized after inserting it to the hashtable in tcf_ct_flow_table_get() and also it requires to implement several functions in rhashtable_params including hashfn, obj_hashfn and obj_cmpfn. Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: Add module aliases for cls_,sch_,act_ modulesMichal Koutný2024-02-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change intended, aliases will be used in followup commits. Note for backporters: you may need to add aliases also for modules that are already removed in mainline kernel but still in your version. Patches were generated with the help of Coccinelle scripts like: cat >scripts/coccinelle/misc/tcf_alias.cocci <<EOD virtual patch virtual report @ haskernel @ @@ @ tcf_has_kind depends on report && haskernel @ identifier ops; constant K; @@ static struct tcf_proto_ops ops = { .kind = K, ... }; +char module_alias = K; EOD /usr/bin/spatch -D report --cocci-file scripts/coccinelle/misc/tcf_alias.cocci \ --dir . \ -I ./arch/x86/include -I ./arch/x86/include/generated -I ./include \ -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi \ -I ./include/uapi -I ./include/generated/uapi \ --include ./include/linux/compiler-version.h --include ./include/linux/kconfig.h \ --jobs 8 --chunksize 1 2>/dev/null | \ sed 's/char module_alias = "\([^"]*\)";/MODULE_ALIAS_NET_CLS("\1");/' And analogously for: static struct tc_action_ops ops = { .kind = K, static struct Qdisc_ops ops = { .id = K, (Someone familiar would be able to fit those into one .cocci file without sed post processing.) Signed-off-by: Michal Koutný <mkoutny@suse.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240201130943.19536-3-mkoutny@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2024-01-091-6/+6
|\ | | | | | | | | | | | | | | Merge in late fixes to prepare for the 6.8 net-next PR No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * net/sched: act_ct: fix skb leak and crash on ooo fragsTao Liu2024-01-051-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | act_ct adds skb->users before defragmentation. If frags arrive in order, the last frag's reference is reset in: inet_frag_reasm_prepare skb_morph which is not straightforward. However when frags arrive out of order, nobody unref the last frag, and all frags are leaked. The situation is even worse, as initiating packet capture can lead to a crash[0] when skb has been cloned and shared at the same time. Fix the issue by removing skb_get() before defragmentation. act_ct returns TC_ACT_CONSUMED when defrag failed or in progress. [0]: [ 843.804823] ------------[ cut here ]------------ [ 843.809659] kernel BUG at net/core/skbuff.c:2091! [ 843.814516] invalid opcode: 0000 [#1] PREEMPT SMP [ 843.819296] CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G S 6.7.0-rc3 #2 [ 843.824107] Hardware name: XFUSION 1288H V6/BC13MBSBD, BIOS 1.29 11/25/2022 [ 843.828953] RIP: 0010:pskb_expand_head+0x2ac/0x300 [ 843.833805] Code: 8b 70 28 48 85 f6 74 82 48 83 c6 08 bf 01 00 00 00 e8 38 bd ff ff 8b 83 c0 00 00 00 48 03 83 c8 00 00 00 e9 62 ff ff ff 0f 0b <0f> 0b e8 8d d0 ff ff e9 b3 fd ff ff 81 7c 24 14 40 01 00 00 4c 89 [ 843.843698] RSP: 0018:ffffc9000cce07c0 EFLAGS: 00010202 [ 843.848524] RAX: 0000000000000002 RBX: ffff88811a211d00 RCX: 0000000000000820 [ 843.853299] RDX: 0000000000000640 RSI: 0000000000000000 RDI: ffff88811a211d00 [ 843.857974] RBP: ffff888127d39518 R08: 00000000bee97314 R09: 0000000000000000 [ 843.862584] R10: 0000000000000000 R11: ffff8881109f0000 R12: 0000000000000880 [ 843.867147] R13: ffff888127d39580 R14: 0000000000000640 R15: ffff888170f7b900 [ 843.871680] FS: 0000000000000000(0000) GS:ffff889ffffc0000(0000) knlGS:0000000000000000 [ 843.876242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 843.880778] CR2: 00007fa42affcfb8 CR3: 000000011433a002 CR4: 0000000000770ef0 [ 843.885336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 843.889809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 843.894229] PKRU: 55555554 [ 843.898539] Call Trace: [ 843.902772] <IRQ> [ 843.906922] ? __die_body+0x1e/0x60 [ 843.911032] ? die+0x3c/0x60 [ 843.915037] ? do_trap+0xe2/0x110 [ 843.918911] ? pskb_expand_head+0x2ac/0x300 [ 843.922687] ? do_error_trap+0x65/0x80 [ 843.926342] ? pskb_expand_head+0x2ac/0x300 [ 843.929905] ? exc_invalid_op+0x50/0x60 [ 843.933398] ? pskb_expand_head+0x2ac/0x300 [ 843.936835] ? asm_exc_invalid_op+0x1a/0x20 [ 843.940226] ? pskb_expand_head+0x2ac/0x300 [ 843.943580] inet_frag_reasm_prepare+0xd1/0x240 [ 843.946904] ip_defrag+0x5d4/0x870 [ 843.950132] nf_ct_handle_fragments+0xec/0x130 [nf_conntrack] [ 843.953334] tcf_ct_act+0x252/0xd90 [act_ct] [ 843.956473] ? tcf_mirred_act+0x516/0x5a0 [act_mirred] [ 843.959657] tcf_action_exec+0xa1/0x160 [ 843.962823] fl_classify+0x1db/0x1f0 [cls_flower] [ 843.966010] ? skb_clone+0x53/0xc0 [ 843.969173] tcf_classify+0x24d/0x420 [ 843.972333] tc_run+0x8f/0xf0 [ 843.975465] __netif_receive_skb_core+0x67a/0x1080 [ 843.978634] ? dev_gro_receive+0x249/0x730 [ 843.981759] __netif_receive_skb_list_core+0x12d/0x260 [ 843.984869] netif_receive_skb_list_internal+0x1cb/0x2f0 [ 843.987957] ? mlx5e_handle_rx_cqe_mpwrq_rep+0xfa/0x1a0 [mlx5_core] [ 843.991170] napi_complete_done+0x72/0x1a0 [ 843.994305] mlx5e_napi_poll+0x28c/0x6d0 [mlx5_core] [ 843.997501] __napi_poll+0x25/0x1b0 [ 844.000627] net_rx_action+0x256/0x330 [ 844.003705] __do_softirq+0xb3/0x29b [ 844.006718] irq_exit_rcu+0x9e/0xc0 [ 844.009672] common_interrupt+0x86/0xa0 [ 844.012537] </IRQ> [ 844.015285] <TASK> [ 844.017937] asm_common_interrupt+0x26/0x40 [ 844.020591] RIP: 0010:acpi_safe_halt+0x1b/0x20 [ 844.023247] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 65 48 8b 04 25 00 18 03 00 48 8b 00 a8 08 75 0c 66 90 0f 00 2d 81 d0 44 00 fb f4 <fa> c3 0f 1f 00 89 fa ec 48 8b 05 ee 88 ed 00 a9 00 00 00 80 75 11 [ 844.028900] RSP: 0018:ffffc90000533e70 EFLAGS: 00000246 [ 844.031725] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000000 [ 844.034553] RDX: ffff889ffffc0000 RSI: ffffffff828b7f20 RDI: ffff88a090f45c64 [ 844.037368] RBP: ffff88a0901a2800 R08: ffff88a090f45c00 R09: 00000000000317c0 [ 844.040155] R10: 00ec812281150475 R11: ffff889fffff0e04 R12: ffffffff828b7fa0 [ 844.042962] R13: ffffffff828b7f20 R14: 0000000000000001 R15: 0000000000000000 [ 844.045819] acpi_idle_enter+0x7b/0xc0 [ 844.048621] cpuidle_enter_state+0x7f/0x430 [ 844.051451] cpuidle_enter+0x2d/0x40 [ 844.054279] do_idle+0x1d4/0x240 [ 844.057096] cpu_startup_entry+0x2a/0x30 [ 844.059934] start_secondary+0x104/0x130 [ 844.062787] secondary_startup_64_no_verify+0x16b/0x16b [ 844.065674] </TASK> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Tao Liu <taoliu828@163.com> Link: https://lore.kernel.org/r/20231228081457.936732-1-taoliu828@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/sched: introduce ACT_P_BOUND return codePedro Tammela2024-01-031-1/+1
|/ | | | | | | | | | | | | | | Bound actions always return '0' and as of today we rely on '0' being returned in order to properly skip bound actions in tcf_idr_insert_many. In order to further improve maintainability, introduce the ACT_P_BOUND return code. Actions are updated to return 'ACT_P_BOUND' instead of plain '0'. tcf_idr_insert_many is then updated to check for 'ACT_P_BOUND'. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231229132642.1489088-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: act_ct: Take per-cb reference to tcf_ct_flow_tableVlad Buslov2023-12-111-6/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The referenced change added custom cleanup code to act_ct to delete any callbacks registered on the parent block when deleting the tcf_ct_flow_table instance. However, the underlying issue is that the drivers don't obtain the reference to the tcf_ct_flow_table instance when registering callbacks which means that not only driver callbacks may still be on the table when deleting it but also that the driver can still have pointers to its internal nf_flowtable and can use it concurrently which results either warning in netfilter[0] or use-after-free. Fix the issue by taking a reference to the underlying struct tcf_ct_flow_table instance when registering the callback and release the reference when unregistering. Expose new API required for such reference counting by adding two new callbacks to nf_flowtable_type and implementing them for act_ct flowtable_ct type. This fixes the issue by extending the lifetime of nf_flowtable until all users have unregistered. [0]: [106170.938634] ------------[ cut here ]------------ [106170.939111] WARNING: CPU: 21 PID: 3688 at include/net/netfilter/nf_flow_table.h:262 mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.940108] Modules linked in: act_ct nf_flow_table act_mirred act_skbedit act_tunnel_key vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa bonding openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_regis try overlay mlx5_core [106170.943496] CPU: 21 PID: 3688 Comm: kworker/u48:0 Not tainted 6.6.0-rc7_for_upstream_min_debug_2023_11_01_13_02 #1 [106170.944361] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [106170.945292] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [106170.945846] RIP: 0010:mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.946413] Code: 89 ef 48 83 05 71 a4 14 00 01 e8 f4 06 04 e1 48 83 05 6c a4 14 00 01 48 83 c4 28 5b 5d 41 5c 41 5d c3 48 83 05 d1 8b 14 00 01 <0f> 0b 48 83 05 d7 8b 14 00 01 e9 96 fe ff ff 48 83 05 a2 90 14 00 [106170.947924] RSP: 0018:ffff88813ff0fcb8 EFLAGS: 00010202 [106170.948397] RAX: 0000000000000000 RBX: ffff88811eabac40 RCX: ffff88811eabad48 [106170.949040] RDX: ffff88811eab8000 RSI: ffffffffa02cd560 RDI: 0000000000000000 [106170.949679] RBP: ffff88811eab8000 R08: 0000000000000001 R09: ffffffffa0229700 [106170.950317] R10: ffff888103538fc0 R11: 0000000000000001 R12: ffff88811eabad58 [106170.950969] R13: ffff888110c01c00 R14: ffff888106b40000 R15: 0000000000000000 [106170.951616] FS: 0000000000000000(0000) GS:ffff88885fd40000(0000) knlGS:0000000000000000 [106170.952329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106170.952834] CR2: 00007f1cefd28cb0 CR3: 000000012181b006 CR4: 0000000000370ea0 [106170.953482] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [106170.954121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [106170.954766] Call Trace: [106170.955057] <TASK> [106170.955315] ? __warn+0x79/0x120 [106170.955648] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.956172] ? report_bug+0x17c/0x190 [106170.956537] ? handle_bug+0x3c/0x60 [106170.956891] ? exc_invalid_op+0x14/0x70 [106170.957264] ? asm_exc_invalid_op+0x16/0x20 [106170.957666] ? mlx5_del_flow_rules+0x10/0x310 [mlx5_core] [106170.958172] ? mlx5_tc_ct_block_flow_offload_add+0x1240/0x1240 [mlx5_core] [106170.958788] ? mlx5_tc_ct_del_ft_cb+0x267/0x2b0 [mlx5_core] [106170.959339] ? mlx5_tc_ct_del_ft_cb+0xc6/0x2b0 [mlx5_core] [106170.959854] ? mapping_remove+0x154/0x1d0 [mlx5_core] [106170.960342] ? mlx5e_tc_action_miss_mapping_put+0x4f/0x80 [mlx5_core] [106170.960927] mlx5_tc_ct_delete_flow+0x76/0xc0 [mlx5_core] [106170.961441] mlx5_free_flow_attr_actions+0x13b/0x220 [mlx5_core] [106170.962001] mlx5e_tc_del_fdb_flow+0x22c/0x3b0 [mlx5_core] [106170.962524] mlx5e_tc_del_flow+0x95/0x3c0 [mlx5_core] [106170.963034] mlx5e_flow_put+0x73/0xe0 [mlx5_core] [106170.963506] mlx5e_put_flow_list+0x38/0x70 [mlx5_core] [106170.964002] mlx5e_rep_update_flows+0xec/0x290 [mlx5_core] [106170.964525] mlx5e_rep_neigh_update+0x1da/0x310 [mlx5_core] [106170.965056] process_one_work+0x13a/0x2c0 [106170.965443] worker_thread+0x2e5/0x3f0 [106170.965808] ? rescuer_thread+0x410/0x410 [106170.966192] kthread+0xc6/0xf0 [106170.966515] ? kthread_complete_and_exit+0x20/0x20 [106170.966970] ret_from_fork+0x2d/0x50 [106170.967332] ? kthread_complete_and_exit+0x20/0x20 [106170.967774] ret_from_fork_asm+0x11/0x20 [106170.970466] </TASK> [106170.970726] ---[ end trace 0000000000000000 ]--- Fixes: 77ac5e40c44e ("net/sched: act_ct: remove and free nf_table callbacks") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: do not offload flows with a helper in act_ctXin Long2023-11-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no hardware supporting ct helper offload. However, prior to this patch, a flower filter with a helper in the ct action can be successfully set into the HW, for example (eth1 is a bnxt NIC): # tc qdisc add dev eth1 ingress_block 22 ingress # tc filter add block 22 proto ip flower skip_sw ip_proto tcp \ dst_port 21 ct_state -trk action ct helper ipv4-tcp-ftp # tc filter show dev eth1 ingress filter block 22 protocol ip pref 49152 flower chain 0 handle 0x1 eth_type ipv4 ip_proto tcp dst_port 21 ct_state -trk skip_sw in_hw in_hw_count 1 <---- action order 1: ct zone 0 helper ipv4-tcp-ftp pipe index 2 ref 1 bind 1 used_hw_stats delayed This might cause the flower filter not to work as expected in the HW. This patch avoids this problem by simply returning -EOPNOTSUPP in tcf_ct_offload_act_setup() to not allow to offload flows with a helper in act_ct. Fixes: a21b06e73191 ("net: sched: add helper support in act_ct") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/f8685ec7702c4a448a1371a8b34b43217b583b9d.1699898008.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: act_ct: Always fill offloading tuple iifidxVlad Buslov2023-11-081-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Referenced commit doesn't always set iifidx when offloading the flow to hardware. Fix the following cases: - nf_conn_act_ct_ext_fill() is called before extension is created with nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with unspecified iifidx when connection is offloaded after only single original-direction packet has been processed by tc data path. Always fill the new nf_conn_act_ct_ext instance after creating it in nf_conn_act_ct_ext_add(). - Offloading of unidirectional UDP NEW connections is now supported, but ct flow iifidx field is not updated when connection is promoted to bidirectional which can result reply-direction iifidx to be zero when refreshing the connection. Fill in the extension and update flow iifidx before calling flow_offload_refresh(). Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-10-261-0/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: act_ct: additional checks for outdated flowsVlad Buslov2023-10-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current nf_flow_is_outdated() implementation considers any flow table flow which state diverged from its underlying CT connection status for teardown which can be problematic in the following cases: - Flow has never been offloaded to hardware in the first place either because flow table has hardware offload disabled (flag NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add' workqueue to be offloaded for the first time. The former is incorrect, the later generates excessive deletions and additions of flows. - Flow is already pending to be updated on the workqueue. Tearing down such flows will also generate excessive removals from the flow table, especially on highly loaded system where the latency to re-offload a flow via 'add' workqueue can be quite high. When considering a flow for teardown as outdated verify that it is both offloaded to hardware and doesn't have any pending updates. Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * netfilter: flowtable: GC pushes back packets to classic pathPablo Neira Ayuso2023-10-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY back to classic path in every run, ie. every second. This is because of a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct. In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on and IPS_SEEN_REPLY is unreliable since users decide when to offload the flow before, such bit might be set on at a later stage. Fix it by adding a custom .gc handler that sched/act_ct can use to deal with its NF_FLOW_HW_ESTABLISHED bit. Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reported-by: Vladimir Smelhaus <vl.sm@email.cz> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | sched: act_ct: switch to per-action label countingFlorian Westphal2023-10-251-23/+18
|/ | | | | | | | | | | | | | | | net->ct.labels_used was meant to convey 'number of ip/nftables rules that need the label extension allocated'. act_ct enables this for each net namespace, which voids all attempts to avoid ct->ext allocation when possible. Move this increment to the control plane to request label extension space allocation only when its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: set IPS_CONFIRMED in tmpl status only when commit is set in act_ctXin Long2023-07-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the following flows, the packets will be dropped if OVS TC offload is enabled. 'ip,ct_state=-trk,in_port=1 actions=ct(zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=1)' 'ip,ct_state=+trk+new+rel,in_port=1 actions=ct(commit,zone=2),normal' In the 1st flow, it finds the exp from the hashtable and removes it then creates the ct with this exp in act_ct. However, in the 2nd flow it goes to the OVS upcall at the 1st time. When the skb comes back from userspace, it has to create the ct again without exp(the exp was removed last time). With no 'rel' set in the ct, the 3rd flow can never get matched. In OVS conntrack, it works around it by adding its own exp lookup function ovs_ct_expect_find() where it doesn't remove the exp. Instead of creating a real ct, it only updates its keys with the exp and its master info. So when the skb comes back, the exp is still in the hashtable. However, we can't do this trick in act_ct, as tc flower match is using a real ct, and passing the exp and its master info to flower parsing via tc_skb_cb is also not possible (tc_skb_cb size is not big enough). The simple and clear fix is to not remove the exp at the 1st flow, namely, not set IPS_CONFIRMED in tmpl when commit is not set in act_ct. Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: act_ct: Fix promotion of offloaded unreplied tuplePaul Blakey2023-06-141-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently UNREPLIED and UNASSURED connections are added to the nf flow table. This causes the following connection packets to be processed by the flow table which then skips conntrack_in(), and thus such the connections will remain UNREPLIED and UNASSURED even if reply traffic is then seen. Even still, the unoffloaded reply packets are the ones triggering hardware update from new to established state, and if there aren't any to triger an update and/or previous update was missed, hardware can get out of sync with sw and still mark packets as new. Fix the above by: 1) Not skipping conntrack_in() for UNASSURED packets, but still refresh for hardware, as before the cited patch. 2) Try and force a refresh by reply-direction packets that update the hardware rules from new to established state. 3) Remove any bidirectional flows that didn't failed to update in hardware for re-insertion as bidrectional once any new packet arrives. Fixes: 6a9bad0069cf ("net/sched: act_ct: offload UDP NEW connections") Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: extract nf_ct_handle_fragments to nf_conntrack_ovsXin Long2023-02-101-44/+2
| | | | | | | | | | | | | | | | Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: sched: move frag check and tc_skb_cb update out of handle_fragmentsXin Long2023-02-101-32/+39
| | | | | | | | | | | This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: extract nf_ct_skb_network_trim function to nf_conntrack_ovsXin Long2023-02-101-26/+1
| | | | | | | | | | | | There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net/sched: act_ct: offload UDP NEW connectionsVlad Buslov2023-02-031-12/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the offload algorithm of UDP connections to the following: - Offload NEW connection as unidirectional. - When connection state changes to ESTABLISHED also update the hardware flow. However, in order to prevent act_ct from spamming offload add wq for every packet coming in reply direction in this state verify whether connection has already been updated to ESTABLISHED in the drivers. If that it the case, then skip flow_table and let conntrack handle such packets which will also allow conntrack to potentially promote the connection to ASSURED. - When connection state changes to ASSURED set the flow_table flow NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload the reply direction. All other protocols have their offload algorithm preserved and are always offloaded as bidirectional. Note that this change tries to minimize the load on flow_table add workqueue. First, it tracks the last ctinfo that was offloaded by using new flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for reply direction packets when the offloads have already been updated with current ctinfo. Second, when 'add' task executes on workqueue it always update the offload with current flow state (by checking 'bidirectional' flow flag and obtaining actual ctinfo/cookie through meta action instead of caching any of these from the moment of scheduling the 'add' work) preventing the need from scheduling more updates if state changed concurrently while the 'add' work was pending on workqueue. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_ct: set ctinfo in meta action depending on ct stateVlad Buslov2023-02-031-2/+4
| | | | | | | | | | | | Currently tcf_ct_flow_table_fill_actions() function assumes that only established connections can be offloaded and always sets ctinfo to either IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on direction without checking actual connection state. To enable UDP NEW connection offload set the ctinfo, metadata cookie and NF_FLOW_HW_ESTABLISHED flow_offload flags bit based on ct->status value. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: flowtable: cache info of last offloadVlad Buslov2023-02-031-5/+7
| | | | | | | | | | | | Modify flow table offload to cache the last ct info status that was passed to the driver offload callbacks by extending enum nf_flow_flags with new "NF_FLOW_HW_ESTABLISHED" flag. Set the flag if ctinfo was 'established' during last act_ct meta actions fill call. This infrastructure change is necessary to optimize promoting of UDP connections from 'new' to 'established' in following patches in this series. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: use skb_ip_totlen and iph_totlenXin Long2023-02-011-1/+1
| | | | | | | | | | | | | There are 1 action and 1 qdisc that may process IPv4 TCP GSO packets and access iph->tot_len, replace them with skb_ip_totlen() and iph_totlen() accordingly. Note that we don't need to replace the one in tcf_csum_ipv4(), as it will return for TCP GSO packets in tcf_csum_ipv4_tcp(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: move the nat function to nf_nat_ovs for ovs and tcXin Long2022-12-121-125/+11
| | | | | | | | | | | | | | | | | | | | | | There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sched: update the nat flag for icmp error packets in ct_nat_executeXin Long2022-12-121-1/+1
| | | | | | | | | | | | | | In ovs_ct_nat_execute(), the packet flow key nat flags are updated when it processes ICMP(v6) error packets translation successfully. In ct_nat_execute() when processing ICMP(v6) error packets translation successfully, it should have done the same in ct_nat_execute() to set post_ct_s/dnat flag, which will be used to update flow key nat flags in OVS module later. Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: avoid indirect act functions on retpoline kernelsPedro Tammela2022-12-091-2/+3
| | | | | | | | | | Expose the necessary tc act functions and wire up act_api to use direct calls in retpoline kernels. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-11-291-4/+4
|\ | | | | | | | | | | | | | | | | tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * netfilter: conntrack: Fix data-races around ct markDaniel Xu2022-11-181-4/+4
| | | | | | | | | | | | | | | | | | | | nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | net: sched: add helper support in act_ctXin Long2022-11-081-8/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* | net: sched: call tcf_ct_params_free to free params in tcf_ct_initXin Long2022-11-081-17/+18
|/ | | | | | | | | | This patch is to make the err path simple by calling tcf_ct_params_free(), so that it won't cause problems when more members are added into param and need freeing on the err path. Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-09-291-1/+4
|\ | | | | | | | | | | No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: sched: act_ct: fix possible refcount leak in tcf_ct_init()Hangyu Hua2022-09-261-1/+4
| | | | | | | | | | | | | | | | | | | | nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params to avoid possible refcount leak when tcf_ct_flow_table_get fails. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220923020046.8021-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net: sched: act_ct: remove redundant variable errJinpeng Cui2022-09-211-4/+1
| | | | | | | | | | | | | | | | | | Return value directly from pskb_trim_rcsum() instead of getting value from redundant variable err. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: act_ct: get rid of tcf_ct_walker and tcf_ct_searchZhengchao Shao2022-09-091-19/+0
| | | | | | | | | | | | | | | | | | tcf_ct_walker() and tcf_ct_search() do the same thing as generic walk/search function, so remove them. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: act: move global static variable net_id to tc_action_opsZhengchao Shao2022-09-091-9/+8
|/ | | | | | | | | Each tc action module has a corresponding net_id, so put net_id directly into the structure tc_action_ops. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_ct: set 'net' pointer when creating new nf_flow_tableVlad Buslov2022-07-111-2/+3
| | | | | | | | | Following patches in series use the pointer to access flow table offload debug variables. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* net/sched: act_api: fix error code in tcf_ct_flow_table_fill_tuple_ipv6()Dan Carpenter2022-06-011-1/+1
| | | | | | | | | | | | The tcf_ct_flow_table_fill_tuple_ipv6() function is supposed to return false on failure. It should not return negatives because that means succes/true. Fixes: fcb6aa86532c ("act_ct: Support GRE offload") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://lore.kernel.org/r/YpYFnbDxFl6tQ3Bn@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net/sched: act_api: Add extack to offload_act_setup() callbackIdo Schimmel2022-04-081-1/+2
| | | | | | | | | | | The callback is used by various actions to populate the flow action structure prior to offload. Pass extack to this callback so that the various actions will be able to report accurate error messages to user space. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: act_ct: fix ref leak when switching zonesMarcelo Ricardo Leitner2022-03-261-6/+9
| | | | | | | | | | | | | | | When switching zones or network namespaces without doing a ct clear in between, it is now leaking a reference to the old ct entry. That's because tcf_ct_skb_nfct_cached() returns false and tcf_ct_flow_table_lookup() may simply overwrite it. The fix is to, as the ct entry is not reusable, free it already at tcf_ct_skb_nfct_cached(). Reported-by: Florian Westphal <fw@strlen.de> Fixes: 2f131de361f6 ("net/sched: act_ct: Fix flow table lookup after ct clear or switching zones") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski2022-03-151-24/+91
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack. 2) Reject unsupported families when creating tables, from Phil Sutter. 3) GRE support for the flowtable, from Toshiaki Makita. 4) Add GRE offload support for act_ct, also from Toshiaki. 5) Update mlx5 driver to support for GRE flowtable offload, from Toshiaki Makita. 6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge, from Jiapeng Chong. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bridge: clean up some inconsistent indenting net/mlx5: Support GRE conntrack offload act_ct: Support GRE offload netfilter: flowtable: Support GRE netfilter: nf_tables: Reject tables of unsupported family Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" ==================== Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * act_ct: Support GRE offloadToshiaki Makita2022-03-031-24/+91
| | | | | | | | | | | | | | | | Support GREv0 without NAT. Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-03-031-4/+9
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | net/batman-adv/hard-interface.c commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check") commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails") commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: act_ct: Fix flow table lookup failure with no originating ifindexPaul Blakey2022-03-011-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After cited commit optimizted hw insertion, flow table entries are populated with ifindex information which was intended to only be used for HW offload. This tuple ifindex is hashed in the flow table key, so it must be filled for lookup to be successful. But tuple ifindex is only relevant for the netfilter flowtables (nft), so it's not filled in act_ct flow table lookup, resulting in lookup failure, and no SW offload and no offload teardown for TCP connection FIN/RST packets. To fix this, add new tc ifindex field to tuple, which will only be used for offloading, not for lookup, as it will not be part of the tuple hash. Fixes: 9795ded7f924 ("net/sched: act_ct: Fill offloading tuple iifidx") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2022-02-241-5/+0
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net/sched: act_ct: Fix flow table lookup after ct clear or switching zonesPaul Blakey2022-02-181-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flow table lookup is skipped if packet either went through ct clear action (which set the IP_CT_UNTRACKED flag on the packet), or while switching zones and there is already a connection associated with the packet. This will result in no SW offload of the connection, and the and connection not being removed from flow table with TCP teardown (fin/rst packet). To fix the above, remove these unneccary checks in flow table lookup. Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: conntrack: remove extension register apiFlorian Westphal2022-02-041-11/+0
| | | | | | | | | | | | | | | | These no longer register/unregister a meaningful structure so remove it. Cc: Paul Blakey <paulb@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | netfilter: conntrack: move extension sizes into coreFlorian Westphal2022-02-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to specify this in the registration modules, we already collect all sizes for build-time checks on the maximum combined size. After this change, all extensions except nat have no meaningful content in their nf_ct_ext_type struct definition. Next patch handles nat, this will then allow to remove the dynamic register api completely. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>