summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* sock: annotate data-races around prot->memory_pressureEric Dumazet2023-08-182-4/+5
| | | | | | | | | | | | | | | *prot->memory_pressure is read/writen locklessly, we need to add proper annotations. A recent commit added a new race, it is time to audit all accesses. Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()") Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Abel Wu <wuyun.abel@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gatesVladimir Oltean2023-08-181-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The blamed commit resolved a bug where frames would still get stuck at egress, even though they're smaller than the maxSDU[tc], because the driver did not take into account the extra 33 ns that the queue system needs for scheduling the frame. It now takes that into account, but the arithmetic that we perform in vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on 64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS may become a very large integer if gate_len_ns < 33 ns. In practice, this means that we've introduced a regression where all traffic class gates which are permanently closed will not get detected by the driver, and we won't enable oversize frame dropping for them. Before: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS After: mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000 mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS Fixes: 11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* devlink: add missing unregister linecard notificationJiri Pirko2023-08-181-0/+3
| | | | | | | | | | | Cited fixes commit introduced linecard notifications for register, however it didn't add them for unregister. Fix that by adding them. Fixes: c246f9b5fd61 ("devlink: add support to create line card and expose to user") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230817125240.2144794-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* octeontx2-af: SDP: fix receive link configHariprasad Kelam2023-08-181-1/+2
| | | | | | | | | | | | | | | On SDP interfaces, frame oversize and undersize errors are observed as driver is not considering packet sizes of all subscribers of the link before updating the link config. This patch fixes the same. Fixes: 9b7dd87ac071 ("octeontx2-af: Support to modify min/max allowed packet lengths") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230817063006.10366-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge tag 'batadv-net-pullrequest-20230816' of ↵Jakub Kicinski2023-08-186-5/+29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Fix issues with adjusted MTUs (2 patches), by Sven Eckelmann - Fix header access for memory reallocation case, by Remi Pommarel - Fix two memory leaks (2 patches), by Remi Pommarel * tag 'batadv-net-pullrequest-20230816' of git://git.open-mesh.org/linux-merge: batman-adv: Fix batadv_v_ogm_aggr_send memory leak batman-adv: Fix TT global entry leak when client roamed back batman-adv: Do not get eth header before batadv_check_management_packet batman-adv: Don't increase MTU when set by user batman-adv: Trigger events for auto adjusted MTU ==================== Link: https://lore.kernel.org/r/20230816163318.189996-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * batman-adv: Fix batadv_v_ogm_aggr_send memory leakRemi Pommarel2023-08-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When batadv_v_ogm_aggr_send is called for an inactive interface, the skb is silently dropped by batadv_v_ogm_send_to_if() but never freed causing the following memory leak: unreferenced object 0xffff00000c164800 (size 512): comm "kworker/u8:1", pid 2648, jiffies 4295122303 (age 97.656s) hex dump (first 32 bytes): 00 80 af 09 00 00 ff ff e1 09 00 00 75 01 60 83 ............u.`. 1f 00 00 00 b8 00 00 00 15 00 05 00 da e3 d3 64 ...............d backtrace: [<0000000007ad20f6>] __kmalloc_track_caller+0x1a8/0x310 [<00000000d1029e55>] kmalloc_reserve.constprop.0+0x70/0x13c [<000000008b9d4183>] __alloc_skb+0xec/0x1fc [<00000000c7af5051>] __netdev_alloc_skb+0x48/0x23c [<00000000642ee5f5>] batadv_v_ogm_aggr_send+0x50/0x36c [<0000000088660bd7>] batadv_v_ogm_aggr_work+0x24/0x40 [<0000000042fc2606>] process_one_work+0x3b0/0x610 [<000000002f2a0b1c>] worker_thread+0xa0/0x690 [<0000000059fae5d4>] kthread+0x1fc/0x210 [<000000000c587d3a>] ret_from_fork+0x10/0x20 Free the skb in that case to fix this leak. Cc: stable@vger.kernel.org Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure") Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * batman-adv: Fix TT global entry leak when client roamed backRemi Pommarel2023-08-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a client roamed back to a node before it got time to destroy the pending local entry (i.e. within the same originator interval) the old global one is directly removed from hash table and left as such. But because this entry had an extra reference taken at lookup (i.e using batadv_tt_global_hash_find) there is no way its memory will be reclaimed at any time causing the following memory leak: unreferenced object 0xffff0000073c8000 (size 18560): comm "softirq", pid 0, jiffies 4294907738 (age 228.644s) hex dump (first 32 bytes): 06 31 ac 12 c7 7a 05 00 01 00 00 00 00 00 00 00 .1...z.......... 2c ad be 08 00 80 ff ff 6c b6 be 08 00 80 ff ff ,.......l....... backtrace: [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300 [<000000000ff2fdbc>] batadv_tt_global_add+0x700/0xe20 [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790 [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110 [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10 [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0 [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4 [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0 [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90 [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74 [<000000000f39a009>] __netif_receive_skb+0x48/0xe0 [<00000000f2cd8888>] process_backlog+0x174/0x344 [<00000000507d6564>] __napi_poll+0x58/0x1f4 [<00000000b64ef9eb>] net_rx_action+0x504/0x590 [<00000000056fa5e4>] _stext+0x1b8/0x418 [<00000000878879d6>] run_ksoftirqd+0x74/0xa4 unreferenced object 0xffff00000bae1a80 (size 56): comm "softirq", pid 0, jiffies 4294910888 (age 216.092s) hex dump (first 32 bytes): 00 78 b1 0b 00 00 ff ff 0d 50 00 00 00 00 00 00 .x.......P...... 00 00 00 00 00 00 00 00 50 c8 3c 07 00 00 ff ff ........P.<..... backtrace: [<00000000ee6e0ffa>] kmem_cache_alloc+0x1b4/0x300 [<00000000d9aaa49e>] batadv_tt_global_add+0x53c/0xe20 [<00000000443897c7>] _batadv_tt_update_changes+0x21c/0x790 [<000000005dd90463>] batadv_tt_update_changes+0x3c/0x110 [<00000000a2d7fc57>] batadv_tt_tvlv_unicast_handler_v1+0xafc/0xe10 [<0000000011793f2a>] batadv_tvlv_containers_process+0x168/0x2b0 [<00000000b7cbe2ef>] batadv_recv_unicast_tvlv+0xec/0x1f4 [<0000000042aef1d8>] batadv_batman_skb_recv+0x25c/0x3a0 [<00000000bbd8b0a2>] __netif_receive_skb_core.isra.0+0x7a8/0xe90 [<000000004033d428>] __netif_receive_skb_one_core+0x64/0x74 [<000000000f39a009>] __netif_receive_skb+0x48/0xe0 [<00000000f2cd8888>] process_backlog+0x174/0x344 [<00000000507d6564>] __napi_poll+0x58/0x1f4 [<00000000b64ef9eb>] net_rx_action+0x504/0x590 [<00000000056fa5e4>] _stext+0x1b8/0x418 [<00000000878879d6>] run_ksoftirqd+0x74/0xa4 Releasing the extra reference from batadv_tt_global_hash_find even at roam back when batadv_tt_global_free is called fixes this memory leak. Cc: stable@vger.kernel.org Fixes: 068ee6e204e1 ("batman-adv: roaming handling mechanism redesign") Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by; Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * batman-adv: Do not get eth header before batadv_check_management_packetRemi Pommarel2023-07-282-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If received skb in batadv_v_elp_packet_recv or batadv_v_ogm_packet_recv is either cloned or non linearized then its data buffer will be reallocated by batadv_check_management_packet when skb_cow or skb_linearize get called. Thus geting ethernet header address inside skb data buffer before batadv_check_management_packet had any chance to reallocate it could lead to the following kernel panic: Unable to handle kernel paging request at virtual address ffffff8020ab069a Mem abort info: ESR = 0x96000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000040f45000 [ffffff8020ab069a] pgd=180000007fffa003, p4d=180000007fffa003, pud=180000007fffa003, pmd=180000007fefe003, pte=0068000020ab0706 Internal error: Oops: 96000007 [#1] SMP Modules linked in: ahci_mvebu libahci_platform libahci dvb_usb_af9035 dvb_usb_dib0700 dib0070 dib7000m dibx000_common ath11k_pci ath10k_pci ath10k_core mwl8k_new nf_nat_sip nf_conntrack_sip xhci_plat_hcd xhci_hcd nf_nat_pptp nf_conntrack_pptp at24 sbsa_gwdt CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.15.42-00066-g3242268d425c-dirty #550 Hardware name: A8k (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : batadv_is_my_mac+0x60/0xc0 lr : batadv_v_ogm_packet_recv+0x98/0x5d0 sp : ffffff8000183820 x29: ffffff8000183820 x28: 0000000000000001 x27: ffffff8014f9af00 x26: 0000000000000000 x25: 0000000000000543 x24: 0000000000000003 x23: ffffff8020ab0580 x22: 0000000000000110 x21: ffffff80168ae880 x20: 0000000000000000 x19: ffffff800b561000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 00dc098924ae0032 x14: 0f0405433e0054b0 x13: ffffffff00000080 x12: 0000004000000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : ffffffc076dae000 x6 : ffffff8000183700 x5 : ffffffc00955e698 x4 : ffffff80168ae000 x3 : ffffff80059cf000 x2 : ffffff800b561000 x1 : ffffff8020ab0696 x0 : ffffff80168ae880 Call trace: batadv_is_my_mac+0x60/0xc0 batadv_v_ogm_packet_recv+0x98/0x5d0 batadv_batman_skb_recv+0x1b8/0x244 __netif_receive_skb_core.isra.0+0x440/0xc74 __netif_receive_skb_one_core+0x14/0x20 netif_receive_skb+0x68/0x140 br_pass_frame_up+0x70/0x80 br_handle_frame_finish+0x108/0x284 br_handle_frame+0x190/0x250 __netif_receive_skb_core.isra.0+0x240/0xc74 __netif_receive_skb_list_core+0x6c/0x90 netif_receive_skb_list_internal+0x1f4/0x310 napi_complete_done+0x64/0x1d0 gro_cell_poll+0x7c/0xa0 __napi_poll+0x34/0x174 net_rx_action+0xf8/0x2a0 _stext+0x12c/0x2ac run_ksoftirqd+0x4c/0x7c smpboot_thread_fn+0x120/0x210 kthread+0x140/0x150 ret_from_fork+0x10/0x20 Code: f9403844 eb03009f 54fffee1 f94 Thus ethernet header address should only be fetched after batadv_check_management_packet has been called. Fixes: 0da0035942d4 ("batman-adv: OGMv2 - add basic infrastructure") Cc: stable@vger.kernel.org Signed-off-by: Remi Pommarel <repk@triplefau.lt> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * batman-adv: Don't increase MTU when set by userSven Eckelmann2023-07-203-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user set an MTU value, it usually means that there are special requirements for the MTU. But if an interface gots activated, the MTU was always recalculated and then the user set value was overwritten. The only reason why this user set value has to be overwritten, is when the MTU has to be decreased because batman-adv is not able to transfer packets with the user specified size. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * batman-adv: Trigger events for auto adjusted MTUSven Eckelmann2023-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an interface changes the MTU, it is expected that an NETDEV_PRECHANGEMTU and NETDEV_CHANGEMTU notification events is triggered. This worked fine for .ndo_change_mtu based changes because core networking code took care of it. But for auto-adjustments after hard-interfaces changes, these events were simply missing. Due to this problem, non-batman-adv components weren't aware of MTU changes and thus couldn't perform their own tasks correctly. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Cc: stable@vger.kernel.org Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
* | Merge tag 'net-6.5-rc7' of ↵Linus Torvalds2023-08-1848-118/+313
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from ipsec and netfilter. No known outstanding regressions. Fixes to fixes: - virtio-net: set queues after driver_ok, avoid a potential race added by recent fix - Revert "vlan: Fix VLAN 0 memory leak", it may lead to a warning when VLAN 0 is registered explicitly - nf_tables: - fix false-positive lockdep splat in recent fixes - don't fail inserts if duplicate has expired (fix test failures) - fix races between garbage collection and netns dismantle Current release - new code bugs: - mlx5: Fix mlx5_cmd_update_root_ft() error flow Previous releases - regressions: - phy: fix IRQ-based wake-on-lan over hibernate / power off Previous releases - always broken: - sock: fix misuse of sk_under_memory_pressure() preventing system from exiting global TCP memory pressure if a single cgroup is under pressure - fix the RTO timer retransmitting skb every 1ms if linear option is enabled - af_key: fix sadb_x_filter validation, amment netlink policy - ipsec: fix slab-use-after-free in decode_session6() - macb: in ZynqMP resume always configure PS GTR for non-wakeup source Misc: - netfilter: set default timeout to 3 secs for sctp shutdown send and recv state (from 300ms), align with protocol timers" * tag 'net-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) ice: Block switchdev mode when ADQ is active and vice versa qede: fix firmware halt over suspend and resume net: do not allow gso_size to be set to GSO_BY_FRAGS sock: Fix misuse of sk_under_memory_pressure() sfc: don't fail probe if MAE/TC setup fails sfc: don't unregister flow_indr if it was never registered net: dsa: mv88e6xxx: Wait for EEPROM done before HW reset net/mlx5: Fix mlx5_cmd_update_root_ft() error flow net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT i40e: fix misleading debug logs iavf: fix FDIR rule fields masks validation ipv6: fix indentation of a config attribute mailmap: add entries for Simon Horman broadcom: b44: Use b44_writephy() return value net: openvswitch: reject negative ifindex team: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves net: phy: broadcom: stub c45 read/write for 54810 netfilter: nft_dynset: disallow object maps netfilter: nf_tables: GC transaction race with netns dismantle netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path ...
| * \ Merge branch '40GbE' of ↵Jakub Kicinski2023-08-174-12/+93
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e) This series contains updates to iavf and i40e drivers. Piotr adds checks for unsupported Flow Director rules on iavf. Andrii replaces incorrect 'write' messaging on read operations for i40e. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: fix misleading debug logs iavf: fix FDIR rule fields masks validation ==================== Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | i40e: fix misleading debug logsAndrii Staikov2023-08-161-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change "write" into the actual "read" word. Change parameters description. Fixes: 7073f46e443e ("i40e: Add AQ commands for NVM Update for X722") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * | iavf: fix FDIR rule fields masks validationPiotr Gardocki2023-08-163-4/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return an error if a field's mask is neither full nor empty. When a mask is only partial the field is not being used for rule programming but it gives a wrong impression it is used. Fix by returning an error on any partial mask to make it clear they are not supported. The ip_ver assignment is moved earlier in code to allow using it in iavf_validate_fdir_fltr_masks. Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Fixes: e90cbc257a6f ("iavf: Support IPv6 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | | Merge tag 'mlx5-fixes-2023-08-16' of ↵Jakub Kicinski2023-08-173-4/+16
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-08-16 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix mlx5_cmd_update_root_ft() error flow net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECT ==================== Link: https://lore.kernel.org/r/20230816204108.53819-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | | net/mlx5: Fix mlx5_cmd_update_root_ft() error flowShay Drory2023-08-161-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cited patch change mlx5_cmd_update_root_ft() to work with multiple peer devices. However, it didn't align the error flow as well. Hence, Fix the error code to work with multiple peer devices. Fixes: 222dd185833e ("{net/RDMA}/mlx5: introduce lag_for_each_peer") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * | | net/mlx5e: XDP, Fix fifo overrun on XDP_REDIRECTDragos Tatulea2023-08-162-3/+7
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this fix, running high rate traffic through XDP_REDIRECT with multibuf could overrun the fifo used to release the xdp frames after tx completion. This resulted in corrupted data being consumed on the free side. The culplirt was a miscalculation of the fifo size: the maximum ratio between fifo entries / data segments was incorrect. This ratio serves to calculate the max fifo size for a full sq where each packet uses the worst case number of entries in the fifo. This patch fixes the formula and names the constant. It also makes sure that future values will use a power of 2 number of entries for the fifo mask to work. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 3f734b8c594b ("net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo") Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | | ice: Block switchdev mode when ADQ is active and vice versaMarcin Szycik2023-08-172-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ADQ and switchdev are not supported simultaneously. Enabling both at the same time can result in nullptr dereference. To prevent this, check if ADQ is active when changing devlink mode to switchdev mode, and check if switchdev is active when enabling ADQ. Fixes: fbc7b27af0f9 ("ice: enable ndo_setup_tc support for mqprio_qdisc") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230816193405.1307580-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | qede: fix firmware halt over suspend and resumeManish Chopra2023-08-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While performing certain power-off sequences, PCI drivers are called to suspend and resume their underlying devices through PCI PM (power management) interface. However this NIC hardware does not support PCI PM suspend/resume operations so system wide suspend/resume leads to bad MFW (management firmware) state which causes various follow-up errors in driver when communicating with the device/firmware afterwards. To fix this driver implements PCI PM suspend handler to indicate unsupported operation to the PCI subsystem explicitly, thus avoiding system to go into suspended/standby mode. Without this fix device/firmware does not recover unless system is power cycled. Fixes: 2950219d87b0 ("qede: Add basic network device support") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: do not allow gso_size to be set to GSO_BY_FRAGSEric Dumazet2023-08-171-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One missing check in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again [1] Do not allow gso_size to be set to GSO_BY_FRAGS (0xffff), because this magic value is used by the kernel. [1] general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077] CPU: 0 PID: 5039 Comm: syz-executor401 Not tainted 6.5.0-rc5-next-20230809-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 RIP: 0010:skb_segment+0x1a52/0x3ef0 net/core/skbuff.c:4500 Code: 00 00 00 e9 ab eb ff ff e8 6b 96 5d f9 48 8b 84 24 00 01 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e ea 21 00 00 48 8b 84 24 00 01 RSP: 0018:ffffc90003d3f1c8 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 000000000001fffe RCX: 0000000000000000 RDX: 000000000000000e RSI: ffffffff882a3115 RDI: 0000000000000070 RBP: ffffc90003d3f378 R08: 0000000000000005 R09: 000000000000ffff R10: 000000000000ffff R11: 5ee4a93e456187d6 R12: 000000000001ffc6 R13: dffffc0000000000 R14: 0000000000000008 R15: 000000000000ffff FS: 00005555563f2380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020020000 CR3: 000000001626d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> udp6_ufo_fragment+0x9d2/0xd50 net/ipv6/udp_offload.c:109 ipv6_gso_segment+0x5c4/0x17b0 net/ipv6/ip6_offload.c:120 skb_mac_gso_segment+0x292/0x610 net/core/gso.c:53 __skb_gso_segment+0x339/0x710 net/core/gso.c:124 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x3a5/0xf10 net/core/dev.c:3625 __dev_queue_xmit+0x8f0/0x3d60 net/core/dev.c:4329 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x257/0x380 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x24c7/0x5570 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:727 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:750 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2496 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2550 __sys_sendmsg+0x117/0x1e0 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7ff27cdb34d9 Fixes: 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | sock: Fix misuse of sk_under_memory_pressure()Abel Wu2023-08-172-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The status of global socket memory pressure is updated when: a) __sk_mem_raise_allocated(): enter: sk_memory_allocated(sk) > sysctl_mem[1] leave: sk_memory_allocated(sk) <= sysctl_mem[0] b) __sk_mem_reduce_allocated(): leave: sk_under_memory_pressure(sk) && sk_memory_allocated(sk) < sysctl_mem[0] So the conditions of leaving global pressure are inconstant, which may lead to the situation that one pressured net-memcg prevents the global pressure from being cleared when there is indeed no global pressure, thus the global constrains are still in effect unexpectedly on the other sockets. This patch fixes this by ignoring the net-memcg's pressure when deciding whether should leave global memory pressure. Fixes: e1aab161e013 ("socket: initial cgroup code.") Signed-off-by: Abel Wu <wuyun.abel@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230816091226.1542-1-wuyun.abel@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | sfc: don't fail probe if MAE/TC setup failsEdward Cree2023-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Existing comment in the source explains why we don't want efx_init_tc() failure to be fatal. Cited commit erroneously consolidated failure paths causing the probe to be failed in this case. Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100") Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/aa7f589dd6028bd1ad49f0a85f37ab33c09b2b45.1692114888.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | sfc: don't unregister flow_indr if it was never registeredEdward Cree2023-08-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In efx_init_tc(), move the setting of efx->tc->up after the flow_indr_dev_register() call, so that if it fails, efx_fini_tc() won't call flow_indr_dev_unregister(). Fixes: 5b2e12d51bd8 ("sfc: bind indirect blocks for TC offload on EF100") Suggested-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/a81284d7013aba74005277bd81104e4cfbea3f6f.1692114888.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: dsa: mv88e6xxx: Wait for EEPROM done before HW resetAlfred Lee2023-08-161-0/+8
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the switch is reset during active EEPROM transactions, as in just after an SoC reset after power up, the I2C bus transaction may be cut short leaving the EEPROM internal I2C state machine in the wrong state. When the switch is reset again, the bad state machine state may result in data being read from the wrong memory location causing the switch to enter unexpected mode rendering it inoperational. Fixes: a3dcb3e7e70c ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset") Signed-off-by: Alfred Lee <l00g33k@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Merge tag 'nf-23-08-16' of ↵David S. Miller2023-08-167-31/+69
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florisn Westphal says: ==================== These are netfilter fixes for the *net* tree. First patch resolves a false-positive lockdep splat: rcu_dereference is used outside of rcu read lock. Let lockdep validate that the transaction mutex is locked. Second patch fixes a kdoc warning added in previous PR. Third patch fixes a memory leak: The catchall element isn't disabled correctly, this allows userspace to deactivate the element again. This results in refcount underflow which in turn prevents memory release. This was always broken since the feature was added in 5.13. Patch 4 fixes an incorrect change in the previous pull request: Adding a duplicate key to a set should work if the duplicate key has expired, restore this behaviour. All from myself. Patch #5 resolves an old historic artifact in sctp conntrack: a 300ms timeout for shutdown_ack. Increase this to 3s. From Xin Long. Patch #6 fixes a sysctl data race in ipvs, two threads can clobber the sysctl value, from Sishuai Gong. This is a day-0 bug that predates git history. Patches 7, 8 and 9, from Pablo Neira Ayuso, are also followups for the previous GC rework in nf_tables: The netlink notifier and the netns exit path must both increment the gc worker seqcount, else worker may encounter stale (free'd) pointers. ================ Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | netfilter: nft_dynset: disallow object mapsPablo Neira Ayuso2023-08-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not allow to insert elements from datapath to objects maps. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: GC transaction race with netns dismantlePablo Neira Ayuso2023-08-161-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use maybe_get_net() since GC workqueue might race with netns exit path. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: fix GC transaction races with netns and netlink event ↵Pablo Neira Ayuso2023-08-161-4/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | exit path Netlink event path is missing a synchronization point with GC transactions. Add GC sequence number update to netns release path and netlink event path, any GC transaction losing race will be discarded. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | ipvs: fix racy memcpy in proc_do_sync_thresholdSishuai Gong2023-08-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When two threads run proc_do_sync_threshold() in parallel, data races could happen between the two memcpy(): Thread-1 Thread-2 memcpy(val, valp, sizeof(val)); memcpy(valp, val, sizeof(val)); This race might mess up the (struct ctl_table *) table->data, so we add a mutex lock to serialize them. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.com/ Signed-off-by: Sishuai Gong <sishuai.system@gmail.com> Acked-by: Simon Horman <horms@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: set default timeout to 3 secs for sctp shutdown send and recv stateXin Long2023-08-162-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300 msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state. As Paolo Valerio noticed, this might cause unwanted expiration of the ct entry. In my test, with 1s tc netem delay set on the NAT path, after the SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is sent back from the peer, the sctp ct entry has expired and been deleted, and then the SHUTDOWN_ACK has to be dropped. Also, it is confusing these two sysctl options always show 0 due to all timeout values using sec as unit: net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0 net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0 This patch fixes it by also using 3 secs for sctp shutdown send and recv state in sctp conntrack, which is also RTO.initial value in SCTP protocol. Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV was probably used for a rare scenario where SHUTDOWN is sent on 1st path but SHUTDOWN_ACK is replied on 2nd path, then a new connection started immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV to CLOSE when receiving INIT in the ORIGINAL direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: don't fail inserts if duplicate has expiredFlorian Westphal2023-08-161-19/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nftables selftests fail: run-tests.sh testcases/sets/0044interval_overlap_0 Expected: 0-2 . 0-3, got: W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1 Insertion must ignore duplicate but expired entries. Moreover, there is a strange asymmetry in nft_pipapo_activate: It refetches the current element, whereas the other ->activate callbacks (bitmap, hash, rhash, rbtree) use elem->priv. Same for .remove: other set implementations take elem->priv, nft_pipapo_remove fetches elem->priv, then does a relookup, remove this. I suspect this was the reason for the change that prompted the removal of the expired check in pipapo_get() in the first place, but skipping exired elements there makes no sense to me, this helper is used for normal get requests, insertions (duplicate check) and deactivate callback. In first two cases expired elements must be skipped. For ->deactivate(), this gets called for DELSETELEM, so it seems to me that expired elements should be skipped as well, i.e. delete request should fail with -ENOENT error. Fixes: 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk") Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: deactivate catchall elements in next generationFlorian Westphal2023-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When flushing, individual set elements are disabled in the next generation via the ->flush callback. Catchall elements are not disabled. This is incorrect and may lead to double-deactivations of catchall elements which then results in memory leaks: WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730 CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60 RIP: 0010:nft_map_deactivate+0x549/0x730 [..] ? nft_map_deactivate+0x549/0x730 nf_tables_delset+0xb66/0xeb0 (the warn is due to nft_use_dec() detecting underflow). Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: fix kdoc warnings after gc reworkFlorian Westphal2023-08-162-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jakub Kicinski says: We've got some new kdoc warnings here: net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc' net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc' include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set' Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
| | * | netfilter: nf_tables: fix false-positive lockdep splatFlorian Westphal2023-08-161-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ->abort invocation may cause splat on debug kernels: WARNING: suspicious RCU usage net/netfilter/nft_set_pipapo.c:1697 suspicious rcu_dereference_check() usage! [..] rcu_scheduler_active = 2, debug_locks = 1 1 lock held by nft/133554: [..] (nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid [..] lockdep_rcu_suspicious+0x1ad/0x260 nft_pipapo_abort+0x145/0x180 __nf_tables_abort+0x5359/0x63d0 nf_tables_abort+0x24/0x40 nfnetlink_rcv+0x1a0a/0x22c0 netlink_unicast+0x73c/0x900 netlink_sendmsg+0x7f0/0xc20 ____sys_sendmsg+0x48d/0x760 Transaction mutex is held, so parallel updates are not possible. Switch to _protected and check mutex is held for lockdep enabled builds. Fixes: 212ed75dc5fb ("netfilter: nf_tables: integrate pipapo into commit protocol") Signed-off-by: Florian Westphal <fw@strlen.de>
| * | | ipv6: fix indentation of a config attributePrasad Pandit2023-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix indentation of a type attribute of IPV6_VTI config entry. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mailmap: add entries for Simon HormanSimon Horman2023-08-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Retire some of my email addresses from Kernel activities. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge tag 'ipsec-2023-08-15' of ↵David S. Miller2023-08-169-30/+34
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== 1) Fix a slab-out-of-bounds read in xfrm_address_filter. From Lin Ma. 2) Fix the pfkey sadb_x_filter validation. From Lin Ma. 3) Use the correct nla_policy structure for XFRMA_SEC_CTX. From Lin Ma. 4) Fix warnings triggerable by bad packets in the encap functions. From Herbert Xu. 5) Fix some slab-use-after-free in decode_session6. From Zhengchao Shao. 6) Fix a possible NULL piointer dereference in xfrm_update_ae_params. Lin Ma. 7) Add a forgotten nla_policy for XFRMA_MTIMER_THRESH. From Lin Ma. 8) Don't leak offloaded policies. From Leon Romanovsky. 9) Delete also the offloading part of an acquire state. From Leon Romanovsky. Please pull or let me know if there are problems.
| | * | | xfrm: don't skip free of empty state in acquire policyLeon Romanovsky2023-08-012-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In destruction flow, the assignment of NULL to xso->dev caused to skip of xfrm_dev_state_free() call, which was called in xfrm_state_put(to_put) routine. Instead of open-coded variant of xfrm_dev_state_delete() and xfrm_dev_state_free(), let's use them directly. Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | xfrm: delete offloaded policyLeon Romanovsky2023-08-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The policy memory was released but not HW driver data. Add call to xfrm_dev_policy_delete(), so drivers will have a chance to release their resources. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESHLin Ma2023-07-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous commit 4e484b3e969b ("xfrm: rate limit SA mapping change message to user space") added one additional attribute named XFRMA_MTIMER_THRESH and described its type at compat_policy (net/xfrm/xfrm_compat.c). However, the author forgot to also describe the nla_policy at xfrma_policy (net/xfrm/xfrm_user.c). Hence, this suppose NLA_U32 (4 bytes) value can be faked as empty (0 bytes) by a malicious user, which leads to 4 bytes overflow read and heap information leak when parsing nlattrs. To exploit this, one malicious user can spray the SLUB objects and then leverage this 4 bytes OOB read to leak the heap data into x->mapping_maxage (see xfrm_update_ae_params(...)), and leak it to userspace via copy_to_user_state_extra(...). The above bug is assigned CVE-2023-3773. To fix it, this commit just completes the nla_policy description for XFRMA_MTIMER_THRESH, which enforces the length check and avoids such OOB read. Fixes: 4e484b3e969b ("xfrm: rate limit SA mapping change message to user space") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | xfrm: add NULL check in xfrm_update_ae_paramsLin Ma2023-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally, x->replay_esn and x->preplay_esn should be allocated at xfrm_alloc_replay_state_esn(...) in xfrm_state_construct(...), hence the xfrm_update_ae_params(...) is okay to update them. However, the current implementation of xfrm_new_ae(...) allows a malicious user to directly dereference a NULL pointer and crash the kernel like below. BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 8253067 P4D 8253067 PUD 8e0e067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 PID: 98 Comm: poc.npd Not tainted 6.4.0-rc7-00072-gdad9774deaf1 #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4 RIP: 0010:memcpy_orig+0xad/0x140 Code: e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 34 4c 8b 06 4c 8b 4e 08 c RSP: 0018:ffff888008f57658 EFLAGS: 00000202 RAX: 0000000000000000 RBX: ffff888008bd0000 RCX: ffffffff8238e571 RDX: 0000000000000018 RSI: ffff888007f64844 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888008f57818 R13: ffff888007f64aa4 R14: 0000000000000000 R15: 0000000000000000 FS: 00000000014013c0(0000) GS:ffff88806d600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000054d8000 CR4: 00000000000006f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x1e8/0x500 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? _raw_spin_unlock_irqrestore+0x11/0x40 ? fixup_exception+0x36/0x460 ? _raw_spin_unlock_irqrestore+0x11/0x40 ? exc_page_fault+0x5e/0xc0 ? asm_exc_page_fault+0x26/0x30 ? xfrm_update_ae_params+0xd1/0x260 ? memcpy_orig+0xad/0x140 ? __pfx__raw_spin_lock_bh+0x10/0x10 xfrm_update_ae_params+0xe7/0x260 xfrm_new_ae+0x298/0x4e0 ? __pfx_xfrm_new_ae+0x10/0x10 ? __pfx_xfrm_new_ae+0x10/0x10 xfrm_user_rcv_msg+0x25a/0x410 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 ? __alloc_skb+0xcf/0x210 ? stack_trace_save+0x90/0xd0 ? filter_irq_stacks+0x1c/0x70 ? __stack_depot_save+0x39/0x4e0 ? __kasan_slab_free+0x10a/0x190 ? kmem_cache_free+0x9c/0x340 ? netlink_recvmsg+0x23c/0x660 ? sock_recvmsg+0xeb/0xf0 ? __sys_recvfrom+0x13c/0x1f0 ? __x64_sys_recvfrom+0x71/0x90 ? do_syscall_64+0x3f/0x90 ? entry_SYSCALL_64_after_hwframe+0x72/0xdc ? copyout+0x3e/0x50 netlink_rcv_skb+0xd6/0x210 ? __pfx_xfrm_user_rcv_msg+0x10/0x10 ? __pfx_netlink_rcv_skb+0x10/0x10 ? __pfx_sock_has_perm+0x10/0x10 ? mutex_lock+0x8d/0xe0 ? __pfx_mutex_lock+0x10/0x10 xfrm_netlink_rcv+0x44/0x50 netlink_unicast+0x36f/0x4c0 ? __pfx_netlink_unicast+0x10/0x10 ? netlink_recvmsg+0x500/0x660 netlink_sendmsg+0x3b7/0x700 This Null-ptr-deref bug is assigned CVE-2023-3772. And this commit adds additional NULL check in xfrm_update_ae_params to fix the NPD. Fixes: d8647b79c3b7 ("xfrm: Add user interface for esn and big anti-replay windows") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | ip_vti: fix potential slab-use-after-free in decode_session6Zhengchao Shao2023-07-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ip_vti device is set to the qdisc of the sfb type, the cb field of the sent skb may be modified during enqueuing. Then, slab-use-after-free may occur when ip_vti device sends IPv6 packets. As commit f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") showed, xfrm_decode_session was originally intended only for the receive path. IP6CB(skb)->nhoff is not set during transmission. Therefore, set the cb field in the skb to 0 before sending packets. Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | ip6_vti: fix slab-use-after-free in decode_session6Zhengchao Shao2023-07-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ipv6_vti device is set to the qdisc of the sfb type, the cb field of the sent skb may be modified during enqueuing. Then, slab-use-after-free may occur when ipv6_vti device sends IPv6 packets. The stack information is as follows: BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890 Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0xd9/0x150 print_address_description.constprop.0+0x2c/0x3c0 kasan_report+0x11d/0x130 decode_session6+0x103f/0x1890 __xfrm_decode_session+0x54/0xb0 vti6_tnl_xmit+0x3e6/0x1ee0 dev_hard_start_xmit+0x187/0x700 sch_direct_xmit+0x1a3/0xc30 __qdisc_run+0x510/0x17a0 __dev_queue_xmit+0x2215/0x3b10 neigh_connected_output+0x3c2/0x550 ip6_finish_output2+0x55a/0x1550 ip6_finish_output+0x6b9/0x1270 ip6_output+0x1f1/0x540 ndisc_send_skb+0xa63/0x1890 ndisc_send_rs+0x132/0x6f0 addrconf_rs_timer+0x3f1/0x870 call_timer_fn+0x1a0/0x580 expire_timers+0x29b/0x4b0 run_timer_softirq+0x326/0x910 __do_softirq+0x1d4/0x905 irq_exit_rcu+0xb7/0x120 sysvec_apic_timer_interrupt+0x97/0xc0 </IRQ> Allocated by task 9176: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x7f/0x90 kmem_cache_alloc_node+0x1cd/0x410 kmalloc_reserve+0x165/0x270 __alloc_skb+0x129/0x330 netlink_sendmsg+0x9b1/0xe30 sock_sendmsg+0xde/0x190 ____sys_sendmsg+0x739/0x920 ___sys_sendmsg+0x110/0x1b0 __sys_sendmsg+0xf7/0x1c0 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9176: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x40 ____kasan_slab_free+0x160/0x1c0 slab_free_freelist_hook+0x11b/0x220 kmem_cache_free+0xf0/0x490 skb_free_head+0x17f/0x1b0 skb_release_data+0x59c/0x850 consume_skb+0xd2/0x170 netlink_unicast+0x54f/0x7f0 netlink_sendmsg+0x926/0xe30 sock_sendmsg+0xde/0x190 ____sys_sendmsg+0x739/0x920 ___sys_sendmsg+0x110/0x1b0 __sys_sendmsg+0xf7/0x1c0 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff88802e08ed00 which belongs to the cache skbuff_small_head of size 640 The buggy address is located 194 bytes inside of freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80) As commit f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") showed, xfrm_decode_session was originally intended only for the receive path. IP6CB(skb)->nhoff is not set during transmission. Therefore, set the cb field in the skb to 0 before sending packets. Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | xfrm: fix slab-use-after-free in decode_session6Zhengchao Shao2023-07-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the xfrm device is set to the qdisc of the sfb type, the cb field of the sent skb may be modified during enqueuing. Then, slab-use-after-free may occur when the xfrm device sends IPv6 packets. The stack information is as follows: BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890 Read of size 1 at addr ffff8881111458ef by task swapper/3/0 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0xd9/0x150 print_address_description.constprop.0+0x2c/0x3c0 kasan_report+0x11d/0x130 decode_session6+0x103f/0x1890 __xfrm_decode_session+0x54/0xb0 xfrmi_xmit+0x173/0x1ca0 dev_hard_start_xmit+0x187/0x700 sch_direct_xmit+0x1a3/0xc30 __qdisc_run+0x510/0x17a0 __dev_queue_xmit+0x2215/0x3b10 neigh_connected_output+0x3c2/0x550 ip6_finish_output2+0x55a/0x1550 ip6_finish_output+0x6b9/0x1270 ip6_output+0x1f1/0x540 ndisc_send_skb+0xa63/0x1890 ndisc_send_rs+0x132/0x6f0 addrconf_rs_timer+0x3f1/0x870 call_timer_fn+0x1a0/0x580 expire_timers+0x29b/0x4b0 run_timer_softirq+0x326/0x910 __do_softirq+0x1d4/0x905 irq_exit_rcu+0xb7/0x120 sysvec_apic_timer_interrupt+0x97/0xc0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:intel_idle_hlt+0x23/0x30 Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4 RSP: 0018:ffffc90000197d78 EFLAGS: 00000246 RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5 RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50 RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001 R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000 cpuidle_enter_state+0xd3/0x6f0 cpuidle_enter+0x4e/0xa0 do_idle+0x2fe/0x3c0 cpu_startup_entry+0x18/0x20 start_secondary+0x200/0x290 secondary_startup_64_no_verify+0x167/0x16b </TASK> Allocated by task 939: kasan_save_stack+0x22/0x40 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x7f/0x90 kmem_cache_alloc_node+0x1cd/0x410 kmalloc_reserve+0x165/0x270 __alloc_skb+0x129/0x330 inet6_ifa_notify+0x118/0x230 __ipv6_ifa_notify+0x177/0xbe0 addrconf_dad_completed+0x133/0xe00 addrconf_dad_work+0x764/0x1390 process_one_work+0xa32/0x16f0 worker_thread+0x67d/0x10c0 kthread+0x344/0x440 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff888111145800 which belongs to the cache skbuff_small_head of size 640 The buggy address is located 239 bytes inside of freed 640-byte region [ffff888111145800, ffff888111145a80) As commit f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") showed, xfrm_decode_session was originally intended only for the receive path. IP6CB(skb)->nhoff is not set during transmission. Therefore, set the cb field in the skb to 0 before sending packets. Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | xfrm: Silence warnings triggerable by bad packetsHerbert Xu2023-07-101-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the elimination of inner modes, a couple of warnings that were previously unreachable can now be triggered by malformed inbound packets. Fix this by: 1. Moving the setting of skb->protocol into the decap functions. 2. Returning -EINVAL when unexpected protocol is seen. Reported-by: Maciej Żenczykowski<maze@google.com> Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | net: xfrm: Amend XFRMA_SEC_CTX nla_policy structureLin Ma2023-07-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to all consumers code of attrs[XFRMA_SEC_CTX], like * verify_sec_ctx_len(), convert to xfrm_user_sec_ctx* * xfrm_state_construct(), call security_xfrm_state_alloc whose prototype is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx); * copy_from_user_sec_ctx(), convert to xfrm_user_sec_ctx * ... It seems that the expected parsing result for XFRMA_SEC_CTX should be structure xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing and misleading (Luckily, they happen to have same size 8 bytes). This commit amend the policy structure to xfrm_user_sec_ctx to avoid ambiguity. Fixes: cf5cb79f6946 ("[XFRM] netlink: Establish an attribute policy") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | net: af_key: fix sadb_x_filter validationLin Ma2023-06-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running xfrm_state_walk_init(), the xfrm_address_filter being used is okay to have a splen/dplen that equals to sizeof(xfrm_address_t)<<3. This commit replaces >= to > to make sure the boundary checking is correct. Fixes: 37bd22420f85 ("af_key: pfkey_dump needs parameter validation") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| | * | | net: xfrm: Fix xfrm_address_filter OOB readLin Ma2023-06-291-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We found below OOB crash: [ 44.211730] ================================================================== [ 44.212045] BUG: KASAN: slab-out-of-bounds in memcmp+0x8b/0xb0 [ 44.212045] Read of size 8 at addr ffff88800870f320 by task poc.xfrm/97 [ 44.212045] [ 44.212045] CPU: 0 PID: 97 Comm: poc.xfrm Not tainted 6.4.0-rc7-00072-gdad9774deaf1-dirty #4 [ 44.212045] Call Trace: [ 44.212045] <TASK> [ 44.212045] dump_stack_lvl+0x37/0x50 [ 44.212045] print_report+0xcc/0x620 [ 44.212045] ? __virt_addr_valid+0xf3/0x170 [ 44.212045] ? memcmp+0x8b/0xb0 [ 44.212045] kasan_report+0xb2/0xe0 [ 44.212045] ? memcmp+0x8b/0xb0 [ 44.212045] kasan_check_range+0x39/0x1c0 [ 44.212045] memcmp+0x8b/0xb0 [ 44.212045] xfrm_state_walk+0x21c/0x420 [ 44.212045] ? __pfx_dump_one_state+0x10/0x10 [ 44.212045] xfrm_dump_sa+0x1e2/0x290 [ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10 [ 44.212045] ? __kernel_text_address+0xd/0x40 [ 44.212045] ? kasan_unpoison+0x27/0x60 [ 44.212045] ? mutex_lock+0x60/0xe0 [ 44.212045] ? __pfx_mutex_lock+0x10/0x10 [ 44.212045] ? kasan_save_stack+0x22/0x50 [ 44.212045] netlink_dump+0x322/0x6c0 [ 44.212045] ? __pfx_netlink_dump+0x10/0x10 [ 44.212045] ? mutex_unlock+0x7f/0xd0 [ 44.212045] ? __pfx_mutex_unlock+0x10/0x10 [ 44.212045] __netlink_dump_start+0x353/0x430 [ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410 [ 44.212045] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [ 44.212045] ? __pfx_xfrm_dump_sa+0x10/0x10 [ 44.212045] ? __pfx_xfrm_dump_sa_done+0x10/0x10 [ 44.212045] ? __stack_depot_save+0x382/0x4e0 [ 44.212045] ? filter_irq_stacks+0x1c/0x70 [ 44.212045] ? kasan_save_stack+0x32/0x50 [ 44.212045] ? kasan_save_stack+0x22/0x50 [ 44.212045] ? kasan_set_track+0x25/0x30 [ 44.212045] ? __kasan_slab_alloc+0x59/0x70 [ 44.212045] ? kmem_cache_alloc_node+0xf7/0x260 [ 44.212045] ? kmalloc_reserve+0xab/0x120 [ 44.212045] ? __alloc_skb+0xcf/0x210 [ 44.212045] ? netlink_sendmsg+0x509/0x700 [ 44.212045] ? sock_sendmsg+0xde/0xe0 [ 44.212045] ? __sys_sendto+0x18d/0x230 [ 44.212045] ? __x64_sys_sendto+0x71/0x90 [ 44.212045] ? do_syscall_64+0x3f/0x90 [ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 44.212045] ? netlink_sendmsg+0x509/0x700 [ 44.212045] ? sock_sendmsg+0xde/0xe0 [ 44.212045] ? __sys_sendto+0x18d/0x230 [ 44.212045] ? __x64_sys_sendto+0x71/0x90 [ 44.212045] ? do_syscall_64+0x3f/0x90 [ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 44.212045] ? kasan_save_stack+0x22/0x50 [ 44.212045] ? kasan_set_track+0x25/0x30 [ 44.212045] ? kasan_save_free_info+0x2e/0x50 [ 44.212045] ? __kasan_slab_free+0x10a/0x190 [ 44.212045] ? kmem_cache_free+0x9c/0x340 [ 44.212045] ? netlink_recvmsg+0x23c/0x660 [ 44.212045] ? sock_recvmsg+0xeb/0xf0 [ 44.212045] ? __sys_recvfrom+0x13c/0x1f0 [ 44.212045] ? __x64_sys_recvfrom+0x71/0x90 [ 44.212045] ? do_syscall_64+0x3f/0x90 [ 44.212045] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 44.212045] ? copyout+0x3e/0x50 [ 44.212045] netlink_rcv_skb+0xd6/0x210 [ 44.212045] ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [ 44.212045] ? __pfx_netlink_rcv_skb+0x10/0x10 [ 44.212045] ? __pfx_sock_has_perm+0x10/0x10 [ 44.212045] ? mutex_lock+0x8d/0xe0 [ 44.212045] ? __pfx_mutex_lock+0x10/0x10 [ 44.212045] xfrm_netlink_rcv+0x44/0x50 [ 44.212045] netlink_unicast+0x36f/0x4c0 [ 44.212045] ? __pfx_netlink_unicast+0x10/0x10 [ 44.212045] ? netlink_recvmsg+0x500/0x660 [ 44.212045] netlink_sendmsg+0x3b7/0x700 [ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10 [ 44.212045] ? __pfx_netlink_sendmsg+0x10/0x10 [ 44.212045] sock_sendmsg+0xde/0xe0 [ 44.212045] __sys_sendto+0x18d/0x230 [ 44.212045] ? __pfx___sys_sendto+0x10/0x10 [ 44.212045] ? rcu_core+0x44a/0xe10 [ 44.212045] ? __rseq_handle_notify_resume+0x45b/0x740 [ 44.212045] ? _raw_spin_lock_irq+0x81/0xe0 [ 44.212045] ? __pfx___rseq_handle_notify_resume+0x10/0x10 [ 44.212045] ? __pfx_restore_fpregs_from_fpstate+0x10/0x10 [ 44.212045] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 [ 44.212045] ? __pfx_task_work_run+0x10/0x10 [ 44.212045] __x64_sys_sendto+0x71/0x90 [ 44.212045] do_syscall_64+0x3f/0x90 [ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 44.212045] RIP: 0033:0x44b7da [ 44.212045] RSP: 002b:00007ffdc8838548 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 44.212045] RAX: ffffffffffffffda RBX: 00007ffdc8839978 RCX: 000000000044b7da [ 44.212045] RDX: 0000000000000038 RSI: 00007ffdc8838770 RDI: 0000000000000003 [ 44.212045] RBP: 00007ffdc88385b0 R08: 00007ffdc883858c R09: 000000000000000c [ 44.212045] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 44.212045] R13: 00007ffdc8839968 R14: 00000000004c37d0 R15: 0000000000000001 [ 44.212045] </TASK> [ 44.212045] [ 44.212045] Allocated by task 97: [ 44.212045] kasan_save_stack+0x22/0x50 [ 44.212045] kasan_set_track+0x25/0x30 [ 44.212045] __kasan_kmalloc+0x7f/0x90 [ 44.212045] __kmalloc_node_track_caller+0x5b/0x140 [ 44.212045] kmemdup+0x21/0x50 [ 44.212045] xfrm_dump_sa+0x17d/0x290 [ 44.212045] netlink_dump+0x322/0x6c0 [ 44.212045] __netlink_dump_start+0x353/0x430 [ 44.212045] xfrm_user_rcv_msg+0x3a4/0x410 [ 44.212045] netlink_rcv_skb+0xd6/0x210 [ 44.212045] xfrm_netlink_rcv+0x44/0x50 [ 44.212045] netlink_unicast+0x36f/0x4c0 [ 44.212045] netlink_sendmsg+0x3b7/0x700 [ 44.212045] sock_sendmsg+0xde/0xe0 [ 44.212045] __sys_sendto+0x18d/0x230 [ 44.212045] __x64_sys_sendto+0x71/0x90 [ 44.212045] do_syscall_64+0x3f/0x90 [ 44.212045] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 44.212045] [ 44.212045] The buggy address belongs to the object at ffff88800870f300 [ 44.212045] which belongs to the cache kmalloc-64 of size 64 [ 44.212045] The buggy address is located 32 bytes inside of [ 44.212045] allocated 36-byte region [ffff88800870f300, ffff88800870f324) [ 44.212045] [ 44.212045] The buggy address belongs to the physical page: [ 44.212045] page:00000000e4de16ee refcount:1 mapcount:0 mapping:000000000 ... [ 44.212045] flags: 0x100000000000200(slab|node=0|zone=1) [ 44.212045] page_type: 0xffffffff() [ 44.212045] raw: 0100000000000200 ffff888004c41640 dead000000000122 0000000000000000 [ 44.212045] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 [ 44.212045] page dumped because: kasan: bad access detected [ 44.212045] [ 44.212045] Memory state around the buggy address: [ 44.212045] ffff88800870f200: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 44.212045] ffff88800870f280: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc [ 44.212045] >ffff88800870f300: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc [ 44.212045] ^ [ 44.212045] ffff88800870f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 44.212045] ffff88800870f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 44.212045] ================================================================== By investigating the code, we find the root cause of this OOB is the lack of checks in xfrm_dump_sa(). The buggy code allows a malicious user to pass arbitrary value of filter->splen/dplen. Hence, with crafted xfrm states, the attacker can achieve 8 bytes heap OOB read, which causes info leak. if (attrs[XFRMA_ADDRESS_FILTER]) { filter = kmemdup(nla_data(attrs[XFRMA_ADDRESS_FILTER]), sizeof(*filter), GFP_KERNEL); if (filter == NULL) return -ENOMEM; // NO MORE CHECKS HERE !!! } This patch fixes the OOB by adding necessary boundary checks, just like the code in pfkey_dump() function. Fixes: d3623099d350 ("ipsec: add support of limited SA dump") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
| * | | | broadcom: b44: Use b44_writephy() return valueArtem Chernyshev2023-08-161-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return result of b44_writephy() instead of zero to deal with possible error. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | net: openvswitch: reject negative ifindexJakub Kicinski2023-08-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes in net-next (commit 759ab1edb56c ("net: store netdevs in an xarray")) refactored the handling of pre-assigned ifindexes and let syzbot surface a latent problem in ovs. ovs does not validate ifindex, making it possible to create netdev ports with negative ifindex values. It's easy to repro with YNL: $ ./cli.py --spec netlink/specs/ovs_datapath.yaml \ --do new \ --json '{"upcall-pid": 1, "name":"my-dp"}' $ ./cli.py --spec netlink/specs/ovs_vport.yaml \ --do new \ --json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}' $ ip link show -65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff ... Validate the inputs. Now the second command correctly returns: $ ./cli.py --spec netlink/specs/ovs_vport.yaml \ --do new \ --json '{"upcall-pid": "00000001", "name": "some-port0", "dp-ifindex":3,"ifindex":4294901760,"type":2}' lib.ynl.NlError: Netlink error: Numerical result out of range nl_len = 108 (92) nl_flags = 0x300 nl_type = 2 error: -34 extack: {'msg': 'integer out of range', 'unknown': [[type:4 len:36] b'\x0c\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\x00\x03\x00\xff\xff\xff\x7f\x00\x00\x00\x00\x08\x00\x01\x00\x08\x00\x00\x00'], 'bad-attr': '.ifindex'} Accept 0 since it used to be silently ignored. Fixes: 54c4ef34c4b6 ("openvswitch: allow specifying ifindex of new interfaces") Reported-by: syzbot+7456b5dcf65111553320@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20230814203840.2908710-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>