summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'platform-drivers-x86-v6.7-4' of ↵Linus Torvalds2023-12-143-14/+41
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - tablet-mode-switch events fix - kernel-doc warning fixes * tag 'platform-drivers-x86-v6.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: intel_ips: fix kernel-doc formatting platform/x86: thinkpad_acpi: fix kernel-doc warnings platform/x86: intel-vbtn: Fix missing tablet-mode-switch events
| * platform/x86: intel_ips: fix kernel-doc formattingRandy Dunlap2023-12-081-7/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc function notation and comment formatting to prevent warnings from scripts/kernel-doc. for drivers/platform/x86/intel_ips.c: 595: warning: No description found for return value of 'mcp_exceeded' 624: warning: No description found for return value of 'cpu_exceeded' 650: warning: No description found for return value of 'mch_exceeded' 745: warning: bad line: cpu+ gpu+ cpu+gpu- cpu-gpu+ cpu-gpu- 746: warning: bad line: cpu < gpu < cpu+gpu+ cpu+ gpu+ nothing 753: warning: No description found for return value of 'ips_adjust' 747: warning: bad line: cpu < gpu >= cpu+gpu-(mcp<) cpu+gpu-(mcp<) gpu- gpu- 748: warning: bad line: cpu >= gpu < cpu-gpu+(mcp<) cpu- cpu-gpu+(mcp<) cpu- 749: warning: bad line: cpu >= gpu >= cpu-gpu- cpu-gpu- cpu-gpu- cpu-gpu- 945: warning: No description found for return value of 'ips_monitor' 1151: warning: No description found for return value of 'ips_irq_handler' 1301: warning: Function parameter or member 'ips' not described in 'ips_detect_cpu' 1302: warning: No description found for return value of 'ips_detect_cpu' 1358: warning: No description found for return value of 'ips_get_i915_syms' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: platform-driver-x86@vger.kernel.org Link: https://lore.kernel.org/r/20231206060120.4816-1-rdunlap@infradead.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
| * platform/x86: thinkpad_acpi: fix kernel-doc warningsRandy Dunlap2023-12-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a function's return description and don't misuse "/**" for non-kernel-doc comments to prevent warnings from scripts/kernel-doc. thinkpad_acpi.c:523: warning: No description found for return value of 'tpacpi_check_quirks' thinkpad_acpi.c:9307: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst thinkpad_acpi.c:9307: warning: missing initial short description on line: * This evaluates a ACPI method call specific to the battery Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> CC: ibm-acpi-devel@lists.sourceforge.net CC: platform-driver-x86@vger.kernel.org Reviewed-by: mpearson-lenovo@squebb.ca Link: https://lore.kernel.org/r/20231206060144.8260-1-rdunlap@infradead.org Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
| * platform/x86: intel-vbtn: Fix missing tablet-mode-switch eventsHans de Goede2023-12-081-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2 issues have been reported on the Dell Inspiron 7352: 1. Sometimes the tablet-mode-switch stops reporting tablet-mode change events. Add a "VBDL" call to notify_handler() to work around this. 2. Sometimes the tablet-mode is incorrect after suspend/resume Add a detect_tablet_mode() to resume() to fix this. Reported-by: Arnold Gozum <arngozum@gmail.com> Closes: https://lore.kernel.org/platform-driver-x86/87271a74-c831-4eec-b7a4-1371d0e42471@gmail.com/ Tested-by: Arnold Gozum <arngozum@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20231204150601.46976-1-hdegoede@redhat.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
* | Merge tag 'net-6.7-rc6' of ↵Linus Torvalds2023-12-1445-546/+1042
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Current release - regressions: - tcp: fix tcp_disordered_ack() vs usec TS resolution Current release - new code bugs: - dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set() - eth: octeon_ep: initialise control mbox tasks before using APIs Previous releases - regressions: - io_uring/af_unix: disable sending io_uring over sockets - eth: mlx5e: - TC, don't offload post action rule if not supported - fix possible deadlock on mlx5e_tx_timeout_work - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb() - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on - eth: team: fix use-after-free when an option instance allocation fails Previous releases - always broken: - neighbour: don't let neigh_forced_gc() disable preemption for long - net: prevent mss overflow in skb_segment() - ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX - tcp: remove acked SYN flag from packet in the transmit queue correctly - eth: octeontx2-af: - fix a use-after-free in rvu_nix_register_reporters - fix promisc mcam entry action - eth: dwmac-loongson: make sure MDIO is initialized before use - eth: atlantic: fix double free in ring reinit logic" * tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net: atlantic: fix double free in ring reinit logic appletalk: Fix Use-After-Free in atalk_ioctl net: stmmac: Handle disabled MDIO busses from devicetree net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX dpaa2-switch: do not ask for MDB, VLAN and FDB replay dpaa2-switch: fix size of the dma_unmap net: prevent mss overflow in skb_segment() vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space() Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set" MIPS: dts: loongson: drop incorrect dwmac fallback compatible stmmac: dwmac-loongson: drop useless check for compatible fallback stmmac: dwmac-loongson: Make sure MDIO is initialized before use tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set() net: ena: Fix XDP redirection error net: ena: Fix DMA syncing in XDP path when SWIOTLB is on net: ena: Fix xdp drops handling due to multibuf packets net: ena: Destroy correct number of xdp queues upon failure net: Remove acked SYN flag from packet in the transmit queue correctly qed: Fix a potential use-after-free in qed_cxt_tables_alloc ...
| * | net: atlantic: fix double free in ring reinit logicIgor Russkikh2023-12-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver has a logic leak in ring data allocation/free, where double free may happen in aq_ring_free if system is under stress and driver init/deinit is happening. The probability is higher to get this during suspend/resume cycle. Verification was done simulating same conditions with stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000 while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done Fixed by explicitly clearing pointers to NULL on deallocation Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com/ Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: stmmac: Handle disabled MDIO busses from devicetreeAndrew Halaney2023-12-141-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many hardware configurations have the MDIO bus disabled, and are instead using some other MDIO bus to talk to the MAC's phy. of_mdiobus_register() returns -ENODEV in this case. Let's handle it gracefully instead of failing to probe the MAC. Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.") Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20231212-b4-stmmac-handle-mdio-enodev-v2-1-600171acf79f@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RXSneh Shah2023-12-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 10M SGMII mode all the packets are being dropped due to wrong Rx clock. SGMII 10MBPS mode needs RX clock divider programmed to avoid drops in Rx. Update configure SGMII function with Rx clk divider programming. Fixes: 463120c31c58 ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII") Tested-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com> Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com> Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | Merge branch '40GbE' of ↵Jakub Kicinski2023-12-135-74/+219
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-12 (iavf) This series contains updates to iavf driver only. Piotr reworks Flow Director states to deal with issues in restoring filters. Slawomir fixes shutdown processing as it was missing needed calls. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close iavf: Handle ntuple on/off based on new state machines for flow director iavf: Introduce new state machines for flow director ==================== Link: https://lore.kernel.org/r/20231212203613.513423-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * | iavf: Fix iavf_shutdown to call iavf_remove instead iavf_closeSlawomir Laba2023-12-121-51/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the flow for pci shutdown be the same to the pci remove. iavf_shutdown was implementing an incomplete version of iavf_remove. It misses several calls to the kernel like iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap that might break the system on reboot or hibernation. Implement the call of iavf_remove directly in iavf_shutdown to close this gap. Fixes below error messages (dmesg) during shutdown stress tests - [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for VF 0 [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for VF 0 Reproduction: 1. Create one VF interface: echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs 2. Run live dmesg on the host: dmesg -wH 3. On SUT, script below steps into vf_namespace_assignment.sh <#!/bin/sh> // Remove <>. Git removes # line if=<VF name> (edit this per VF name) loop=0 while true; do echo test round $loop let loop++ ip netns add ns$loop ip link set dev $if up ip link set dev $if netns ns$loop ip netns exec ns$loop ip link set dev $if up ip netns exec ns$loop ip link set dev $if netns 1 ip netns delete ns$loop done 4. Run the script for at least 1000 iterations on SUT: ./vf_namespace_assignment.sh Expected result: No errors in dmesg. Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * | iavf: Handle ntuple on/off based on new state machines for flow directorPiotr Gardocki2023-12-121-0/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ntuple-filter feature on/off: Default is on. If turned off, the filters will be removed from both PF and iavf list. The removal is irrespective of current filter state. Steps to reproduce: ------------------- 1. Ensure ntuple is on. ethtool -K enp8s0 ntuple-filters on 2. Create a filter to receive the traffic into non-default rx-queue like 15 and ensure traffic is flowing into queue into 15. Now, turn off ntuple. Traffic should not flow to configured queue 15. It should flow to default RX queue. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| | * | iavf: Introduce new state machines for flow directorPiotr Gardocki2023-12-125-23/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New states introduced: IAVF_FDIR_FLTR_DIS_REQUEST IAVF_FDIR_FLTR_DIS_PENDING IAVF_FDIR_FLTR_INACTIVE Current FDIR state machines (SM) are not adequate to handle a few scenarios in the link DOWN/UP event, reset event and ntuple-feature. For example, when VF link goes DOWN and comes back UP administratively, the expectation is that previously installed filters should also be restored. But with current SM, filters are not restored. So with new SM, during link DOWN filters are marked as INACTIVE in the iavf list but removed from PF. After link UP, SM will transition from INACTIVE to ADD_REQUEST to restore the filter. Similarly, with VF reset, filters will be removed from the PF, but marked as INACTIVE in the iavf list. Filters will be restored after reset completion. Steps to reproduce: ------------------- 1. Create a VF. Here VF is enp8s0. 2. Assign IP addresses to VF and link partner and ping continuously from remote. Here remote IP is 1.1.1.1. 3. Check default RX Queue of traffic. ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets" 4. Add filter - change default RX Queue (to 15 here) ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5 5. Ensure filter gets added and traffic is received on RX queue 15 now. Link event testing: ------------------- 6. Bring VF link down and up. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Reset event testing: -------------------- 7. Reset the VF. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * | | dpaa2-switch: do not ask for MDB, VLAN and FDB replayIoana Ciornei2023-12-131-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with commit 4e51bf44a03a ("net: bridge: move the switchdev object replay helpers to "push" mode") the switchdev_bridge_port_offload() helper was extended with the intention to provide switchdev drivers easy access to object addition and deletion replays. This works by calling the replay helpers with non-NULL notifier blocks. In the same commit, the dpaa2-switch driver was updated so that it passes valid notifier blocks to the helper. At that moment, no regression was identified through testing. In the meantime, the blamed commit changed the behavior in terms of which ports get hit by the replay. Before this commit, only the initial port which identified itself as offloaded through switchdev_bridge_port_offload() got a replay of all port objects and FDBs. After this, the newly joining port will trigger a replay of objects on all bridge ports and on the bridge itself. This behavior leads to errors in dpaa2_switch_port_vlans_add() when a VLAN gets installed on the same interface multiple times. The intended mechanism to address this is to pass a non-NULL ctx to the switchdev_bridge_port_offload() helper and then check it against the port's private structure. But since the driver does not have any use for the replayed port objects and FDBs until it gains support for LAG offload, it's better to fix the issue by reverting the dpaa2-switch driver to not ask for replay. The pointers will be added back when we are prepared to ignore replays on unrelated ports. Fixes: b28d580e2939 ("net: bridge: switchdev: replay all VLAN groups") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20231212164326.2753457-3-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | dpaa2-switch: fix size of the dma_unmapIoana Ciornei2023-12-131-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of the DMA unmap was wrongly put as a sizeof of a pointer. Change the value of the DMA unmap to be the actual macro used for the allocation and the DMA map. Fixes: 1110318d83e8 ("dpaa2-switch: add tc flower hardware offload on ingress traffic") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20231212164326.2753457-2-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | stmmac: dwmac-loongson: drop useless check for compatible fallbackKrzysztof Kozlowski2023-12-131-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device binds to proper PCI ID (LOONGSON, 0x7a03), already listed in DTS, so checking for some other compatible does not make sense. It cannot be bound to unsupported platform. Drop useless, incorrect (space in between) and undocumented compatible. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | stmmac: dwmac-loongson: Make sure MDIO is initialized before useYanteng Si2023-12-131-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic code will use mdio. If it is not initialized before use, the kernel will Oops. Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()Jiri Pirko2023-12-121-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User may not pass DPLL_A_PIN_STATE attribute in the pin set operation message. Sanitize that by checking if the attr pointer is not null and process the passed state attribute value only in that case. Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: ena: Fix XDP redirection errorDavid Arinzon2023-12-121-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending TX packets, the meta descriptor can be all zeroes as no meta information is required (as in XDP). This patch removes the validity check, as when `disable_meta_caching` is enabled, such TX packets will be dropped otherwise. Fixes: 0e3a3f6dacf0 ("net: ena: support new LLQ acceleration mode") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: ena: Fix DMA syncing in XDP path when SWIOTLB is onDavid Arinzon2023-12-121-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes two issues: Issue 1 ------- Description ``````````` Current code does not call dma_sync_single_for_cpu() to sync data from the device side memory to the CPU side memory before the XDP code path uses the CPU side data. This causes the XDP code path to read the unset garbage data in the CPU side memory, resulting in incorrect handling of the packet by XDP. Solution ```````` 1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to use the data in the CPU side memory. 2. The XDP code verdict can be XDP_PASS, in which case there is a fallback to the non-XDP code, which also calls dma_sync_single_for_cpu(). To avoid calling dma_sync_single_for_cpu() twice: 2.1. Put the dma_sync_single_for_cpu() in the code in such a place where it happens before XDP and non-XDP code. 2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code for the first buffer only (rx_copybreak and non-rx_copybreak cases), since the new call that was added covers these cases. The call to dma_sync_single_for_cpu() for the second buffer and on stays because only the first buffer is handled by the newly added dma_sync_single_for_cpu(). And there is no need for special handling of the second buffer and on for the XDP path since currently the driver supports only single buffer packets. Issue 2 ------- Description ``````````` In case the XDP code forwarded the packet (ENA_XDP_FORWARDED), ena_unmap_rx_buff_attrs() is called with attrs set to 0. This means that before unmapping the buffer, the internal function dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on the whole buffer (not only on the data part of it). This sync is both wasteful (since a sync was already explicitly called before) and also causes a bug, which will be explained using the below diagram. The following diagram shows the flow of events causing the bug. The order of events is (1)-(4) as shown in the diagram. CPU side memory area (3)convert_to_xdp_frame() initializes the headroom with xdpf metadata || \/ ___________________________________ | | 0 | V 4K --------------------------------------------------------------------- | xdpf->data | other xdpf | < data > | tailroom ||...| | | fields | | GARBAGE || | --------------------------------------------------------------------- /\ /\ || || (4)ena_unmap_rx_buff_attrs() calls (2)dma_sync_single_for_cpu() dma_sync_single_for_cpu() on the copies data from device whole buffer page, overwriting side to CPU side memory the xdpf->data with GARBAGE. || 0 4K --------------------------------------------------------------------- | headroom | < data > | tailroom ||...| | GARBAGE | | GARBAGE || | --------------------------------------------------------------------- Device side memory area /\ || (1) device writes RX packet data After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data becomes corrupted, and so when it is later accessed in ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault, crashing the kernel. Solution ```````` Explicitly tell ena_unmap_rx_buff_attrs() not to call dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC flag. Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: ena: Fix xdp drops handling due to multibuf packetsDavid Arinzon2023-12-121-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current xdp code drops packets larger than ENA_XDP_MAX_MTU. This is an incorrect condition since the problem is not the size of the packet, rather the number of buffers it contains. This commit: 1. Identifies and drops XDP multi-buffer packets at the beginning of the function. 2. Increases the xdp drop statistic when this drop occurs. 3. Adds a one-time print that such drops are happening to give better indication to the user. Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | net: ena: Destroy correct number of xdp queues upon failureDavid Arinzon2023-12-121-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ena_setup_and_create_all_xdp_queues() function freed all the resources upon failure, after creating only xdp_num_queues queues, instead of freeing just the created ones. In this patch, the only resources that are freed, are the ones allocated right before the failure occurs. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | | qed: Fix a potential use-after-free in qed_cxt_tables_allocDinghao Liu2023-12-121-0/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to free p_hwfn->p_cxt_mngr->ilt_shadow on error. However, qed_cxt_tables_alloc() accesses the freed pointer on failure of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(), which may lead to use-after-free. Fix this issue by setting p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free(). Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | octeontx2-af: Fix pause frame configurationHariprasad Kelam2023-12-111-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation's default Pause Forward setting is causing unnecessary network traffic. This patch disables Pause Forward to address this issue. Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: Update RSS algorithm indexHariprasad Kelam2023-12-111-11/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RSS flow algorithm is not set up correctly for promiscuous or all multi MCAM entries. This has an impact on flow distribution. This patch fixes the issue by updating flow algorithm index in above mentioned MCAM entries. Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-pf: Fix promisc mcam entry actionHariprasad Kelam2023-12-111-3/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation is such that, promisc mcam entry action is set as multicast even when there are no trusted VFs. multicast action causes the hardware to copy packet data, which reduces the performance. This patch fixes this issue by setting the promisc mcam entry action to unicast instead of multicast when there are no trusted VFs. The same change is made for the 'allmulti' mcam entry action. Fixes: ffd2f89ad05c ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeon_ep: explicitly test for firmware ready valueShinas Rasheed2023-12-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware ready value is 1, and get firmware ready status function should explicitly test for that value. The firmware ready value read will be 2 after driver load, and on unbind till firmware rewrites the firmware ready back to 0, the value seen by driver will be 2, which should be regarded as not ready. Fixes: 10c073e40469 ("octeon_ep: defer probe if firmware not ready") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | octeontx2-af: fix a use-after-free in rvu_nix_register_reportersZhipeng Lu2023-12-101-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The rvu_dl will be freed in rvu_nix_health_reporters_destroy(rvu_dl) after the create_workqueue fails, and after that free, the rvu_dl will be translate back through the following call chain: rvu_nix_health_reporters_destroy |-> rvu_nix_health_reporters_create |-> rvu_health_reporters_create |-> rvu_register_dl (label err_dl_health) Finally. in the err_dl_health label, rvu_dl being freed again in rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy. In the second calls of rvu_nix_health_reporters_destroy, however, it uses rvu_dl->rvu_nix_health_reporter, which is already freed at the end of rvu_nix_health_reporters_destroy in the first call. So this patch prevents the first destroy by instantly returning -ENONMEN when create_workqueue fails. In addition, since the failure of create_workqueue is the only entrence of label err, it has been integrated into the error-handling path of create_workqueue. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fec: correct queue selectionRadu Bulie2023-12-101-16/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old implementation extracted VLAN TCI info from the payload before the VLAN tag has been pushed in the payload. Another problem was that the VLAN TCI was extracted even if the packet did not have VLAN protocol header. This resulted in invalid VLAN TCI and as a consequence a random queue was computed. This patch fixes the above issues and use the VLAN TCI from the skb if it is present or VLAN TCI from payload if present. If no VLAN header is present queue 0 is selected. Fixes: 52c4a1a85f4b ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations") Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | atm: solos-pci: Fix potential deadlock on &tx_queue_lockChengfeng Ye2023-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d38912 ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | atm: solos-pci: Fix potential deadlock on &cli_queue_lockChengfeng Ye2023-12-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As &card->cli_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #1> console_show() --> spin_lock(&card->cli_queue_lock) <interrupt> --> solos_bh() --> spin_lock(&card->cli_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on the card->cli_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 9c54004ea717 ("atm: Driver for Solos PCI ADSL2+ card.") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logicMichael Chan2023-12-082-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the chip is configured to timestamp all receive packets, the timestamp in the RX completion is only valid if the metadata present flag is not set for packets received on the wire. In addition, internal loopback packets will never have a valid timestamp and the timestamp field will always be zero. We must exclude any 0 value in the timestamp field because there is no way to determine if it is a loopback packet or not. Add a new function bnxt_rx_ts_valid() to check for all timestamp valid conditions. Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | bnxt_en: Fix wrong return value check in bnxt_close_nic()Kalesh AP2023-12-085-34/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wait_event_interruptible_timeout() function returns 0 if the timeout elapsed, -ERESTARTSYS if it was interrupted by a signal, and the remaining jiffies otherwise if the condition evaluated to true before the timeout elapsed. Driver should have checked for zero return value instead of a positive value. MChan: Print a warning for -ERESTARTSYS. The close operation will proceed anyway when wait_event_interruptible_timeout() returns for any reason. Since we do the close no matter what, we should not return this error code to the caller. Change bnxt_close_nic() to a void function and remove all error handling from some of the callers. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | bnxt_en: Fix skb recycling logic in bnxt_deliver_skb()Sreekanth Reddy2023-12-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Receive SKBs can go through the VF-rep path or the normal path. skb_mark_for_recycle() is only called for the normal path. Fix it to do it for both paths to fix possible stalled page pool shutdown errors. Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | bnxt_en: Clear resource reservation during resumeSomnath Kotur2023-12-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are issuing HWRM_FUNC_RESET cmd to reset the device including all reserved resources, but not clearing the reservations within the driver struct. As a result, when the driver re-initializes as part of resume, it believes that there is no need to do any resource reservation and goes ahead and tries to allocate rings which will eventually fail beyond a certain number pre-reserved by the firmware. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | qca_spi: Fix reset behaviorStefan Wahren2023-12-081-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of a reset triggered by the QCA7000 itself, the behavior of the qca_spi driver was not quite correct: - in case of a pending RX frame decoding the drop counter must be incremented and decoding state machine reseted - also the reset counter must always be incremented regardless of sync state Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | qca_debug: Fix ethtool -G iface tx behaviorStefan Wahren2023-12-081-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After calling ethtool -g it was not possible to adjust the TX ring size again: # ethtool -g eth1 Ring parameters for eth1: Pre-set maximums: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 Current hardware settings: RX: 4 RX Mini: n/a RX Jumbo: n/a TX: 10 # ethtool -G eth1 tx 8 netlink error: Invalid argument The reason for this is that the readonly setting rx_pending get initialized and after that the range check in qcaspi_set_ringparam() fails regardless of the provided parameter. So fix this by accepting the exposed RX defaults. Instead of adding another magic number better use a new define here. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | qca_debug: Prevent crash on TX ring changesStefan Wahren2023-12-082-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The qca_spi driver stop and restart the SPI kernel thread (via ndo_stop & ndo_open) in case of TX ring changes. This is a big issue because it allows userspace to prevent restart of the SPI kernel thread (via signals). A subsequent change of TX ring wrongly assume a valid spi_thread pointer which result in a crash. So prevent this by stopping the network traffic handling and temporary park the SPI thread. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | octeon_ep: initialise control mbox tasks before using APIsShinas Rasheed2023-12-081-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialise various workqueue tasks and queue interrupt poll task before the first invocation of any control net APIs. Since octep_ctrl_net_get_info was called before the control net receive work task was initialised or even the interrupt poll task was queued, the function call wasn't returning actual firmware info queried from Octeon. Fixes: 8d6198a14e2b ("octeon_ep: support to fetch firmware info") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20231206135228.2591659-1-srasheed@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | team: Fix use-after-free when an option instance allocation failsFlorent Revest2023-12-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __team_options_register, team_options are allocated and appended to the team's option_list. If one option instance allocation fails, the "inst_rollback" cleanup path frees the previously allocated options but doesn't remove them from the team's option_list. This leaves dangling pointers that can be dereferenced later by other parts of the team driver that iterate over options. This patch fixes the cleanup path to remove the dangling pointers from the list. As far as I can tell, this uaf doesn't have much security implications since it would be fairly hard to exploit (an attacker would need to make the allocation of that specific small object fail) but it's still nice to fix. Cc: stable@vger.kernel.org Fixes: 80f7c6683fe0 ("team: add support for per-port options") Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | Merge tag 'mlx5-fixes-2023-12-04' of ↵David S. Miller2023-12-0815-309/+578
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-12-04 This series provides bug fixes to mlx5 driver. V1->V2: - Drop commit #9 ("net/mlx5e: Forbid devlink reload if IPSec rules are offloaded"), we are working on a better fix Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx5: Fix a NULL vs IS_ERR() checkDan Carpenter2023-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mlx5_esw_offloads_devlink_port() function returns error pointers, not NULL. Fixes: 7bef147a6ab6 ("net/mlx5: Don't skip vport check") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * | net/mlx5e: Check netdev pointer before checking its net nsGavin Li2023-12-041-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, when comparing the net namespaces, the case where the netdev doesn't exist wasn't taken into account, and therefore can cause a crash. In such a case, the comparing function should return false, as there is no netdev->net to compare the devlink->net to. Furthermore, this will result in an attempt to enter switchdev mode without a netdev to fail, and which is the desired result as there is no meaning in switchdev mode without a net device. Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * | net/mlx5: Nack sync reset request when HotPlug is enabledMoshe Shemesh2023-12-041-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current sync reset flow is not supported when PCIe bridge connected directly to mlx5 device has HotPlug interrupt enabled and can be triggered on link state change event. Return nack on reset request in such case. Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * | net/mlx5e: TC, Don't offload post action rule if not supportedChris Mi2023-12-042-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If post action is not supported, eg. ignore_flow_level is not supported, don't offload post action rule. Otherwise, will hit panic [1]. Fix it by checking if post action table is valid or not. [1] [445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1 [445537.864617] #PF: supervisor read access in kernel mode [445537.865244] #PF: error_code(0x0000) - not-present page [445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0 [445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI [445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1 [445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00 [445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246 [445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001 [445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb [445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246 [445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1 [445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000 [445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000 [445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0 [445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [445537.879300] PKRU: 55555554 [445537.879797] Call Trace: [445537.880263] <TASK> [445537.880713] ? show_regs+0x6e/0x80 [445537.881232] ? __die+0x29/0x70 [445537.881731] ? page_fault_oops+0x85/0x160 [445537.882276] ? search_exception_tables+0x65/0x70 [445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120 [445537.883432] ? __bad_area_nosemaphore+0x18b/0x250 [445537.884019] ? bad_area_nosemaphore+0x16/0x20 [445537.884566] ? do_kern_addr_fault+0x8b/0xa0 [445537.885105] ? exc_page_fault+0xf5/0x1c0 [445537.885623] ? asm_exc_page_fault+0x2b/0x30 [445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0 [445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core] [445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core] [445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core] [445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core] [445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core] [445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core] [445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core] [445537.891491] ? debug_smp_processor_id+0x1b/0x30 [445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core] [445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core] [445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core] [445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core] [445537.894693] tc_setup_cb_add+0xdc/0x220 [445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower] [445537.895767] fl_change+0xe87/0x1190 [cls_flower] [445537.896302] tc_new_tfilter+0x484/0xa50 Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Automatic Verification <verifier@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shachar Kagan <skagan@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
| | * | net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_workMoshe Shemesh2023-12-042-3/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b5f75c ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| | * | net/mlx5e: Disable IPsec offload support if not FW steeringChris Mi2023-12-042-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPsec FDB offload can only work with FW steering as of now, disable the cap upon non FW steering. And since the IPSec cap is dynamic now based on steering mode. Cleanup the resources if they exist instead of checking the IPsec cap again. Fixes: edd8b295f9e2 ("Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode'") Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| | * | net/mlx5e: Check the number of elements before walk TC rhashtableJianbo Liu2023-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After IPSec TX tables are destroyed, the flow rules in TC rhashtable, which have the destination to IPSec, are restored to the original one, the uplink. However, when the device is in switchdev mode and unload driver with IPSec rules configured, TC rhashtable cleanup is done before IPSec cleanup, which means tc_ht->tbl is already freed when walking TC rhashtable, in order to restore the destination. So add the checking before walking to avoid unexpected behavior. Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| | * | net/mlx5e: Reduce eswitch mode_lock protection contextJianbo Liu2023-12-044-32/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently eswitch mode_lock is so heavy, for example, it's locked during the whole process of the mode change, which may need to hold other locks. As the mode_lock is also used by IPSec to block mode and encap change now, it is easy to cause lock dependency. Since some of protections are also done by devlink lock, the eswitch mode_lock is not needed at those places, and thus the possibility of lockdep issue is reduced. Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| | * | net/mlx5e: Tidy up IPsec NAT-T SA discoveryLeon Romanovsky2023-12-041-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones. In case they arrive to RX, the SPI and ESP are located in inner header, while the check was performed on outer header instead. That wrong check caused to the situation where received rekeying request was missed and caused to rekey timeout, which "compensated" this failure by completing rekeying. Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
| | * | net/mlx5e: Add IPsec and ASO syndromes check in HWPatrisious Haddad2023-12-042-20/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After IPsec decryption it isn't enough to only check the IPsec syndrome but need to also check the ASO syndrome in order to verify that the operation was actually successful. Verify that both syndromes are actually zero and in case not drop the packet and increment the appropriate flow counter for the drop reason. Fixes: 6b5c45e16e43 ("net/mlx5e: Configure IPsec packet offload flow steering") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>