summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/broadcom
Commit message (Collapse)AuthorAgeFilesLines
* bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()Michael Chan2 days1-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On older chips not supporting multiple RSS contexts, reducing ethtool channels will crash: BUG: kernel NULL pointer dereference, address: 00000000000000b8 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 7032 Comm: ethtool Tainted: G S 6.10.0-rc4 #1 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 RIP: 0010:bnxt_get_max_rss_ctx_ring+0x4c/0x90 [bnxt_en] Code: c3 d3 eb 4c 8b 83 38 01 00 00 48 8d bb 38 01 00 00 4c 39 c7 74 42 41 8d 54 24 ff 31 c0 0f b7 d2 4c 8d 4c 12 02 66 85 ed 74 1d <49> 8b 90 b8 00 00 00 49 8d 34 11 0f b7 0a 66 39 c8 0f 42 c1 48 83 RSP: 0018:ffffaaa501d23ba8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8efdf600c940 RCX: 0000000000000000 RDX: 000000000000007f RSI: ffffffffacf429c4 RDI: ffff8efdf600ca78 RBP: 0000000000000080 R08: 0000000000000000 R09: 0000000000000100 R10: 0000000000000001 R11: ffffaaa501d238c0 R12: 0000000000000080 R13: 0000000000000000 R14: ffff8efdf600c000 R15: 0000000000000006 FS: 00007f977a7d2740(0000) GS:ffff8f041f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 00000002320aa004 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body+0x15/0x60 ? page_fault_oops+0x157/0x440 ? do_user_addr_fault+0x60/0x770 ? _raw_spin_lock_irqsave+0x12/0x40 ? exc_page_fault+0x61/0x120 ? asm_exc_page_fault+0x22/0x30 ? bnxt_get_max_rss_ctx_ring+0x4c/0x90 [bnxt_en] ? bnxt_get_max_rss_ctx_ring+0x25/0x90 [bnxt_en] bnxt_set_channels+0x9d/0x340 [bnxt_en] ethtool_set_channels+0x14b/0x210 __dev_ethtool+0xdf8/0x2890 ? preempt_count_add+0x6a/0xa0 ? percpu_counter_add_batch+0x23/0x90 ? filemap_map_pages+0x417/0x4a0 ? avc_has_extended_perms+0x185/0x420 ? __pfx_udp_ioctl+0x10/0x10 ? sk_ioctl+0x55/0xf0 ? kmalloc_trace_noprof+0xe0/0x210 ? dev_ethtool+0x54/0x170 dev_ethtool+0xa2/0x170 dev_ioctl+0xbe/0x530 sock_do_ioctl+0xa3/0xf0 sock_ioctl+0x20d/0x2e0 bp->rss_ctx_list is not initialized if the chip or firmware does not support multiple RSS contexts. Fix it by adding a check in bnxt_get_max_rss_ctx_ring() before proceeding to reference bp->rss_ctx_list. Fixes: 0d1b7d6c9274 ("bnxt: fix crashes when reducing ring count with active RSS contexts") Reported-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/netdev/ZpFEJeNpwxW1aW9k@gmail.com/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240712175318.166811-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt: fix crashes when reducing ring count with active RSS contextsJakub Kicinski6 days3-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bnxt doesn't check if a ring is used by RSS contexts when reducing ring count. Core performs a similar check for the drivers for the main context, but core doesn't know about additional contexts, so it can't validate them. bnxt_fill_hw_rss_tbl_p5() uses ring id to index bp->rx_ring[], which without the check may end up being out of bounds. BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 Read of size 2 at addr ffff8881c5809618 by task ethtool/31525 Call Trace: __bnxt_hwrm_vnic_set_rss+0xb79/0xe40 bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460 __bnxt_setup_vnic_p5+0x12e/0x270 __bnxt_open_nic+0x2262/0x2f30 bnxt_open_nic+0x5d/0xf0 ethnl_set_channels+0x5d4/0xb30 ethnl_default_set_doit+0x2f1/0x620 Core does track the additional contexts in net-next, so we can move this validation out of the driver as a follow up there. Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240705020005.681746-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* net: bcmasp: Fix error code in probe()Dan Carpenter10 days1-0/+1
| | | | | | | | | | | | Return an error code if bcmasp_interface_create() fails. Don't return success. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Link: https://patch.msgid.link/ZoWKBkHH9D1fqV4r@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Fix the resource check condition for RSS contextsPavan Chebbi11 days1-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While creating a new RSS context, bnxt_rfs_capable() currently makes a strict check to see if the required VNICs are already available. If the current VNICs are not what is required, either too many or not enough, it will call the firmware to reserve the exact number required. There is a bug in the firmware when the driver tries to relinquish some reserved VNICs and RSS contexts. It will cause the default VNIC to lose its RSS configuration and cause receive packets to be placed incorrectly. Workaround this problem by skipping the resource reduction. The driver will not reduce the VNIC and RSS context reservations when a context is deleted. The resources will be available for use when new contexts are created later. Potentially, this workaround can cause us to run out of VNIC and RSS contexts if there are a lot of VF functions creating and deleting RSS contexts. In the future, we will conditionally disable this workaround when the firmware fix is available. Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure") Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnx2x: Fix multiple UBSAN array-index-out-of-boundsGhadi Elie Rahme2024-06-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter. Currently there is a read/write out of bounds that occurs on the array "struct stats_query_entry query" present inside the "bnx2x_fw_stats_req" struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h". Looking at the definition of the "struct stats_query_entry query" array: struct stats_query_entry query[FP_SB_MAX_E1x+ BNX2X_FIRST_QUEUE_QUERY_IDX]; FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3 meaning the array has a total size of 19. Since accesses to "struct stats_query_entry query" are offset-ted by BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet queues should not exceed FP_SB_MAX_E1x (16). However one of these queues is reserved for FCOE and thus the number of Ethernet queues should be set to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if it is not. This is also described in a comment in the source code in drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition of FP_SB_MAX_E1x. Below is the part of this explanation that it important for this patch /* * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is * control by the number of fast-path status blocks supported by the * device (HW/FW). Each fast-path status block (FP-SB) aka non-default * status block represents an independent interrupts context that can * serve a regular L2 networking queue. However special L2 queues such * as the FCoE queue do not require a FP-SB and other components like * the CNIC may consume FP-SB reducing the number of possible L2 queues * * If the maximum number of FP-SB available is X then: * a. If CNIC is supported it consumes 1 FP-SB thus the max number of * regular L2 queues is Y=X-1 * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor) * c. If the FCoE L2 queue is supported the actual number of L2 queues * is Y+1 * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for * slow-path interrupts) or Y+2 if CNIC is supported (one additional * FP interrupt context for the CNIC). * e. The number of HW context (CID count) is always X or X+1 if FCoE * L2 queue is supported. The cid for the FCoE L2 queue is always X. */ However this driver also supports NICs that use the E2 controller which can handle more queues due to having more FP-SB represented by FP_SB_MAX_E2. Looking at the commits when the E2 support was added, it was originally using the E1x parameters: commit f2e0899f0f27 ("bnx2x: Add 57712 support"). Back then FP_SB_MAX_E2 was set to 16 the same as E1x. However the driver was later updated to take full advantage of the E2 instead of having it be limited to the capabilities of the E1x. But as far as we can tell, the array "stats_query_entry query" was still limited to using the FP-SB available to the E1x cards as part of an oversignt when the driver was updated to take full advantage of the E2, and now with the driver being aware of the greater queue size supported by E2 NICs, it causes the UBSAN warnings seen in the stack traces below. This patch increases the size of the "stats_query_entry query" array by replacing FP_SB_MAX_E1x with FP_SB_MAX_E2 to be large enough to handle both types of NICs. Stack traces: UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11 index 20 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11 index 28 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 </TASK> ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8 index 29 is out of range for type 'stats_query_entry [19]' CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Workqueue: bnx2x bnx2x_sp_task [bnx2x] Call Trace: <TASK> dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x] bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x] ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x] bnx2x_stats_start+0x44/0x70 [bnx2x] bnx2x_stats_handle+0x149/0x350 [bnx2x] bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x] bnx2x_sp_task+0x491/0x5c0 [bnx2x] process_one_work+0x18d/0x3f0 </TASK> ---[ end trace ]--- Fixes: 50f0a562f8cc ("bnx2x: add fcoe statistics") Signed-off-by: Ghadi Elie Rahme <ghadi.rahme@canonical.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240627111405.1037812-1-ghadi.rahme@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Restore PTP tx_avail count in case of skb_pad() errorPavan Chebbi2024-06-201-3/+2
| | | | | | | | | | | | | | | | The current code only restores PTP tx_avail count when we get DMA mapping errors. Fix it so that the PTP tx_avail count will be restored for both DMA mapping errors and skb_pad() errors. Otherwise PTP TX timestamp will not be available after a PTP packet hits the skb_pad() error. Fixes: 83bb623c968e ("bnxt_en: Transmit and retrieve packet timestamps") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618215313.29631-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Set TSO max segs on devices with limitsMichael Chan2024-06-202-0/+4
| | | | | | | | | | | | | | | | Firmware will now advertise a non-zero TSO max segments if the device has a limit. 0 means no limit. The latest 5760X chip (early revs) has a limit of 2047 that cannot be exceeded. If exceeded, the chip will send out just a small number of segments. Call netif_set_tso_max_segs() if the device has a limit. Fixes: 2012a6abc876 ("bnxt_en: Add 5760X (P7) PCI IDs") Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240618215313.29631-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Update firmware interface to 1.10.3.44Michael Chan2024-06-201-133/+178
| | | | | | | | | | The relevant change is the max_tso_segs value returned by firmware in the HWRM_FUNC_QCAPS response. This value will be used in the next patch to cap the TSO segments. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240618215313.29631-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Adjust logging of firmware messages in case of released token in ↵Aleksandr Mishin2024-06-131-1/+1
| | | | | | | | | | | | | | | | | | | | | __hwrm_send() In case of token is released due to token->state == BNXT_HWRM_DEFERRED, released token (set to NULL) is used in log messages. This issue is expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But this error code is returned by recent firmware. So some firmware may not return it. This may lead to NULL pointer dereference. Adjust this issue by adding token pointer check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 8fa4219dba8e ("bnxt_en: add dynamic debug support for HWRM messages") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded responseMichael Chan2024-06-132-2/+61
| | | | | | | | | | | | | | | | | | | | | | Firmware interface 1.10.2.118 has increased the size of HWRM_PORT_PHY_QCFG response beyond the maximum size that can be forwarded. When the VF's link state is not the default auto state, the PF will need to forward the response back to the VF to indicate the forced state. This regression may cause the VF to fail to initialize. Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum 96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the new speeds2 fields are beyond the legacy structure. Also modify bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96 bytes to make this failure more obvious. Fixes: 84a911db8305 ("bnxt_en: Update firmware interface to 1.10.2.118") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt_en: silence clang build warningVadim Fedorenko2024-05-101-1/+1
| | | | | | | | | | | | | | | | | | | Clang build brings a warning: ../drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:133:12: warning: comparison of distinct pointer types ('typeof (tmo_us) *' (aka 'unsigned int *') and 'typeof (65535) *' (aka 'int *')) [-Wcompare-distinct-pointer-types] 133 | tmo_us = min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by specifying proper type for BNXT_PTP_QTS_MAX_TMO_US. Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240509151833.12579-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet2024-05-076-7/+7
| | | | | | | | | | | | | | | | | | | | | | | Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* bnxt: fix bnxt_get_avail_msix() returning negative valuesDavid Wei2024-05-032-13/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current net-next/main does not boot for older chipsets e.g. Stratus. Sample dmesg: [ 11.368315] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Able to reserve only 0 out of 9 requested RX rings [ 11.390181] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Unable to reserve tx rings [ 11.438780] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): 2nd rings reservation failed. [ 11.487559] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Not enough rings available. [ 11.506012] bnxt_en 0000:02:00.0: probe with driver bnxt_en failed with error -12 This is caused by bnxt_get_avail_msix() returning a negative value for these chipsets not using the new resource manager i.e. !BNXT_NEW_RM. This in turn causes hwr.cp in __bnxt_reserve_rings() to be set to 0. In the current call stack, __bnxt_reserve_rings() is called from bnxt_set_dflt_rings() before bnxt_init_int_mode(). Therefore, bp->total_irqs is always 0 and for !BNXT_NEW_RM bnxt_get_avail_msix() always returns a negative number. Historically, MSIX vectors were requested by the RoCE driver during run-time and bnxt_get_avail_msix() was used for this purpose. Today, RoCE MSIX vectors are statically allocated. bnxt_get_avail_msix() should only be called for the BNXT_NEW_RM() case to reserve the MSIX ahead of time for RoCE use. bnxt_get_avail_msix() is also be simplified to handle the BNXT_NEW_RM() case only. Fixes: d630624ebd70 ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240502203757.3761827-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-05-024-5/+29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: bcmgenet: synchronize UMAC_CMD accessDoug Berger2024-04-294-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UMAC_CMD register is written from different execution contexts and has insufficient synchronization protections to prevent possible corruption. Of particular concern are the acceses from the phy_device delayed work context used by the adjust_link call and the BH context that may be used by the ndo_set_rx_mode call. A spinlock is added to the driver to protect contended register accesses (i.e. reg_lock) and it is used to synchronize accesses to UMAC_CMD. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bcmgenet: synchronize use of bcmgenet_set_rx_mode()Doug Berger2024-04-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | The ndo_set_rx_mode function is synchronized with the netif_addr_lock spinlock and BHs disabled. Since this function is also invoked directly from the driver the same synchronization should be applied. Fixes: 72f96347628e ("net: bcmgenet: set Rx mode before starting netif") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL accessDoug Berger2024-04-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EXT_RGMII_OOB_CTRL register can be written from different contexts. It is predominantly written from the adjust_link handler which is synchronized by the phydev->lock, but can also be written from a different context when configuring the mii in bcmgenet_mii_config(). The chances of contention are quite low, but it is conceivable that adjust_link could occur during resume when WoL is enabled so use the phydev->lock synchronizer in bcmgenet_mii_config() to be sure. Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnxt_en: Add VF PCI ID for 5760X (P7) chipsAjit Khaparde2024-05-022-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Optimize recovery path ULP locking in the driverKalesh AP2024-05-023-26/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Add a mutex to synchronize ULP operationsKalesh AP2024-05-022-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current scheme relies heavily on the RTNL lock for all ULP operations between the L2 and the RoCE driver. Add a new en_dev_lock mutex so that the asynchronous ULP_STOP and ULP_START operations can be serialized with bnxt_register_dev() and bnxt_unregister_dev() calls without relying on the RTNL lock. The next patch will remove the RTNL lock from the ULP_STOP and ULP_START calls. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Don't call ULP_STOP/ULP_START during L2 resetMichael Chan2024-05-021-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Don't support offline self test when RoCE driver is loadedKalesh AP2024-05-021-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Offline self test is a very disruptive operation for RoCE and requires all active QPs to be destroyed. With a large number of QPs, it can take a long time to destroy all the QPs and can timeout. Do not allow ethtool offline self test if the RoCE driver is registered on the device. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: share NQ ring sw_stats memory with subringsEdwin Peer2024-05-023-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-04-253-49/+68
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: b44: set pause params only when interface is upPeter Münster2024-04-251-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers) unconditionally, but b44::rx_buffers is only valid when the device is up (they get allocated in b44_open(), and deallocated again in b44_close()), any other time these are just a NULL pointers. So if you try to change the pause params while the network interface is disabled/administratively down, everything explodes (which likely netifd tries to do). Link: https://github.com/openwrt/openwrt/issues/13789 Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2) Cc: stable@vger.kernel.org Reported-by: Peter Münster <pm@a16n.net> Suggested-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vaclav Svoboda <svoboda@neng.cz> Tested-by: Peter Münster <pm@a16n.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Peter Münster <pm@a16n.net> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * eth: bnxt: fix counting packets discarded due to OOM and netpollJakub Kicinski2024-04-241-26/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * bnxt_en: Fix error recovery for 5760X (P7) chipsMichael Chan2024-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During error recovery, such as AER fatal error slot reset, we call bnxt_try_map_fw_health_reg() to try to get access to the health register to determine the firmware state. Fix bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly and set up the health register. This fixes this type of AER slot reset failure: bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call pcieport 0000:00:02.0: AER: Root Port link has been reset (0) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset bnxt_en 0000:04:00.0: enabling device (0000 -> 0002) bnxt_en 0000:04:00.0: Firmware not ready bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset bnxt_en 0000:04:00.1: enabling device (0000 -> 0002) bnxt_en 0000:04:00.1: Firmware not ready pcieport 0000:00:02.0: AER: device recovery failed Fixes: a432a45bdba4 ("bnxt_en: Define basic P7 macros") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Fix the PCI-AER routinesVikas Gupta2024-04-221-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not support two simultaneous recoveries so check for reset flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further. When the pci channel state is pci_channel_io_frozen, the PCIe link can not be trusted so we disable the traffic immediately and stop BAR access by calling bnxt_fw_fatal_close(). BAR access after AER fatal error can cause an NMI. Fixes: f75d9a0aa967 ("bnxt_en: Re-write PCI BARs after PCI fatal error.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: refactor reset close codeVikas Gupta2024-04-221-6/+11
| | | | | | | | | | | | | | | | | | | | Introduce bnxt_fw_fatal_close() API which can be used to stop data path and disable device when firmware is in fatal state. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bcmasp: fix memory leak when bringing down interfaceJustin Chen2024-04-191-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When bringing down the TX rings we flush the rings but forget to reclaimed the flushed packets. This leads to a memory leak since we do not free the dma mapped buffers. This also leads to tx control block corruption when bringing down the interface for power management. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418180541.2271719-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge branch '100GbE' of ↵Jakub Kicinski2024-04-241-2/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Support 5 layer Tx scheduler topology Mateusz Polchlopek says: For performance reasons there is a need to have support for selectable Tx scheduler topology. Currently firmware supports only the default 9-layer and 5-layer topology. This patch series enables switch from default to 5-layer topology, if user decides to opt-in. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Document tx_scheduling_layers parameter ice: Add tx_scheduling_layers devlink param ice: Enable switching default Tx scheduler topology ice: Adjust the VSI/Aggregator layers ice: Support 5 layer topology devlink: extend devlink_param *set pointer ==================== Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | devlink: extend devlink_param *set pointerMateusz Polchlopek2024-04-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend devlink_param *set function pointer to take extack as a param. Sometimes it is needed to pass information to the end user from set function. It is more proper to use for that netlink instead of passing message to dmesg. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
* | | bnxt_en: flower: validate control flagsAsbjørn Sloth Tønnesen2024-04-241-0/+4
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-04-112-1/+7
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * bnxt_en: Reset PTP tx_avail after possible firmware resetPavan Chebbi2024-04-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible that during error recovery and firmware reset, there is a pending TX PTP packet waiting for the timestamp. We need to reset this condition so that after recovery, the tx_avail count for PTP is reset back to the initial value. Otherwise, we may not accept any PTP TX timestamps after recovery. Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Fix error recovery for RoCE ulp clientVikas Gupta2024-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Since runtime MSIXs vector allocation/free has been removed, the L2 driver needs to repopulate the MSIX entries for the ulp client as the irq table may change during the recovery process. Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()Vikas Gupta2024-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | If ulp = kzalloc() fails, the allocated edev will leak because it is not properly assigned and the cleanup path will not be able to free it. Fix it by assigning it properly immediately after allocation. Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnxt_en: Update MODULE_DESCRIPTIONMichael Chan2024-04-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | Update MODULE_DESCRIPTION to the more generic adapter family name. The old name only includes the first generation of supported adapters. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Utilize ulp client resources if RoCE is not registeredVikas Gupta2024-04-103-9/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | If the RoCE driver is not registered for a RoCE capable device, add flexibility to use the RoCE resources (MSIX/NQs) for L2 purposes, such as additional rings configured by the user or for XDP. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Change MSIX/NQs allocation policyVikas Gupta2024-04-104-19/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing scheme sets aside a number of MSIX/NQs for the RoCE driver whether the RoCE driver is registered or not. This scheme is not flexible and limits the resources available for the L2 rings if RoCE is never used. Modify the scheme so that the RoCE MSIX/NQs can be used by the L2 driver if they are not used for RoCE. The MSIX/NQs are now represented by 3 fields. bp->ulp_num_msix_want contains the desired default value, edev->ulp_num_msix_vec contains the available value (but not necessarily in use), and ulp_tbl->msix_requested contains the actual value in use by RoCE. The L2 driver can dip into edev->ulp_num_msix_vec if necessary. We need to add rtnl_lock() back in bnxt_register_dev() and bnxt_unregister_dev() to synchronize the MSIX usage between L2 and RoCE. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functionsVikas Gupta2024-04-103-10/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In its current form, bnxt_rdma_aux_device_init() not only initializes the necessary data structures of the newly created aux device but also adds the aux device into the aux bus subsytem. Refactor the logic into separate functions, first function to initialize the aux device along with the required resources and second, to actually add the device to the aux bus subsytem. This separation helps to create bnxt_en_dev much earlier and save its resources separately. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Remove unneeded MSIX base structure fields and codeVikas Gupta2024-04-103-45/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ever since commit: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") The MSIX base vector is effectively always 0. Remove all unneeded structure fields and code referencing the MSIX base. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Remove a redundant NULL check in bnxt_register_dev()Kalesh AP2024-04-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory for "edev->ulp_tbl" is allocated inside the bnxt_rdma_aux_device_init() function. If it fails, the driver will not create the auxiliary device for RoCE. Hence the NULL check inside bnxt_register_dev() is unnecessary. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | bnxt_en: Skip ethtool RSS context configuration in ifdown statePavan Chebbi2024-04-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation requires the ifstate to be up when configuring the RSS contexts. It will try to fetch the RX ring IDs and will crash if it is in ifdown state. Return error if !netif_running() to prevent the crash. An improved implementation is in the works to allow RSS contexts to be changed while in ifdown state. Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | ipv4: Set scope explicitly in ip_route_output().Guillaume Nault2024-04-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>
* | bnxt_en: Fix PTP firmware timeout parameterMichael Chan2024-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the correct tmo_us microsecond parameter for the PTP firmware timeout parameter. Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Reported-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240404195500.171071-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2024-04-041-4/+12
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Revert "tg3: Remove residual error handling in tg3_suspend"Paolo Abeni2024-04-041-4/+26
| | | | | | | | | | | | | | | | This reverts commit 9ab4ad295622a3481818856762471c1f8c830e18. I went out of coffee and applied it to the wrong tree. Blame on me. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * tg3: Remove residual error handling in tg3_suspendNikita Kiryushin2024-04-041-26/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * net: bcmgenet: Reset RBUF on first openPhil Elwell2024-04-031-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the RBUF logic is not reset when the kernel starts then there may be some data left over from any network boot loader. If the 64-byte packet headers are enabled then this can be fatal. Extend bcmgenet_dma_disable to do perform the reset, but not when called from bcmgenet_resume in order to preserve a wake packet. N.B. This different handling of resume is just based on a hunch - why else wouldn't one reset the RBUF as well as the TBUF? If this isn't the case then it's easy to change the patch to make the RBUF reset unconditional. See: https://github.com/raspberrypi/linux/issues/3850 See: https://github.com/raspberrypi/firmware/issues/1882 Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Maarten Vanraes <maarten@rmail.be> Signed-off-by: David S. Miller <davem@davemloft.net>