summaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* mmc: moxart: fix handling of sgm->consumed, otherwise WARN_ON triggersSergei Antonov2024-04-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after sg_miter_next() call. The driver then increments it as bytes are read, so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop(): WARN_ON(miter->consumed > miter->length); WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G W 6.9.0-rc5-dirty #249 Hardware name: Generic DT based system Workqueue: events_freezable mmc_rescan Call trace:. unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x44/0x5c dump_stack_lvl from __warn+0x78/0x16c __warn from warn_slowpath_fmt+0xb0/0x160 warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c sg_miter_stop from moxart_request+0xb0/0x468 moxart_request from mmc_start_request+0x94/0xa8 mmc_start_request from mmc_wait_for_req+0x60/0xa8 mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150 mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420 mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c mmc_attach_sd from mmc_rescan+0x1e0/0x298 mmc_rescan from process_scheduled_works+0x2e4/0x4ec process_scheduled_works from worker_thread+0x1ec/0x24c worker_thread from kthread+0xd4/0xe0 kthread from ret_from_fork+0x14/0x38 This patch adds initial zeroing of sgm->consumed. It is then incremented as bytes are read or written. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Fixes: 3ee0e7c3e67c ("mmc: moxart-mmc: Use sg_miter for PIO") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240422153607.963672-1-saproj@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: sdhci-of-dwcmshc: th1520: Increase tuning loop count to 128Maksim Kiselev2024-04-041-0/+1
| | | | | | | | | | | | | | | | | | Fix SD card tuning error by increasing tuning loop count from 40(MAX_TUNING_LOOP) to 128. For some reason the tuning algorithm requires to move through all the taps of delay line even if the THRESHOLD_MODE (bit 2 in AT_CTRL_R) is used instead of the LARGEST_WIN_MODE. Tested-by: Drew Fustini <drew@pdp7.com> Tested-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240402093539.184287-1-bigunclemax@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: sdhci-msm: pervent access to suspended controllerMantas Pucka2024-04-021-1/+15
| | | | | | | | | | | | | | Generic sdhci code registers LED device and uses host->runtime_suspended flag to protect access to it. The sdhci-msm driver doesn't set this flag, which causes a crash when LED is accessed while controller is runtime suspended. Fix this by setting the flag correctly. Cc: stable@vger.kernel.org Fixes: 67e6db113c90 ("mmc: sdhci-msm: Add pm_runtime and system PM support") Signed-off-by: Mantas Pucka <mantas@8devices.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240321-sdhci-mmc-suspend-v1-1-fbc555a64400@8devices.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* Merge tag 'kbuild-fixes-v6.9' of ↵Linus Torvalds2024-03-3110-25/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Deduplicate Kconfig entries for CONFIG_CXL_PMU - Fix unselectable choice entry in MIPS Kconfig, and forbid this structure - Remove unused include/asm-generic/export.h - Fix a NULL pointer dereference bug in modpost - Enable -Woverride-init warning consistently with W=1 - Drop KCSAN flags from *.mod.c files * tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: Fix typo HEIGTH to HEIGHT Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries kbuild: make -Woverride-init warnings more consistent modpost: do not make find_tosym() return NULL export.h: remove include/asm-generic/export.h kconfig: do not reparent the menu inside a choice block MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
| * kbuild: make -Woverride-init warnings more consistentArnd Bergmann2024-03-319-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -Woverride-init warn about code that may be intentional or not, but the inintentional ones tend to be real bugs, so there is a bit of disagreement on whether this warning option should be enabled by default and we have multiple settings in scripts/Makefile.extrawarn as well as individual subsystems. Older versions of clang only supported -Wno-initializer-overrides with the same meaning as gcc's -Woverride-init, though all supported versions now work with both. Because of this difference, an earlier cleanup of mine accidentally turned the clang warning off for W=1 builds and only left it on for W=2, while it's still enabled for gcc with W=1. There is also one driver that only turns the warning off for newer versions of gcc but not other compilers, and some but not all the Makefiles still use a cc-disable-warning conditional that is no longer needed with supported compilers here. Address all of the above by removing the special cases for clang and always turning the warning off unconditionally where it got in the way, using the syntax that is supported by both compilers. Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/KconfigMasahiro Yamada2024-03-271-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5d7107c72796 ("perf: CXL Performance Monitoring Unit driver") added the config entries for CXL_PMU in drivers/cxl/Kconfig and drivers/perf/Kconfig, so it can be toggled from multiple locations: [1] Device Drivers -> PCI support -> CXL (Compute Expres Link) Devices -> CXL Performance Monitoring Unit [2] Device Drivers -> Performance monitor support -> CXL Performance Monitoring Unit This complicates things, and nobody else does this. I kept the one in drivers/perf/Kconfig because CONFIG_CXL_PMU controls the compilation of drivers/perf/cxl_pmu.c. Acked-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* | Merge tag 'edac_urgent_for_v6.9_rc2' of ↵Linus Torvalds2024-03-312-18/+43
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix more issues in the AMD FMPM driver * tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: RAS: Avoid build errors when CONFIG_DEBUG_FS=n RAS/AMD/FMPM: Safely handle saved records of various sizes RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
| * | RAS: Avoid build errors when CONFIG_DEBUG_FS=nYazen Ghannam2024-03-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new helper was introduced for RAS modules to be able to get the RAS subsystem debugfs root directory. The helper is defined in debugfs.c which is only built when CONFIG_DEBUG_FS=y. However, it's possible that the modules would include debugfs support for optional functionality. One current example is the fmpm module. In this case, a build error will occur when CONFIG_RAS_FMPM is selected and CONFIG_DEBUG_FS=n. Add an inline helper function stub for the CONFIG_DEBUG_FS=n case as the fmpm module can function without the debugfs functionality too. Fixes: 9d2b6fa09d15 ("RAS: Export helper to get ras_debugfs_dir") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218640 Reported-by: anthony s. knowles <akira.2020@protonmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: anthony s. knowles <akira.2020@protonmail.com> Link: https://lore.kernel.org/r/20240325183755.776-1-bp@alien8.de
| * | RAS/AMD/FMPM: Safely handle saved records of various sizesYazen Ghannam2024-03-251-18/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the size of the locally cached FRU record structures is based on the module parameter "max_nr_entries". This creates issues when restoring records if a user changes the parameter. If the number of entries is reduced, then old, larger records will not be restored. The opportunity to take action on the saved data is missed. Also, new records will be created and written to storage, even as the old records remain in storage, resulting in wasted space. If the number of entries is increased, then the length of the old, smaller records will not be adjusted. This causes a checksum failure which leads to the old record being cleared from storage. Again this results in another missed opportunity for action on the saved data. Allocate the temporary record with the maximum possible size based on the current maximum number of supported entries (255). This allows the ERST read operation to succeed if max_nr_entries has been increased. Warn the user if a saved record exceeds the expected size and fail to load the module. This allows the user to adjust the module parameter without losing data or the opportunity to restore larger records. Increase the size of a saved record up to the current max_rec_len. The checksum will be recalculated, and the updated record will be written to storage. Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Muralidhara M K <muralidhara.mk@amd.com> Link: https://lore.kernel.org/r/20240319113322.280096-3-yazen.ghannam@amd.com
| * | RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()Yazen Ghannam2024-03-251-1/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An old, invalid record should be cleared and skipped. Currently, the record is cleared in ERST, but it is not skipped. This leads to a NULL pointer dereference when attempting to copy the old record to the new record. Continue the loop after clearing an old, invalid record to skip it. Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Muralidhara M K <muralidhara.mk@amd.com> Link: https://lore.kernel.org/r/20240319113322.280096-2-yazen.ghannam@amd.com
* | Merge tag 'irq_urgent_for_v6.9_rc2' of ↵Linus Torvalds2024-03-312-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix an unused function warning on irqchip/irq-armada-370-xp - Fix the IRQ sharing with pinctrl-amd and ACPI OSL * tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-370-xp: Suppress unused-function warning genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
| * | irqchip/armada-370-xp: Suppress unused-function warningArnd Bergmann2024-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | armada_370_xp_msi_reenable_percpu() is only defined when CONFIG_PCI_MSI is enabled, and only called when SMP is enabled. Without CONFIG_SMP, there are no callers, which results in a build time warning instead: drivers/irqchip/irq-armada-370-xp.c:319:13: error: 'armada_370_xp_msi_reenable_percpu' defined but not used [-Werror=unused-function] 319 | static void armada_370_xp_msi_reenable_percpu(void) {} | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mark the function as __maybe_unused to avoid adding more complexity to the #ifdefs. Fixes: 8ca61cde32c1 ("irqchip/armada-370-xp: Enable MSI affinity configuration") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240322125838.901649-1-arnd@kernel.org
| * | genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amdRafael J. Wysocki2024-03-251-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a problem when a driver requests a shared interrupt line to run a threaded handler on it without IRQF_ONESHOT set if that flag has been set already for the IRQ in question by somebody else. Namely, the request fails which usually leads to a probe failure even though the driver might have worked just fine with IRQF_ONESHOT, but it does not want to use it by default. Currently, the only way to handle this is to try to request the IRQ without IRQF_ONESHOT, but with IRQF_PROBE_SHARED set and if this fails, try again with IRQF_ONESHOT set. However, this is a bit cumbersome and not very clean. When commit 7a36b901a6eb ("ACPI: OSL: Use a threaded interrupt handler for SCI") switched the ACPI subsystem over to using a threaded interrupt handler for the SCI, it had to use IRQF_ONESHOT for it because that's required due to the way the SCI handler works (it needs to walk all of the enabled GPEs before the interrupt line can be unmasked). The SCI interrupt line is not shared with other users very often due to the SCI handling overhead, but on sone systems it is shared and when the other user of it attempts to install a threaded handler, a flags mismatch related to IRQF_ONESHOT may occur. As it turned out, that happened to the pinctrl-amd driver and so commit 4451e8e8415e ("pinctrl: amd: Add IRQF_ONESHOT to the interrupt request") attempted to address the issue by adding IRQF_ONESHOT to the interrupt flags in that driver, but this is now causing an IRQF_ONESHOT-related mismatch to occur on another system which cannot boot as a result of it. Clearly, pinctrl-amd can work with IRQF_ONESHOT if need be, but it should not set that flag by default, so it needs a way to indicate that to the interrupt subsystem. To that end, introdcuce a new interrupt flag, IRQF_COND_ONESHOT, which will only have effect when the IRQ line is shared and IRQF_ONESHOT has been set for it already, in which case it will be promoted to the latter. This is sufficient for drivers sharing the interrupt line with the SCI as it is requested by the ACPI subsystem before any drivers are probed, so they will always see IRQF_ONESHOT set for the interrupt in question. Fixes: 4451e8e8415e ("pinctrl: amd: Add IRQF_ONESHOT to the interrupt request") Reported-by: Francisco Ayala Le Brun <francisco@videowindow.eu> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Closes: https://lore.kernel.org/lkml/CAN-StX1HqWqi+YW=t+V52-38Mfp5fAz7YHx4aH-CQjgyNiKx3g@mail.gmail.com/ Link: https://lore.kernel.org/r/12417336.O9o76ZdvQC@kreacher
* | Merge tag 'scsi-fixes' of ↵Linus Torvalds2024-03-3042-354/+474
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes and updates from James Bottomley: "Fully half this pull is updates to lpfc and qla2xxx which got committed just as the merge window opened. A sizeable fraction of the driver updates are simple bug fixes (and lock reworks for bug fixes in the case of lpfc), so rather than splitting the few actual enhancements out, we're just adding the drivers to the -rc1 pull. The enhancements for lpfc are log message removals, copyright updates and three patches redefining types. For qla2xxx it's just removing a debug message on module removal and the manufacturer detail update. The two major fixes are the sg teardown race and a core error leg problem with the procfs directory not being removed if we destroy a created host that never got to the running state. The rest are minor fixes and constifications" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits) scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload scsi: core: Fix unremoved procfs host directory regression scsi: mpi3mr: Avoid memcpy field-spanning write WARNING scsi: sd: Fix TCG OPAL unlock on system resume scsi: sg: Avoid sg device teardown race scsi: lpfc: Copyright updates for 14.4.0.1 patches scsi: lpfc: Update lpfc version to 14.4.0.1 scsi: lpfc: Define types in a union for generic void *context3 ptr scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr scsi: lpfc: Use a dedicated lock for ras_fwlog state scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up() scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port() scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling scsi: lpfc: Move NPIV's transport unregistration to after resource clean up scsi: lpfc: Remove unnecessary log message in queuecommand path scsi: qla2xxx: Update version to 10.02.09.200-k scsi: qla2xxx: Delay I/O Abort on PCI error scsi: qla2xxx: Change debug message during driver unload ...
| * | scsi: bnx2fc: Remove spin_lock_bh while releasing resources after uploadSaurav Kashyap2024-03-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The session resources are used by FW and driver when session is offloaded, once session is uploaded these resources are not used. The lock is not required as these fields won't be used any longer. The offload and upload calls are sequential, hence lock is not required. This will suppress following BUG_ON(): [ 449.843143] ------------[ cut here ]------------ [ 449.848302] kernel BUG at mm/vmalloc.c:2727! [ 449.853072] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 449.858712] CPU: 5 PID: 1996 Comm: kworker/u24:2 Not tainted 5.14.0-118.el9.x86_64 #1 Rebooting. [ 449.867454] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016 [ 449.876966] Workqueue: fc_rport_eq fc_rport_work [libfc] [ 449.882910] RIP: 0010:vunmap+0x2e/0x30 [ 449.887098] Code: 00 65 8b 05 14 a2 f0 4a a9 00 ff ff 00 75 1b 55 48 89 fd e8 34 36 79 00 48 85 ed 74 0b 48 89 ef 31 f6 5d e9 14 fc ff ff 5d c3 <0f> 0b 0f 1f 44 00 00 41 57 41 56 49 89 ce 41 55 49 89 fd 41 54 41 [ 449.908054] RSP: 0018:ffffb83d878b3d68 EFLAGS: 00010206 [ 449.913887] RAX: 0000000080000201 RBX: ffff8f4355133550 RCX: 000000000d400005 [ 449.921843] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffb83da53f5000 [ 449.929808] RBP: ffff8f4ac6675800 R08: ffffb83d878b3d30 R09: 00000000000efbdf [ 449.937774] R10: 0000000000000003 R11: ffff8f434573e000 R12: 0000000000001000 [ 449.945736] R13: 0000000000001000 R14: ffffb83da53f5000 R15: ffff8f43d4ea3ae0 [ 449.953701] FS: 0000000000000000(0000) GS:ffff8f529fc80000(0000) knlGS:0000000000000000 [ 449.962732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 449.969138] CR2: 00007f8cf993e150 CR3: 0000000efbe10003 CR4: 00000000003706e0 [ 449.977102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 449.985065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 449.993028] Call Trace: [ 449.995756] __iommu_dma_free+0x96/0x100 [ 450.000139] bnx2fc_free_session_resc+0x67/0x240 [bnx2fc] [ 450.006171] bnx2fc_upload_session+0xce/0x100 [bnx2fc] [ 450.011910] bnx2fc_rport_event_handler+0x9f/0x240 [bnx2fc] [ 450.018136] fc_rport_work+0x103/0x5b0 [libfc] [ 450.023103] process_one_work+0x1e8/0x3c0 [ 450.027581] worker_thread+0x50/0x3b0 [ 450.031669] ? rescuer_thread+0x370/0x370 [ 450.036143] kthread+0x149/0x170 [ 450.039744] ? set_kthread_struct+0x40/0x40 [ 450.044411] ret_from_fork+0x22/0x30 [ 450.048404] Modules linked in: vfat msdos fat xfs nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver dm_service_time qedf qed crc8 bnx2fc libfcoe libfc scsi_transport_fc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp dcdbas rapl intel_cstate intel_uncore mei_me pcspkr mei ipmi_ssif lpc_ich ipmi_si fuse zram ext4 mbcache jbd2 loop nfsv3 nfs_acl nfs lockd grace fscache netfs irdma ice sd_mod t10_pi sg ib_uverbs ib_core 8021q garp mrp stp llc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec crct10dif_pclmul ahci crc32_pclmul bnx2x drm ghash_clmulni_intel libahci rfkill i40e libata megaraid_sas mdio wmi sunrpc lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls [ 450.048497] libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi edd ipmi_devintf ipmi_msghandler [ 450.159753] ---[ end trace 712de2c57c64abc8 ]--- Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240315071427.31842-1-skashyap@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: core: Fix unremoved procfs host directory regressionGuilherme G. Piccoli2024-03-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name} directory earlier") fixed a bug related to modules loading/unloading, by adding a call to scsi_proc_hostdir_rm() on scsi_remove_host(). But that led to a potential duplicate call to the hostdir_rm() routine, since it's also called from scsi_host_dev_release(). That triggered a regression report, which was then fixed by commit be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression"). The fix just dropped the hostdir_rm() call from dev_release(). But it happens that this proc directory is created on scsi_host_alloc(), and that function "pairs" with scsi_host_dev_release(), while scsi_remove_host() pairs with scsi_add_host(). In other words, it seems the reason for removing the proc directory on dev_release() was meant to cover cases in which a SCSI host structure was allocated, but the call to scsi_add_host() didn't happen. And that pattern happens to exist in some error paths, for example. Syzkaller causes that by using USB raw gadget device, error'ing on usb-storage driver, at usb_stor_probe2(). By checking that path, we can see that the BadDevice label leads to a scsi_host_put() after a SCSI host allocation, but there's no call to scsi_add_host() in such path. That leads to messages like this in dmesg (and a leak of the SCSI host proc structure): usb-storage 4-1:87.51: USB Mass Storage device detected proc_dir_entry 'scsi/usb-storage' already registered WARNING: CPU: 1 PID: 3519 at fs/proc/generic.c:377 proc_register+0x347/0x4e0 fs/proc/generic.c:376 The proper fix seems to still call scsi_proc_hostdir_rm() on dev_release(), but guard that with the state check for SHOST_CREATED; there is even a comment in scsi_host_dev_release() detailing that: such conditional is meant for cases where the SCSI host was allocated but there was no calls to {add,remove}_host(), like the usb-storage case. This is what we propose here and with that, the error path of usb-storage does not trigger the warning anymore. Reported-by: syzbot+c645abf505ed21f931b5@syzkaller.appspotmail.com Fixes: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression") Cc: stable@vger.kernel.org Cc: Bart Van Assche <bvanassche@acm.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Link: https://lore.kernel.org/r/20240313113006.2834799-1-gpiccoli@igalia.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: mpi3mr: Avoid memcpy field-spanning write WARNINGShin'ichiro Kawasaki2024-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the "storcli2 show" command is executed for eHBA-9600, mpi3mr driver prints this WARNING message: memcpy: detected field-spanning write (size 128) of single field "bsg_reply_buf->reply_buf" at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 (size 1) WARNING: CPU: 0 PID: 12760 at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 mpi3mr_bsg_request+0x6b12/0x7f10 [mpi3mr] The cause of the WARN is 128 bytes memcpy to the 1 byte size array "__u8 replay_buf[1]" in the struct mpi3mr_bsg_in_reply_buf. The array is intended to be a flexible length array, so the WARN is a false positive. To suppress the WARN, remove the constant number '1' from the array declaration and clarify that it has flexible length. Also, adjust the memory allocation size to match the change. Suggested-by: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240323084155.166835-1-shinichiro.kawasaki@wdc.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: sd: Fix TCG OPAL unlock on system resumeDamien Le Moal2024-03-254-5/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") introduced the manage_system_start_stop scsi_device flag to allow libata to indicate to the SCSI disk driver that nothing should be done when resuming a disk on system resume. This change turned the execution of sd_resume() into a no-op for ATA devices on system resume. While this solved deadlock issues during device resume, this change also wrongly removed the execution of opal_unlock_from_suspend(). As a result, devices with TCG OPAL locking enabled remain locked and inaccessible after a system resume from sleep. To fix this issue, introduce the SCSI driver resume method and implement it with the sd_resume() function calling opal_unlock_from_suspend(). The former sd_resume() function is renamed to sd_resume_common() and modified to call the new sd_resume() function. For non-ATA devices, this result in no functional changes. In order for libata to explicitly execute sd_resume() when a device is resumed during system restart, the function scsi_resume_device() is introduced. libata calls this function from the revalidation work executed on devie resume, a state that is indicated with the new device flag ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked on resume, allowing normal operation. Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538 Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | scsi: sg: Avoid sg device teardown raceAlexander Wetzel2024-03-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sg_remove_sfp_usercontext() must not use sg_device_destroy() after calling scsi_device_put(). sg_device_destroy() is accessing the parent scsi_device request_queue which will already be set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device. The resulting NULL pointer exception will then crash the kernel. Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") Cc: stable@vger.kernel.org Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20240320213032.18221-1-Alexander@wetzel-home.de Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | Merge branch '6.9/scsi-queue' into 6.9/scsi-fixesMartin K. Petersen2024-03-2535-342/+400
| |\ \ | | |/ | |/| | | | | | | | | | Pull in the outstanding updates from the 6.9/scsi-queue branch. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * Merge patch series "Update lpfc to revision 14.4.0.1"Martin K. Petersen2024-03-1016-185/+177
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.4.0.1 This patch set contains updates to log messaging, bug fixes related to unregistration, interrupt handling, resource recovery, and clean up patches regarding the abuse of hbalock and void pointers in the driver. The patches were cut against Martin's 6.9/scsi-queue tree. Link: https://lore.kernel.org/r/20240305200503.57317-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Copyright updates for 14.4.0.1 patchesJustin Tee2024-03-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update copyrights to 2024 for files modified in the 14.4.0.1 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-13-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Update lpfc version to 14.4.0.1Justin Tee2024-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update lpfc version to 14.4.0.1 Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-12-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Define types in a union for generic void *context3 ptrJustin Tee2024-03-107-39/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In LPFC_MBOXQ_t, the void *context3 ptr is used for various paths. It is treated as a generic pointer, and is type casted during its usage. The issue with this is that it can sometimes get confusing when reading code as to what the context3 ptr is being used for and mistakenly be reused in a different context. Rename context3 to ctx_u, and declare it as a union of defined ptr types. From now on, the ctx_u ptr may be used only if users define the use case type. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-11-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptrJustin Tee2024-03-109-54/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void *ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf *ptr. Due to the void* declaration, there have been abuses of ctx_buf for things not related to lpfc_dmabuf. So, set the ptr type for *ctx_buf as lpfc_dmabuf. Remove all type casts on ctx_buf because it is no longer a void *ptr. Convert the abuse of ctx_buf for something not related to lpfc_dmabuf to use the void *context3 ptr. A particular abuse of the ctx_buf warranted a new void *ext_buf ptr. However, the usage of this new void *ext_buf is not generic. It is intended to only hold virtual addresses for extended mailbox commands. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptrJustin Tee2024-03-107-39/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a generic void *ptr. It is named ctx_ndlp and it should only be used as an lpfc_nodelist *ptr. Due to the void* declaration, there have been abuses of ctx_ndlp for things not related to ndlp. So, set the ptr type for *ctx_ndlp as lpfc_nodelist. Remove all type casts on ctx_ndlp because it is no longer a void *ptr. Convert the abuse of ctx_ndlp for things not related to ndlps to use the void *context3 ptr. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Use a dedicated lock for ras_fwlog stateJustin Tee2024-03-106-28/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To reduce usage of and contention for hbalock, a separate dedicated lock is used to protect ras_fwlog state. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()Justin Tee2024-03-103-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the hbalock to avoid potential deadlock. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()Justin Tee2024-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ndlp object update in lpfc_nvme_unregister_port() should be protected by the ndlp lock rather than hbalock. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Update lpfc_ramp_down_queue_handler() logicJustin Tee2024-03-102-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Typically when an out of resource CQE status is detected, the lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by reducing an sdev's queue_depth. However, the current lpfc_rampdown_queue_depth() logic does not help reduce queue_depth. num_cmd_success is never updated and is always zero, which means new_queue_depth will always be set to sdev->queue_depth. So, new_queue_depth = sdev->queue_depth - new_queue_depth always sets new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is essentially a no-op. Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth equal to sdev->queue_depth subtracted from number of times num_rsrc_err was incremented. If num_rsrc_err is >= sdev->queue_depth, then set new_queue_depth equal to 1. Eventually, the frequency of Good_Status frames will signal SCSI upper layer to auto increase the queue_depth back to the driver default of 64 via scsi_handle_queue_ramp_up(). Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handlingJustin Tee2024-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IRQF_ONESHOT is found to mask HBA generated interrupts when thread_fn is running. As a result, some EQEs/CQEs miss timely processing resulting in SCSI layer attempts to abort commands due to io_timeout. Abort CQEs are also not processed leading to the observations of hangs and spam of "0748 abort handler timed out waiting for aborting I/O" log messages. Remove the IRQF_ONESHOT flag. The cmpxchg and xchg atomic operations on lpfc_queue->queue_claimed already protect potential parallel access to an EQ/CQ should the thread_fn get interrupted by the primary irq handler. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Move NPIV's transport unregistration to after resource clean upJustin Tee2024-03-101-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are cases after NPIV deletion where the fabric switch still believes the NPIV is logged into the fabric. This occurs when a vport is unregistered before the Remove All DA_ID CT and LOGO ELS are sent to the fabric. Currently fc_remove_host(), which calls dev_loss_tmo for all D_IDs including the fabric D_ID, removes the last ndlp reference and frees the ndlp rport object. This sometimes causes the race condition where the final DA_ID and LOGO are skipped from being sent to the fabric switch. Fix by moving the fc_remove_host() and scsi_remove_host() calls after DA_ID and LOGO are sent. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * scsi: lpfc: Remove unnecessary log message in queuecommand pathJustin Tee2024-03-101-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Message 9038 logs when LLDD receives SCSI_PROT_NORMAL when T10 DIF protection is configured. The event is not wrong, but the log message has not proven useful in debugging so it is removed. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | Merge patch series "qla2xxx misc. bug fixes"Martin K. Petersen2024-03-109-95/+138
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nilesh Javali <njavali@marvell.com> says: Please apply the qla2xxx driver miscellaneous bug fixes to the scsi tree at your earliest convenience. Link: https://lore.kernel.org/r/20240227164127.36465-1-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Update version to 10.02.09.200-kNilesh Javali2024-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-12-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Delay I/O Abort on PCI errorQuinn Tran2024-03-101-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when PCI error is detected, I/O is aborted manually through the ABORT IOCB mechanism which is not guaranteed to succeed. Instead, wait for the OS or system to notify driver to wind down I/O through the pci_error_handlers api. Set eeh_busy flag to pause all traffic and wait for I/O to drain. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-11-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Change debug message during driver unloadSaurav Kashyap2024-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon driver unload, purge_mbox flag is set and the heartbeat monitor thread detects this flag and does not send the mailbox command down to FW with a debug message "Error detected: purge[1] eeh[0] cmd=0x0, Exiting". This being not a real error, change the debug message. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Fix double free of fcportSaurav Kashyap2024-03-101-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The server was crashing after LOGO because fcport was getting freed twice. -----------[ cut here ]----------- kernel BUG at mm/slub.c:371! invalid opcode: 0000 1 SMP PTI CPU: 35 PID: 4610 Comm: bash Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 RIP: 0010:set_freepointer.part.57+0x0/0x10 RSP: 0018:ffffb07107027d90 EFLAGS: 00010246 RAX: ffff9cb7e3150000 RBX: ffff9cb7e332b9c0 RCX: ffff9cb7e3150400 RDX: 0000000000001f37 RSI: 0000000000000000 RDI: ffff9cb7c0005500 RBP: fffff693448c5400 R08: 0000000080000000 R09: 0000000000000009 R10: 0000000000000000 R11: 0000000000132af0 R12: ffff9cb7c0005500 R13: ffff9cb7e3150000 R14: ffffffffc06990e0 R15: ffff9cb7ea85ea58 FS: 00007ff6b79c2740(0000) GS:ffff9cb8f7ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b426b7d700 CR3: 0000000169c18002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: kfree+0x238/0x250 qla2x00_els_dcmd_sp_free+0x20/0x230 [qla2xxx] ? qla24xx_els_dcmd_iocb+0x607/0x690 [qla2xxx] qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? kernfs_fop_write+0x11e/0x1a0 Remove one of the free calls and add check for valid fcport. Also use function qla2x00_free_fcport() instead of kfree(). Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-9-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Fix double free of the ha->vp_map pointerSaurav Kashyap2024-03-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity scan reported potential risk of double free of the pointer ha->vp_map. ha->vp_map was freed in qla2x00_mem_alloc(), and again freed in function qla2x00_mem_free(ha). Assign NULL to vp_map and kfree take care of NULL. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Fix command flush on cable pullQuinn Tran2024-03-101-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | System crash due to command failed to flush back to SCSI layer. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 27 PID: 793455 Comm: kworker/u130:6 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc] RIP: 0010:__wake_up_common+0x4c/0x190 Code: 24 10 4d 85 c9 74 0a 41 f6 01 04 0f 85 9d 00 00 00 48 8b 43 08 48 83 c3 08 4c 8d 48 e8 49 8d 41 18 48 39 c3 0f 84 f0 00 00 00 <49> 8b 41 18 89 54 24 08 31 ed 4c 8d 70 e8 45 8b 29 41 f6 c5 04 75 RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8 R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __wake_up_common_lock+0x7c/0xc0 qla_nvme_ls_req+0x355/0x4c0 [qla2xxx] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae1407ca000 from port 21:32:00:02:ac:07:ee:b8 loop_id 0x02 s_id 01:02:00 logout 1 keep 0 els_logo 0 ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc] qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:00:02:ac:07:ee:b8 state transitioned from ONLINE to LOST - portid=010200. ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc] qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320002ac07eeb8. rport ffff8ae598122000 roles 1 ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae14801e000 from port 21:32:01:02:ad:f7:ee:b8 loop_id 0x04 s_id 01:02:01 logout 1 keep 0 els_logo 0 ? __switch_to+0x10c/0x450 ? process_one_work+0x1a7/0x360 qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:01:02:ad:f7:ee:b8 state transitioned from ONLINE to LOST - portid=010201. ? worker_thread+0x1ce/0x390 ? create_worker+0x1a0/0x1a0 qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320102adf7eeb8. rport ffff8ae3b2312800 roles 70 ? kthread+0x10a/0x120 qla2xxx [0000:12:00.1]-2112:3: qla_nvme_unregister_remote_port: unregister remoteport on ffff8ae14801e000 21320102adf7eeb8 ? set_kthread_struct+0x40/0x40 qla2xxx [0000:12:00.1]-2110:3: remoteport_delete of ffff8ae14801e000 21320102adf7eeb8 completed. ? ret_from_fork+0x1f/0x40 qla2xxx [0000:12:00.1]-f086:3: qlt_free_session_done: waiting for sess ffff8ae14801e000 logout The system was under memory stress where driver was not able to allocate an SRB to carry out error recovery of cable pull. The failure to flush causes upper layer to start modifying scsi_cmnd. When the system frees up some memory, the subsequent cable pull trigger another command flush. At this point the driver access a null pointer when attempting to DMA unmap the SGL. Add a check to make sure commands are flush back on session tear down to prevent the null pointer access. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: NVME|FCP prefer flag not being honoredQuinn Tran2024-03-101-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changing of [FCP|NVME] prefer flag in flash has no effect on driver. For device that supports both FCP + NVMe over the same connection, driver continues to connect to this device using the previous successful login mode. On completion of flash update, adapter will be reset. Driver will reset the prefer flag based on setting from flash. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Update manufacturer detailBikash Hazarika2024-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update manufacturer detail from "Marvell Semiconductor, Inc." to "Marvell". Cc: stable@vger.kernel.org Signed-off-by: Bikash Hazarika <bhazarika@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Split FCE|EFT trace controlQuinn Tran2024-03-101-61/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code combines the allocation of FCE|EFT trace buffers and enables the features all in 1 step. Split this step into separate steps in preparation for follow-on patch to allow user to have a choice to enable / disable FCE trace feature. Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Fix N2N stuck connectionQuinn Tran2024-03-103-23/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disk failed to rediscover after chip reset error injection. The chip reset happens at the time when a PLOGI is being sent. This causes a flag to be left on which blocks the retry. Clear the blocking flag. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | | * | scsi: qla2xxx: Prevent command send on chip resetQuinn Tran2024-03-102-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently IOCBs are allowed to push through while chip reset could be in progress. During chip reset the outstanding_cmds array is cleared twice. Once when any command on this array is returned as failed and secondly when the array is initialize to zero. If a command is inserted on to the array between these intervals, then the command will be lost. Check for chip reset before sending IOCB. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | | scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()Muhammad Usama Anjum2024-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cmdwqe and rspwqe are of type lpfc_wqe128. They should be memset() with the same type. Fixes: 61910d6a5243 ("scsi: lpfc: SLI path split: Refactor CT paths") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240304091119.847060-1-usama.anjum@collabora.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | | scsi: lpfc: Correct size for wqe for memset()Muhammad Usama Anjum2024-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The wqe is of type lpfc_wqe128. It should be memset with the same type. Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240304090649.833953-1-usama.anjum@collabora.com Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Justin Tee <justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | | scsi: st: Make st_sysfs_class constantRicardo B. Marliere2024-03-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the st_sysfs_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-5-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | | scsi: ch: Make ch_sysfs_class constantRicardo B. Marliere2024-03-101-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the ch_sysfs_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-4-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| | * | | scsi: cxlflash: Make cxlflash_class constantRicardo B. Marliere2024-03-101-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the cxlflash_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-3-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>