Commit message | Author | Age | Files | Lines
* Linux 6.7.12 [v6.7.12] [linux-6.7.y] | Greg Kroah-Hartman | 2024-04-03 | 1 | -1/+1
    Link: https://lore.kernel.org/r/20240401152553.125349965@linuxfoundation.org
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
    Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm/sched: fix null-ptr-deref in init entity | Vitaly Prosyak | 2024-04-03 | 1 | -3/+9
    commit f34e8bb7d6c6626933fe993e03ed59ae85e16abb upstream.

    The bug can be triggered by sending an amdgpu_cs_wait_ioctl to the AMDGPU DRM driver on any ASIC with a valid context. The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>. For example, the following code:

        static void Syzkaller2(int fd)
        {
            union drm_amdgpu_ctx arg1;
            union drm_amdgpu_wait_cs arg2;
            int ret;

            arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
            ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &arg1);

            arg2.in.handle = 0x0;
            arg2.in.timeout = 0x2000000000000;
            arg2.in.ip_type = AMD_IP_VPE /* 0x9 */;
            arg2.in.ip_instance = 0x0;
            arg2.in.ring = 0x0;
            arg2.in.ctx_id = arg1.out.alloc.ctx_id;

            drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS */, &arg2);
        }

    One would expect the AMDGPU_WAIT_CS ioctl to return an error when no job has been submitted previously, but commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa modified the logic and allowed sched_rq to be NULL. As a result, when there is no job, the AMDGPU_WAIT_CS ioctl returns success.
The change fixes null-ptr-deref in init entity and the stack below demonstrates the error condition: [ +0.000007] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ +0.007086] #PF: supervisor read access in kernel mode [ +0.005234] #PF: error_code(0x0000) - not-present page [ +0.005232] PGD 0 P4D 0 [ +0.002501] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: G B W L 6.7.0+ #4 [ +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 <48> 83 7b 28 00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c [ +0.019094] RSP: 0018:ffffc90014c1fa40 EFLAGS: 00010282 [ +0.005237] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8113f3fa [ +0.007326] RDX: fffffbfff0a7889d RSI: 0000000000000008 RDI: ffffffff853c44e0 [ +0.007264] RBP: ffffc90014c1fa80 R08: 0000000000000001 R09: fffffbfff0a7889c [ +0.007266] R10: ffffffff853c44e7 R11: 0000000000000001 R12: ffff8881a719b010 [ +0.007263] R13: ffff88810d412748 R14: 0000000000000002 R15: 0000000000000000 [ +0.007264] FS: 00007ffff7045540(0000) GS:ffff8883cc900000(0000) knlGS:0000000000000000 [ +0.008236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.005851] CR2: 0000000000000028 CR3: 000000011912e000 CR4: 0000000000350ef0 [ +0.007175] Call Trace: [ +0.002561] <TASK> [ +0.002141] ? show_regs+0x6a/0x80 [ +0.003473] ? __die+0x25/0x70 [ +0.003124] ? page_fault_oops+0x214/0x720 [ +0.004179] ? preempt_count_sub+0x18/0xc0 [ +0.004093] ? __pfx_page_fault_oops+0x10/0x10 [ +0.004590] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? vprintk_default+0x1d/0x30 [ +0.004063] ? srso_return_thunk+0x5/0x5f [ +0.004087] ? vprintk+0x5c/0x90 [ +0.003296] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005807] ? 
srso_return_thunk+0x5/0x5f [ +0.004090] ? _printk+0xb3/0xe0 [ +0.003293] ? __pfx__printk+0x10/0x10 [ +0.003735] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.005482] ? do_user_addr_fault+0x345/0x770 [ +0.004361] ? exc_page_fault+0x64/0xf0 [ +0.003972] ? asm_exc_page_fault+0x27/0x30 [ +0.004271] ? add_taint+0x2a/0xa0 [ +0.003476] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005812] amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu] [ +0.009530] ? finish_task_switch.isra.0+0x129/0x470 [ +0.005068] ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu] [ +0.010063] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004001] ? mutex_unlock+0x81/0xd0 [ +0.003802] ? srso_return_thunk+0x5/0x5f [ +0.004096] amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu] [ +0.009355] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009981] ? srso_return_thunk+0x5/0x5f [ +0.004089] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __srcu_read_lock+0x20/0x50 [ +0.004096] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.005080] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009974] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.005618] ? srso_return_thunk+0x5/0x5f [ +0.004088] ? __kasan_check_write+0x14/0x20 [ +0.004357] drm_ioctl+0x3da/0x730 [drm] [ +0.004461] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009979] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.004993] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.004712] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.005063] ? __pfx_arch_do_signal_or_restart+0x10/0x10 [ +0.005477] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? preempt_count_sub+0x18/0xc0 [ +0.004237] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.005069] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.008912] __x64_sys_ioctl+0xcd/0x110 [ +0.003918] do_syscall_64+0x5f/0xe0 [ +0.003649] ? 
noist_exc_debug+0xe6/0x120 [ +0.004095] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.005150] RIP: 0033:0x7ffff7b1a94f [ +0.003647] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.019097] RSP: 002b:00007fffffffe0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.007708] RAX: ffffffffffffffda RBX: 000055555558b360 RCX: 00007ffff7b1a94f [ +0.007176] RDX: 000055555558b360 RSI: 00000000c0206449 RDI: 0000000000000003 [ +0.007326] RBP: 00000000c0206449 R08: 000055555556ded0 R09: 000000007fffffff [ +0.007176] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5d8 [ +0.007238] R13: 0000000000000003 R14: 000055555555cba8 R15: 00007ffff7ffd040 [ +0.007250] </TASK> v2: Reworked check to guard against null ptr deref and added helpful comments (Christian) Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr> Cc: Dokyung Song <dokyungs@yonsei.ac.kr> Cc: <jisoo.jang@yonsei.ac.kr> Cc: <yw9865@yonsei.ac.kr> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Link: https://patchwork.freedesktop.org/patch/msgid/20240315023926.343164-1-vitaly.prosyak@amd.com Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
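The kind of guard described in the commit above can be sketched in plain C. This is only an illustration under stated assumptions: the `demo_*` types and `demo_entity_init()` are hypothetical stand-ins, not the kernel's `drm_sched_entity_init()`, and returning `-EINVAL` is a choice made for the sketch; the log does not say whether the real fix returns an error or merely skips the assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler types; not the kernel's definitions. */
struct demo_rq { int id; };

struct demo_sched {
    struct demo_rq **sched_rq;  /* may be NULL if the scheduler was never set up */
    unsigned int num_rqs;
};

struct demo_entity {
    struct demo_rq *rq;
};

/*
 * Pick a run-queue for the entity. Returns 0 on success, or -EINVAL when
 * no usable scheduler is available; entity->rq stays NULL in that case.
 */
int demo_entity_init(struct demo_entity *entity,
                     struct demo_sched **sched_list,
                     unsigned int num_sched_list,
                     unsigned int priority)
{
    struct demo_sched *sched;

    entity->rq = NULL;

    if (num_sched_list == 0 || sched_list == NULL || sched_list[0] == NULL)
        return -EINVAL;

    sched = sched_list[0];

    /* Guard: check sched_rq before indexing it, since it may be NULL. */
    if (sched->sched_rq == NULL || sched->num_rqs == 0)
        return -EINVAL;

    /* Clamp the requested priority to the highest available run-queue. */
    if (priority >= sched->num_rqs)
        priority = sched->num_rqs - 1;

    entity->rq = sched->sched_rq[priority];
    return 0;
}
```

The point of the sketch is the ordering: the pointer is tested before it is indexed, so a scheduler without run-queues produces an error code instead of a fault.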
* drm/amdgpu: fix use-after-free bug | Vitaly Prosyak | 2024-04-03 | 1 | -4/+16
    commit 22207fd5c80177b860279653d017474b2812af5e upstream.

    The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl to the AMDGPU DRM driver on any ASIC with an invalid address and size. The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>. For example, the following code:

        static void Syzkaller1(int fd)
        {
            struct drm_amdgpu_gem_userptr arg;
            int ret;

            arg.addr = 0xffffffffffff0000;
            arg.size = 0x80000000; /* 2 GB */
            arg.flags = 0x7;
            ret = drmIoctl(fd, 0xc1186451 /* amdgpu_gem_userptr_ioctl */, &arg);
        }

    Because the address and size are not valid, there is a failure in amdgpu_hmm_register -> mmu_interval_notifier_insert -> __mmu_interval_notifier_insert -> check_shl_overflow, but even after the amdgpu_hmm_register failure, amdgpu_hmm_unregister is still called from amdgpu_gem_object_free, which causes access to a bad address.
The following stack is produced when the issue is reproduced with KASAN enabled: [ +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340 [ +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80 [ +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246 [ +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b [ +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260 [ +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25 [ +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00 [ +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260 [ +0.000011] FS: 00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000 [ +0.000012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0 [ +0.000010] Call Trace: [ +0.000006] <TASK> [ +0.000007] ? show_regs+0x6a/0x80 [ +0.000018] ? __warn+0xa5/0x1b0 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000018] ? report_bug+0x24a/0x290 [ +0.000022] ? handle_bug+0x46/0x90 [ +0.000015] ? exc_invalid_op+0x19/0x50 [ +0.000016] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? kasan_save_stack+0x26/0x50 [ +0.000017] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x327/0x340 [ +0.000019] ? mmu_interval_notifier_remove+0x23b/0x340 [ +0.000020] ? __pfx_mmu_interval_notifier_remove+0x10/0x10 [ +0.000017] ? kasan_save_alloc_info+0x1e/0x30 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_kmalloc+0xb1/0xc0 [ +0.000018] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? 
__kasan_check_read+0x11/0x20 [ +0.000020] amdgpu_hmm_unregister+0x34/0x50 [amdgpu] [ +0.004695] amdgpu_gem_object_free+0x66/0xa0 [amdgpu] [ +0.004534] ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu] [ +0.004291] ? do_syscall_64+0x5f/0xe0 [ +0.000023] ? srso_return_thunk+0x5/0x5f [ +0.000017] drm_gem_object_free+0x3b/0x50 [drm] [ +0.000489] amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu] [ +0.004295] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004270] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __this_cpu_preempt_check+0x13/0x20 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.000022] ? drm_ioctl_kernel+0x17b/0x1f0 [drm] [ +0.000496] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004272] ? drm_ioctl_kernel+0x190/0x1f0 [drm] [ +0.000492] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.000497] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004297] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.000489] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000016] drm_ioctl+0x3da/0x730 [drm] [ +0.000475] ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu] [ +0.004293] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.000506] ? __pfx_rpm_resume+0x10/0x10 [ +0.000016] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? __kasan_check_write+0x14/0x20 [ +0.000010] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.000015] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? srso_return_thunk+0x5/0x5f [ +0.000011] ? preempt_count_sub+0x18/0xc0 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000010] ? 
_raw_spin_unlock_irqrestore+0x27/0x50 [ +0.000019] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.004272] __x64_sys_ioctl+0xcd/0x110 [ +0.000020] do_syscall_64+0x5f/0xe0 [ +0.000021] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.000015] RIP: 0033:0x7ff9ed31a94f [ +0.000012] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.000013] RSP: 002b:00007fff25f66790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000016] RAX: ffffffffffffffda RBX: 000055b3f7e133e0 RCX: 00007ff9ed31a94f [ +0.000012] RDX: 000055b3f7e133e0 RSI: 00000000c1186451 RDI: 0000000000000003 [ +0.000010] RBP: 00000000c1186451 R08: 0000000000000000 R09: 0000000000000000 [ +0.000009] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fff25f66ca8 [ +0.000009] R13: 0000000000000003 R14: 000055b3f7021ba8 R15: 00007ff9ed7af040 [ +0.000024] </TASK> [ +0.000007] ---[ end trace 0000000000000000 ]--- v2: Consolidate any error handling into amdgpu_hmm_register which applied to kfd_bo also. (Christian) v3: Improve syntax and comment (Christian) Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr> Cc: Dokyung Song <dokyungs@yonsei.ac.kr> Cc: <jisoo.jang@yonsei.ac.kr> Cc: <yw9865@yonsei.ac.kr> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
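The failure mode described in this commit (a teardown routine running after the matching setup routine failed) can be illustrated with a small C sketch. Everything here is hypothetical: `demo_bo`, `demo_register()` and `demo_unregister()` are stand-ins invented for the example, and the overflow test only mimics the kind of range check that rejects the address and size from the reproducer. It is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object with an optional notifier. */
struct demo_bo {
    bool notifier_registered;
    int unregister_calls;   /* counts real teardown work, for illustration only */
};

/* Fails with -EOVERFLOW when addr + size wraps, mimicking an invalid range. */
int demo_register(struct demo_bo *bo, uint64_t addr, uint64_t size)
{
    /* All failure handling lives here: the object is left unregistered. */
    bo->notifier_registered = false;

    if (size == 0 || addr + size < addr)
        return -EOVERFLOW;

    bo->notifier_registered = true;
    return 0;
}

/* Safe to call on any object: does nothing unless registration succeeded. */
void demo_unregister(struct demo_bo *bo)
{
    if (!bo->notifier_registered)
        return;

    bo->notifier_registered = false;
    bo->unregister_calls++;
}
```

With the reproducer's values (0xffffffffffff0000 and 0x80000000) the sum wraps around 64 bits, so registration fails and the later unregister call becomes a no-op rather than touching state that was never set up.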
* tools/resolve_btfids: fix build with musl libc | Natanael Copa | 2024-04-03 | 1 | -0/+2
    commit 62248b22d01e96a4d669cde0d7005bd51ebf9e76 upstream.

    Include the header that defines u32. This fixes the build of 6.6.23 and 6.1.83 kernels for Alpine Linux, which uses musl libc. I assume that GNU libc indirectly pulls in linux/types.h.

    Fixes: 9707ac4fe2f5 ("tools/resolve_btfids: Refactor set sorting with types from btf_ids.h")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218647
    Cc: stable@vger.kernel.org
    Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
    Tested-by: Greg Thelen <gthelen@google.com>
    Link: https://lore.kernel.org/r/20240328110103.28734-1-ncopa@alpinelinux.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/sev: Skip ROM range scans and validation for SEV-SNP guests | Kevin Loughlin | 2024-04-03 | 8 | -31/+39
    commit 0f4a1e80989aca185d955fcd791d7750082044a2 upstream.

    SEV-SNP requires encrypted memory to be validated before access. Because the ROM memory range is not part of the e820 table, it is not pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes to access this range, the guest must first validate the range.

    The current SEV-SNP code does indeed scan the ROM range during early boot and thus attempts to validate the ROM range in probe_roms(). However, this behavior is neither sufficient nor necessary for the following reasons:

    * With regards to sufficiency, if EFI_CONFIG_TABLES are not enabled and CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which falls in the ROM range) prior to validation. For example, Project Oak Stage 0 provides a minimal guest firmware that currently meets these configuration conditions, meaning guests booting atop Oak Stage 0 firmware encounter a problematic call chain during dmi_setup() -> dmi_scan_machine() that results in a crash during boot if SEV-SNP is enabled.

    * With regards to necessity, SEV-SNP guests generally read garbage (which changes across boots) from the ROM range, meaning these scans are unnecessary. The guest reads garbage because the legacy ROM range is unencrypted data but is accessed via an encrypted PMD during early boot (where the PMD is marked as encrypted due to potentially mapping actually-encrypted data in other PMD-contained ranges).

    In one exceptional case, EISA probing treats the ROM range as unencrypted data, which is inconsistent with other probing.

    Continuing to allow SEV-SNP guests to use garbage and to inconsistently classify ROM range encryption status can trigger undesirable behavior. For instance, if garbage bytes appear to be a valid signature, memory may be unnecessarily reserved for the ROM range. Future code or other use cases may result in more problematic (arbitrary) behavior that should be avoided.

    While one solution would be to overhaul the early PMD mapping to always treat the ROM region of the PMD as unencrypted, SEV-SNP guests do not currently rely on data from the ROM region during early boot (and even if they did, they would be mostly relying on garbage data anyways).

    As a simpler solution, skip the ROM range scans (and the otherwise-necessary range validation) during SEV-SNP guest early boot. The potential SEV-SNP guest crash due to lack of ROM range validation is thus avoided by simply not accessing the ROM range.

    In most cases, skip the scans by overriding problematic x86_init functions during sme_early_init() to SNP-safe variants, which can be likened to x86_init overrides done for other platforms (ex: Xen); such overrides also avoid the spread of cc_platform_has() checks throughout the tree.

    In the exceptional EISA case, still use cc_platform_has() for the simplest change, given (1) checks for guest type (ex: Xen domain status) are already performed here, and (2) these checks occur in a subsys initcall instead of an x86_init function.

    [ bp: Massage commit message, remove "we"s. ]

    Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active")
    Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: <stable@kernel.org>
    Link: https://lore.kernel.org/r/20240313121546.2964854-1-kevinloughlin@google.com
    Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: libsas: Fix disk not being scanned in after being removed | Xingui Yang | 2024-04-03 | 1 | -10/+22
    commit 8e68a458bcf5b5cb9c3624598bae28f08251601f upstream.

    As of commit d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to update PHY info"), discovery sends a new SMP_DISCOVER and updates phy->phy_change_count. If the disk is reconnected and phy_change_count changes at this time, the disk scanning process is not triggered.

    Therefore, call sas_set_ex_phy() to update the PHY info with the results of the last query. Because the previous PHY info is used when calling sas_unregister_devs_sas_addr(), sas_unregister_devs_sas_addr() should be called before sas_set_ex_phy().

    Fixes: d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to update PHY info")
    Signed-off-by: Xingui Yang <yangxingui@huawei.com>
    Link: https://lore.kernel.org/r/20240307141413.48049-3-yangxingui@huawei.com
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: lpfc: Correct size for wqe for memset() | Muhammad Usama Anjum | 2024-04-03 | 1 | -1/+1
    commit 28d41991182c210ec1654f8af2e140ef4cc73f20 upstream.

    The wqe is of type lpfc_wqe128. The size passed to memset() should be that of the same type.

    Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context")
    Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Link: https://lore.kernel.org/r/20240304090649.833953-1-usama.anjum@collabora.com
    Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Reviewed-by: Justin Tee <justintee8345@gmail.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
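A common way to keep the memset() size tied to the object's type is to take sizeof from the pointer itself. The sketch below uses hypothetical types (`demo_wqe`, `demo_wqe128`) whose sizes are chosen only for illustration; they are not the lpfc definitions, and the log does not say which type the old code used for the size.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical types: a small descriptor and a larger 128-byte variant. */
struct demo_wqe    { unsigned char words[64]; };
struct demo_wqe128 { unsigned char words[128]; };

/*
 * Clears the whole object. sizeof(*wqe) follows the pointer's type, so it
 * stays correct if the type changes. Using sizeof(struct demo_wqe) here
 * would leave the upper 64 bytes untouched.
 */
void demo_clear_wqe(struct demo_wqe128 *wqe)
{
    memset(wqe, 0, sizeof(*wqe));
}
```

Deriving the size from the pointer means there is no second type name to keep in sync by hand.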
* scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type() | Xingui Yang | 2024-04-03 | 1 | -7/+12
    commit a57345279fd311ba679b8083feb0eec5272c7729 upstream.

    Add a helper to get attached_sas_addr and device type from disc_resp.

    Suggested-by: John Garry <john.g.garry@oracle.com>
    Signed-off-by: Xingui Yang <yangxingui@huawei.com>
    Link: https://lore.kernel.org/r/20240307141413.48049-2-yangxingui@huawei.com
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: lpfc: Correct size for cmdwqe/rspwqe for memset() | Muhammad Usama Anjum | 2024-04-03 | 1 | -2/+2
    commit 16cc2ba71b9f6440805aef7f92ba0f031f79b765 upstream.

    The cmdwqe and rspwqe are of type lpfc_wqe128. The size passed to memset() should be that of the same type.

    Fixes: 61910d6a5243 ("scsi: lpfc: SLI path split: Refactor CT paths")
    Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
    Link: https://lore.kernel.org/r/20240304091119.847060-1-usama.anjum@collabora.com
    Reviewed-by: Justin Tee <justin.tee@broadcom.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc3: pci: Drop duplicate ID | Heikki Krogerus | 2024-04-03 | 1 | -2/+0
    commit f121531703ae442edc1dde4b56803680628bc5b7 upstream.

    The Intel Arrow Lake CPU uses the Meteor Lake ID with this controller (the controller that's part of the Intel Arrow Lake chipset (PCH) does still have a unique PCI ID).

    Fixes: de4b5b28c87c ("usb: dwc3: pci: add support for the Intel Arrow Lake-H")
    Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
    Link: https://lore.kernel.org/r/20240312115008.1748637-1-heikki.krogerus@linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "x86/bugs: Use fixed addressing for VERW operand" | Dave Hansen | 2024-04-03 | 1 | -1/+1
    commit 532a0c57d7ff75e8f07d4e25cba4184989e2a241 upstream.

    This reverts commit 8009479ee919b9a91674f48050ccbff64eafedaa.

    It was originally in x86/urgent, but was deemed wrong so got zapped. But in the meantime, x86/urgent had been merged into x86/apic to resolve a conflict. I didn't notice the merge so didn't zap it from x86/apic and it managed to make it up with the x86/apic material.

    The reverted commit is known to cause some KASAN problems.

    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/bugs: Use fixed addressing for VERW operand | Pawan Gupta | 2024-04-03 | 1 | -1/+1
    commit 8009479ee919b9a91674f48050ccbff64eafedaa upstream.

    The macro used for MDS mitigation executes VERW with relative addressing for the operand. This was necessary in earlier versions of the series. Now it is unnecessary and creates a problem for backports on older kernels that don't support relocations in alternatives. Relocation support was added by commit 270a69c4485d ("x86/alternative: Support relocations in alternatives"). Also asm for fixed addressing is much cleaner than relative RIP addressing.

    Simplify the asm by using fixed addressing for VERW operand.

    [ dhansen: tweak changelog ]

    Closes: https://lore.kernel.org/lkml/20558f89-299b-472e-9a96-171403a83bd6@suse.com/
    Fixes: baf8361e5455 ("x86/bugs: Add asm helpers for executing VERW")
    Reported-by: Nikolay Borisov <nik.borisov@suse.com>
    Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Link: https://lore.kernel.org/all/20240226-verw-arg-fix-v1-1-7b37ee6fd57d%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* crash: use macro to add crashk_res into iomem early for specific arch | Baoquan He | 2024-04-03 | 2 | -0/+10
    commit 32fbe5246582af4f611ccccee33fd6e559087252 upstream.

    There are regression reports [1][2] that the crashkernel region on x86_64 sometimes can't be added into the iomem tree. This causes kdump loading to fail later.

    This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged.

    Even though these reported issues have been shown to be related to another component and were merely exposed once the above commit was applied, I would still like to keep crashk_res and crashk_low_res being added into iomem early as before, because the early adding has always been there on x86_64 and has worked very well. For the safety of kdump, let's change it back.

    Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY so that only an ARCH defining the macro adds crashk_res/_low_res into iomem early. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it.

    Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c").

    [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
        https://lore.kernel.org/all/Zfv8iCL6CT2JqLIC@darkstar.users.ipa.redhat.com/T/#u
    [2] Question about Address Range Validation in Crash Kernel Allocation
        https://lore.kernel.org/all/4eeac1f733584855965a2ea62fa4da58@huawei.com/T/#u

    Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv
    Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources")
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Huacai Chen <chenhuacai@loongson.cn>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jiri Bohac <jbohac@suse.cz>
    Cc: Li Huafei <lihuafei1@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
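The opt-in mechanism described in this commit (an architecture defines a macro, and generic code compiles in the early step only when the macro is present) follows a standard preprocessor pattern. The sketch below shows that pattern only. The macro name is taken from the commit message; `demo_reserve_crashkernel()` and the `demo_inserted_early` counter are hypothetical stand-ins for the real reservation code and resource insertion.

```c
#include <assert.h>

/* An architecture opts in by defining this macro (x86 does, per the commit). */
#define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY

/* Hypothetical counter standing in for the real resource insertion. */
int demo_inserted_early;

void demo_reserve_crashkernel(void)
{
    /* ... the reservation work itself would happen here ... */

#ifdef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY
    /* Compiled in only for architectures that asked for early insertion. */
    demo_inserted_early++;
#endif
}
```

Architectures that do not define the macro get the same function without the early step, so the deferred behaviour stays the default.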
* x86/bugs: Fix the SRSO mitigation on Zen3/4 | Borislav Petkov (AMD) | 2024-04-03 | 3 | -10/+23
    commit 4535e1a4174c4111d92c5a9a21e542d232e0fcaa upstream.

    The original version of the mitigation would patch in the calls to the untraining routines directly. That is, the alternative() in UNTRAIN_RET will patch in the CALL to srso_alias_untrain_ret() directly.

    However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain mess") meant well in trying to clean up the situation, due to micro-architectural reasons, the untraining routine srso_alias_untrain_ret() must be the target of a CALL instruction and not of a JMP instruction as it is done now.

    Reshuffle the alternative macros to accomplish that.

    Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Ingo Molnar <mingo@kernel.org>
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Delay I/O Abort on PCI error | Quinn Tran | 2024-04-03 | 1 | -2/+12
    commit 591c1fdf2016d118b8fbde427b796fac13f3f070 upstream.

    Currently, when a PCI error is detected, I/O is aborted manually through the ABORT IOCB mechanism, which is not guaranteed to succeed.

    Instead, wait for the OS or system to notify the driver to wind down I/O through the pci_error_handlers API. Set the eeh_busy flag to pause all traffic and wait for I/O to drain.

    Cc: stable@vger.kernel.org
    Signed-off-by: Quinn Tran <qutran@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20240227164127.36465-11-njavali@marvell.com
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Change debug message during driver unload | Saurav Kashyap | 2024-04-03 | 1 | -1/+1
    commit b5a30840727a3e41d12a336d19f6c0716b299161 upstream.

    Upon driver unload, the purge_mbox flag is set. The heartbeat monitor thread detects this flag and does not send the mailbox command down to FW, logging the debug message "Error detected: purge[1] eeh[0] cmd=0x0, Exiting". Since this is not a real error, change the debug message.

    Cc: stable@vger.kernel.org
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20240227164127.36465-10-njavali@marvell.com
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Fix double free of fcport | Saurav Kashyap | 2024-04-03 | 1 | -2/+3
    commit 82f522ae0d97119a43da53e0f729275691b9c525 upstream.

    The server was crashing after LOGO because fcport was getting freed twice.

    -----------[ cut here ]-----------
    kernel BUG at mm/slub.c:371!
    invalid opcode: 0000 1 SMP PTI
    CPU: 35 PID: 4610 Comm: bash Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1
    Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021
    RIP: 0010:set_freepointer.part.57+0x0/0x10
    RSP: 0018:ffffb07107027d90 EFLAGS: 00010246
    RAX: ffff9cb7e3150000 RBX: ffff9cb7e332b9c0 RCX: ffff9cb7e3150400
    RDX: 0000000000001f37 RSI: 0000000000000000 RDI: ffff9cb7c0005500
    RBP: fffff693448c5400 R08: 0000000080000000 R09: 0000000000000009
    R10: 0000000000000000 R11: 0000000000132af0 R12: ffff9cb7c0005500
    R13: ffff9cb7e3150000 R14: ffffffffc06990e0 R15: ffff9cb7ea85ea58
    FS: 00007ff6b79c2740(0000) GS:ffff9cb8f7ec0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055b426b7d700 CR3: 0000000169c18002 CR4: 00000000007706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    kfree+0x238/0x250
    qla2x00_els_dcmd_sp_free+0x20/0x230 [qla2xxx]
    ? qla24xx_els_dcmd_iocb+0x607/0x690 [qla2xxx]
    qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx]
    ? qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx]
    ? kernfs_fop_write+0x11e/0x1a0

    Remove one of the free calls and add check for valid fcport. Also use function qla2x00_free_fcport() instead of kfree().

    Cc: stable@vger.kernel.org
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20240227164127.36465-9-njavali@marvell.com
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Fix double free of the ha->vp_map pointer | Saurav Kashyap | 2024-04-03 | 1 | -0/+1
    commit e288285d47784fdcf7c81be56df7d65c6f10c58b upstream.

    A Coverity scan reported a potential risk of double free of the pointer ha->vp_map. ha->vp_map was freed in qla2x00_mem_alloc(), and freed again in function qla2x00_mem_free(ha).

    Assign NULL to vp_map after freeing it; kfree() handles a NULL pointer safely.

    Cc: stable@vger.kernel.org
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20240227164127.36465-8-njavali@marvell.com
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
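The idiom this fix relies on (set the pointer to NULL right after freeing it, so that a second free becomes a no-op) can be shown in user-space C, where free(NULL) is likewise defined to do nothing. The `demo_ha` structure and both functions are hypothetical and only mirror the shape of the alloc and free paths named in the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical adapter structure holding one dynamically allocated map. */
struct demo_ha {
    int *vp_map;
};

/*
 * Allocates vp_map. When fail_later is non-zero, simulates a failure in a
 * later allocation step and unwinds, leaving vp_map NULL.
 */
int demo_mem_alloc(struct demo_ha *ha, int fail_later)
{
    ha->vp_map = calloc(16, sizeof(*ha->vp_map));
    if (!ha->vp_map)
        return -1;

    if (fail_later) {
        free(ha->vp_map);
        ha->vp_map = NULL;  /* a later free of this pointer is now harmless */
        return -1;
    }
    return 0;
}

/* General teardown; safe to call after a failed demo_mem_alloc(). */
void demo_mem_free(struct demo_ha *ha)
{
    free(ha->vp_map);       /* free(NULL) is defined to do nothing */
    ha->vp_map = NULL;
}
```

Without the NULL assignment in the error path, the teardown function would free the same block a second time.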
* scsi: qla2xxx: Fix command flush on cable pull | Quinn Tran | 2024-04-03 | 1 | -0/+10
  commit a27d4d0e7de305def8a5098a614053be208d1aa1 upstream.

  System crash due to command failed to flush back to SCSI layer.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 27 PID: 793455 Comm: kworker/u130:6 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1
    Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021
    Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
    RIP: 0010:__wake_up_common+0x4c/0x190
    Code: 24 10 4d 85 c9 74 0a 41 f6 01 04 0f 85 9d 00 00 00 48 8b 43 08 48 83 c3 08 4c 8d 48 e8 49 8d 41 18 48 39 c3 0f 84 f0 00 00 00 <49> 8b 41 18 89 54 24 08 31 ed 4c 8d 70 e8 45 8b 29 41 f6 c5 04 75
    RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086
    RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320
    RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8
    R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20
    R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 55555554
    Call Trace:
    __wake_up_common_lock+0x7c/0xc0
    qla_nvme_ls_req+0x355/0x4c0 [qla2xxx]
    qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae1407ca000 from port 21:32:00:02:ac:07:ee:b8 loop_id 0x02 s_id 01:02:00 logout 1 keep 0 els_logo 0
    ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc]
    qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:00:02:ac:07:ee:b8 state transitioned from ONLINE to LOST - portid=010200.
    ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc]
    qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320002ac07eeb8. rport ffff8ae598122000 roles 1
    ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc]
    qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae14801e000 from port 21:32:01:02:ad:f7:ee:b8 loop_id 0x04 s_id 01:02:01 logout 1 keep 0 els_logo 0
    ? __switch_to+0x10c/0x450
    ? process_one_work+0x1a7/0x360
    qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:01:02:ad:f7:ee:b8 state transitioned from ONLINE to LOST - portid=010201.
    ? worker_thread+0x1ce/0x390
    ? create_worker+0x1a0/0x1a0
    qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320102adf7eeb8. rport ffff8ae3b2312800 roles 70
    ? kthread+0x10a/0x120
    qla2xxx [0000:12:00.1]-2112:3: qla_nvme_unregister_remote_port: unregister remoteport on ffff8ae14801e000 21320102adf7eeb8
    ? set_kthread_struct+0x40/0x40
    qla2xxx [0000:12:00.1]-2110:3: remoteport_delete of ffff8ae14801e000 21320102adf7eeb8 completed.
    ? ret_from_fork+0x1f/0x40
    qla2xxx [0000:12:00.1]-f086:3: qlt_free_session_done: waiting for sess ffff8ae14801e000 logout

  The system was under memory stress where driver was not able to allocate an SRB to carry out error recovery of cable pull. The failure to flush causes upper layer to start modifying scsi_cmnd. When the system frees up some memory, the subsequent cable pull triggers another command flush. At this point the driver accesses a null pointer when attempting to DMA unmap the SGL.

  Add a check to make sure commands are flushed back on session tear down to prevent the null pointer access.

  Cc: stable@vger.kernel.org
  Signed-off-by: Quinn Tran <qutran@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-7-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: NVME|FCP prefer flag not being honored | Quinn Tran | 2024-04-03 | 1 | -0/+18
  commit 69aecdd410106dc3a8f543a4f7ec6379b995b8d0 upstream.

  Changing of [FCP|NVME] prefer flag in flash has no effect on driver. For device that supports both FCP + NVMe over the same connection, driver continues to connect to this device using the previous successful login mode.

  On completion of flash update, adapter will be reset. Driver will reset the prefer flag based on setting from flash.

  Cc: stable@vger.kernel.org
  Signed-off-by: Quinn Tran <qutran@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-6-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Update manufacturer detail | Bikash Hazarika | 2024-04-03 | 1 | -1/+1
  commit 688fa069fda6fce24d243cddfe0c7024428acb74 upstream.

  Update manufacturer detail from "Marvell Semiconductor, Inc." to "Marvell".

  Cc: stable@vger.kernel.org
  Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-5-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Split FCE|EFT trace control | Quinn Tran | 2024-04-03 | 1 | -61/+41
  commit 76a192e1a566e15365704b9f8fb3b70825f85064 upstream.

  Current code combines the allocation of FCE|EFT trace buffers and enables the features all in 1 step.

  Split this step into separate steps in preparation for follow-on patch to allow user to have a choice to enable / disable FCE trace feature.

  Cc: stable@vger.kernel.org
  Reported-by: kernel test robot <lkp@intel.com>
  Signed-off-by: Quinn Tran <qutran@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-4-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Fix N2N stuck connection | Quinn Tran | 2024-04-03 | 3 | -23/+13
  commit 881eb861ca3877300570db10abbf11494e48548d upstream.

  Disk failed to rediscover after chip reset error injection. The chip reset happens at the time when a PLOGI is being sent. This causes a flag to be left on which blocks the retry. Clear the blocking flag.

  Cc: stable@vger.kernel.org
  Signed-off-by: Quinn Tran <qutran@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-3-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: qla2xxx: Prevent command send on chip reset | Quinn Tran | 2024-04-03 | 2 | -4/+37
  commit 4895009c4bb72f71f2e682f1e7d2c2d96e482087 upstream.

  Currently IOCBs are allowed to push through while chip reset could be in progress. During chip reset the outstanding_cmds array is cleared twice: once when any command on this array is returned as failed, and a second time when the array is initialized to zero. If a command is inserted into the array between these two steps, the command will be lost. Check for chip reset before sending IOCB.

  Cc: stable@vger.kernel.org
  Signed-off-by: Quinn Tran <qutran@marvell.com>
  Signed-off-by: Nilesh Javali <njavali@marvell.com>
  Link: https://lore.kernel.org/r/20240227164127.36465-2-njavali@marvell.com
  Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset | Christian A. Ehrhardt | 2024-04-03 | 1 | -1/+35
  commit 3de4f996a0b5412aa451729008130a488f71563e upstream.

  Check the UCSI_CCI_RESET_COMPLETE complete flag before starting another reset. Use a UCSI_SET_NOTIFICATION_ENABLE command to clear the flag if it is set.

  Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
  Cc: stable <stable@kernel.org>
  Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
  Link: https://lore.kernel.org/r/20240320073927.1641788-6-lk@c--e.de
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: typec: ucsi_acpi: Refactor and fix DELL quirk | Christian A. Ehrhardt | 2024-04-03 | 1 | -42/+33
  commit 6aaceb7d9cd00f3e065dc4b054ecfe52c5253b03 upstream.

  Some DELL systems don't like UCSI_ACK_CC_CI commands with the UCSI_ACK_CONNECTOR_CHANGE but not the UCSI_ACK_COMMAND_COMPLETE bit set. The current quirk still leaves room for races because it requires two consecutive ACK commands to be sent.

  Refactor and significantly simplify the quirk to fix this: Send a dummy command and bundle the connector change ack with the command completion ack in a single UCSI_ACK_CC_CI command. This removes the need to probe for the quirk.

  While there define flag bits for struct ucsi_acpi->flags in ucsi_acpi.c and don't re-use definitions from ucsi.h for struct ucsi->flags.

  Fixes: f3be347ea42d ("usb: ucsi_acpi: Quirk to ack a connector change ack cmd")
  Cc: stable@vger.kernel.org
  Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
  Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
  Link: https://lore.kernel.org/r/20240320073927.1641788-5-lk@c--e.de
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: typec: ucsi: Ack unsupported commands | Christian A. Ehrhardt | 2024-04-03 | 1 | -1/+5
  commit 6b5c85ddeea77d18c4b69e3bda60e9374a20c304 upstream.

  If a command completes the OPM must send an ack. This applies to unsupported commands, too.

  Send the required ACK for unsupported commands.

  Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
  Cc: stable <stable@kernel.org>
  Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
  Link: https://lore.kernel.org/r/20240320073927.1641788-4-lk@c--e.de
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: typec: ucsi: Clear EVENT_PENDING under PPM lock | Christian A. Ehrhardt | 2024-04-03 | 1 | -2/+2
  commit 15b2e71b4653b3e13df34695a29ebeee237c5af2 upstream.

  Suppose we sleep on the PPM lock after clearing the EVENT_PENDING bit because the thread for another connector is executing a command. In this case the command completion of the other command will still report the connector change for our connector.

  Clear the EVENT_PENDING bit under the PPM lock to avoid another useless call to ucsi_handle_connector_change() in this case.

  Fixes: c9aed03a0a68 ("usb: ucsi: Add missing ppm_lock")
  Cc: stable <stable@kernel.org>
  Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
  Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
  Link: https://lore.kernel.org/r/20240320073927.1641788-2-lk@c--e.de
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: typec: Return size of buffer if pd_set operation succeeds | Kyle Tso | 2024-04-03 | 1 | -1/+6
  commit 53f5094fdf5deacd99b8655df692e9278506724d upstream.

  The attribute writing should return the number of bytes used from the buffer on success.

  Fixes: a7cff92f0635 ("usb: typec: USB Power Delivery helpers for ports and partners")
  Cc: stable@vger.kernel.org
  Signed-off-by: Kyle Tso <kyletso@google.com>
  Reviewed-by: Guenter Roeck <linux@roeck-us.net>
  Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Link: https://lore.kernel.org/r/20240319074309.3306579-1-kyletso@google.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
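The return convention stated in this commit message can be sketched as follows. This is a userspace illustration, not the typec code: `demo_pd_set()` and `demo_select_store()` are hypothetical names, and the backend rule (reject an empty buffer) is an assumption made only so the sketch has a failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical backend operation: 0 on success, negative errno on
 * failure. Here it simply rejects an empty buffer. */
static int demo_pd_set(const char *buf, size_t count)
{
	return (buf != NULL && count > 0) ? 0 : -EINVAL;
}

/* Store-style callback: propagate the error, otherwise report how many
 * bytes of the buffer were consumed rather than the backend's 0. */
static ssize_t demo_select_store(const char *buf, size_t count)
{
	int ret = demo_pd_set(buf, count);

	if (ret)
		return ret;
	return (ssize_t)count;
}
```

Calling `demo_select_store("pd0\n", 4)` yields 4. Returning the backend's 0 on success instead would tell the writer that no bytes were accepted, which callers commonly treat as a reason to retry the write.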
* usb: typec: ucsi: Fix race between typec_switch and role_switch | Krishna Kurapati | 2024-04-03 | 1 | -0/+14
  commit f5e9bda03aa50ffad36eccafe893d004ef213c43 upstream.

  When orientation switch is enabled in ucsi glink, there is a xhci probe failure seen when booting up in host mode in reverse orientation.

  During bootup the following things happen in multiple drivers:

  a) DWC3 controller driver initializes the core in device mode when the dr_mode is set to DRD. It relies on role_switch call to change role to host.

  b) QMP driver initializes the lanes to TYPEC_ORIENTATION_NORMAL as a normal routine. It relies on the typec_switch_set call to get notified of orientation changes.

  c) UCSI core reads the UCSI_GET_CONNECTOR_STATUS via the glink and provides initial role switch to dwc3 controller.

  When booting up in host mode with orientation TYPEC_ORIENTATION_REVERSE, then we see the following things happening in order:

  a) UCSI gives initial role as host to dwc3 controller ucsi_register_port. Upon receiving this notification, the dwc3 core needs to program GCTL from PRTCAP_DEVICE to PRTCAP_HOST and as part of this change, it asserts GCTL Core soft reset and waits for it to be completed before shifting it to host. Only after the reset is done will the dwc3_host_init be invoked and xhci is probed. DWC3 controller expects that the usb phy's are stable during this process i.e., the phy init is already done.

  b) During the 100ms wait for GCTL core soft reset, the actual notification from PPM is received by ucsi_glink via pmic glink for changing role to host. The pmic_glink_ucsi_notify routine first sends the orientation change to QMP and then sends role to dwc3 via ucsi framework. This is happening exactly at the time GCTL core soft reset is being processed.

  c) When QMP driver receives typec switch to TYPEC_ORIENTATION_REVERSE, it then re-programs the phy at the instant GCTL core soft reset has been asserted by dwc3 controller due to which the QMP PLL lock fails in qmp_combo_usb_power_on.

  d) After the 100ms of GCTL core soft reset is completed, the dwc3 core goes for initializing the host mode and invokes xhci probe. But at this point the QMP is non-responsive and as a result, the xhci plat probe fails during xhci_reset.

  Fix this by passing orientation switch to available ucsi instances if their gpio configuration is available before ucsi_register is invoked so that by the time, the pmic_glink_ucsi_notify provides typec_switch to QMP, the lane is already configured and the call would be a NOP thus not racing with role switch.

  Cc: stable@vger.kernel.org
  Fixes: c6165ed2f425 ("usb: ucsi: glink: use the connector orientation GPIO to provide switch events")
  Suggested-by: Wesley Cheng <quic_wcheng@quicinc.com>
  Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
  Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
  Link: https://lore.kernel.org/r/20240301040914.458492-1-quic_kriskura@quicinc.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: udc: remove warning when queue disabled ep | yuan linyu | 2024-04-03 | 1 | -1/+3
  commit 2a587a035214fa1b5ef598aea0b81848c5b72e5e upstream.

  It is possible to trigger the warning message below from the mass storage function:

    WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
    pc : usb_ep_queue+0x7c/0x104
    lr : fsg_main_thread+0x494/0x1b3c

  The root cause is that the mass storage function tries to queue a request from the main thread, but another thread may already have disabled the ep when the function is disabled.

  As there is no function failure in the driver, in order to avoid effort to fix warning, change WARN_ON_ONCE() in usb_ep_queue() to pr_debug().

  Suggested-by: Alan Stern <stern@rowland.harvard.edu>
  Cc: stable@vger.kernel.org
  Signed-off-by: yuan linyu <yuanlinyu@hihonor.com>
  Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
  Link: https://lore.kernel.org/r/20240315020144.2715575-1-yuanlinyu@hihonor.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc2: gadget: LPM flow fix | Minas Harutyunyan | 2024-04-03 | 3 | -21/+47
  commit 5d69a3b54e5a630c90d82a4c2bdce3d53dc78710 upstream.

  Added functionality to exit from L1 state by device initiation using remote wakeup signaling, in case when function driver queuing request while core in L1 state.

  Fixes: 273d576c4d41 ("usb: dwc2: gadget: Add functionality to exit from LPM L1 state")
  Fixes: 88b02f2cb1e1 ("usb: dwc2: Add core state checking")
  CC: stable@vger.kernel.org
  Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
  Link: https://lore.kernel.org/r/b4d9de5382375dddbf7ef6049d9a82066ad87d5d.1710166393.git.Minas.Harutyunyan@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc2: gadget: Fix exiting from clock gating | Minas Harutyunyan | 2024-04-03 | 4 | -5/+14
  commit 31f42da31417bec88158f3cf62d19db836217f1e upstream.

  Added exiting from the clock gating mode on USB Reset Detect interrupt if core in the clock gating mode. Added new condition to check core in clock gating mode or no.

  Fixes: 9b4965d77e11 ("usb: dwc2: Add exit clock gating from session request interrupt")
  Fixes: 5d240efddc7f ("usb: dwc2: Add exit clock gating from wakeup interrupt")
  Fixes: 16c729f90bdf ("usb: dwc2: Allow exit clock gating in urb enqueue")
  Fixes: 401411bbc4e6 ("usb: dwc2: Add exit clock gating before removing driver")
  CC: stable@vger.kernel.org
  Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
  Link: https://lore.kernel.org/r/cbcc2ccd37e89e339130797ed68ae4597db773ac.1708938774.git.Minas.Harutyunyan@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc2: host: Fix ISOC flow in DDMA mode | Minas Harutyunyan | 2024-04-03 | 3 | -9/+22
  commit b258e42688501cadb1a6dd658d6f015df9f32d8f upstream.

  Fixed ISOC completion flow in DDMA mode. Added isoc descriptor actual length value and update urb's start_frame value. Fixed initialization of ISOC DMA descriptors flow.

  Fixes: 56f5b1cff22a ("staging: Core files for the DWC2 driver")
  Fixes: 20f2eb9c4cf8 ("staging: dwc2: add microframe scheduler from downstream Pi kernel")
  Fixes: c17b337c1ea4 ("usb: dwc2: host: program descriptor for next frame")
  Fixes: dc4c76e7b22c ("staging: HCD descriptor DMA support for the DWC2 driver")
  Fixes: 762d3a1a9cd7 ("usb: dwc2: host: process all completed urbs")
  CC: stable@vger.kernel.org
  Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
  Link: https://lore.kernel.org/r/a8b1e1711cc6cabfb45d92ede12e35445c66f06c.1708944698.git.Minas.Harutyunyan@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc2: host: Fix hibernation flow | Minas Harutyunyan | 2024-04-03 | 2 | -2/+28
  commit 3c7b9856a82227db01a20171d2e24c7ce305d59b upstream.

  Added to backup/restore registers HFLBADDR, HCCHARi, HCSPLTi, HCTSIZi, HCDMAi and HCDMABi.

  Fixes: 58e52ff6a6c3 ("usb: dwc2: Move register save and restore functions")
  Fixes: d17ee77b3044 ("usb: dwc2: add controller hibernation support")
  CC: stable@vger.kernel.org
  Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
  Link: https://lore.kernel.org/r/c2d10ee6098b9b009a8e94191e046004747d3bdd.1708945444.git.Minas.Harutyunyan@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc2: host: Fix remote wakeup from hibernation | Minas Harutyunyan | 2024-04-03 | 2 | -4/+14
  commit bae2bc73a59c200db53b6c15fb26bb758e2c6108 upstream.

  Starting from core v4.30a changed order of programming GPWRDN_PMUACTV to 0 in case of exit from hibernation on remote wakeup signaling from device.

  Fixes: c5c403dc4336 ("usb: dwc2: Add host/device hibernation functions")
  CC: stable@vger.kernel.org
  Signed-off-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
  Link: https://lore.kernel.org/r/99385ec55ce73445b6fbd0f471c9bd40eb1c9b9e.1708939799.git.Minas.Harutyunyan@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* USB: core: Fix deadlock in port "disable" sysfs attribute | Alan Stern | 2024-04-03 | 1 | -4/+34
  commit f4d1960764d8a70318b02f15203a1be2b2554ca1 upstream.

  The show and store callback routines for the "disable" sysfs attribute file in port.c acquire the device lock for the port's parent hub device. This can cause problems if another process has locked the hub to remove it or change its configuration:

    Removing the hub or changing its configuration requires the hub interface to be removed, which requires the port device to be removed, and device_del() waits until all outstanding sysfs attribute callbacks for the ports have returned. The lock can't be released until then.

    But the disable_show() or disable_store() routine can't return until after it has acquired the lock.

  The resulting deadlock can be avoided by calling sysfs_break_active_protection(). This will cause the sysfs core not to wait for the attribute's callback routine to return, allowing the removal to proceed. The disadvantage is that after making this call, there is no guarantee that the hub structure won't be deallocated at any moment. To prevent this, we have to acquire a reference to it first by calling hub_get().

  Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
  Cc: stable <stable@kernel.org>
  Link: https://lore.kernel.org/r/f7a8c135-a495-4ce6-bd49-405a45e7ea9a@rowland.harvard.edu
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* USB: core: Add hub_get() and hub_put() routines | Alan Stern | 2024-04-03 | 2 | -7/+18
  commit ee113b860aa169e9a4d2c167c95d0f1961c6e1b8 upstream.

  Create hub_get() and hub_put() routines to encapsulate the kref_get() and kref_put() calls in hub.c. The new routines will be used by the next patch in this series.

  Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
  Link: https://lore.kernel.org/r/604da420-ae8a-4a9e-91a4-2d511ff404fb@rowland.harvard.edu
  Cc: stable <stable@kernel.org>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
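The idea of wrapping reference counting in a get/put pair can be sketched in userspace C. This is a rough analogy only: the `demo_hub_*` names are hypothetical, and a C11 `atomic_int` stands in for the kernel's kref, so nothing here reflects the actual hub.c code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an embedded reference count. */
struct demo_hub {
	atomic_int refcount;
	bool released;		/* set by the release callback */
};

static void demo_hub_init(struct demo_hub *hub)
{
	atomic_init(&hub->refcount, 1);	/* creator holds the first reference */
	hub->released = false;
}

/* Release callback: a real implementation would free the object here. */
static void demo_hub_release(struct demo_hub *hub)
{
	hub->released = true;
}

/* Take an extra reference so the object outlives a concurrent teardown. */
static void demo_hub_get(struct demo_hub *hub)
{
	atomic_fetch_add(&hub->refcount, 1);
}

/* Drop a reference; whoever drops the last one runs the release callback. */
static void demo_hub_put(struct demo_hub *hub)
{
	if (atomic_fetch_sub(&hub->refcount, 1) == 1)
		demo_hub_release(hub);
}
```

A caller that needs the object to stay valid across a blocking operation would bracket that operation with `demo_hub_get()` and `demo_hub_put()`.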
* USB: core: Fix deadlock in usb_deauthorize_interface() | Alan Stern | 2024-04-03 | 1 | -3/+13
  commit 80ba43e9f799cbdd83842fc27db667289b3150f5 upstream.

  Among the attribute file callback routines in drivers/usb/core/sysfs.c, the interface_authorized_store() function is the only one which acquires a device lock on an ancestor device: It calls usb_deauthorize_interface(), which locks the interface's parent USB device.

  This will lead to deadlock if another process already owns that lock and tries to remove the interface, whether through a configuration change or because the device has been disconnected. As part of the removal procedure, device_del() waits for all ongoing sysfs attribute callbacks to complete. But usb_deauthorize_interface() can't complete until the device lock has been released, and the lock won't be released until the removal has finished.

  The mechanism provided by sysfs to prevent this kind of deadlock is to use the sysfs_break_active_protection() function, which tells sysfs not to wait for the attribute callback.

  Reported-and-tested by: Yue Sun <samsun1006219@gmail.com>
  Reported by: xingwei lee <xrivendell7@gmail.com>
  Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
  Link: https://lore.kernel.org/linux-usb/CAEkJfYO6jRVC8Tfrd_R=cjO0hguhrV31fDPrLrNOOHocDkPoAA@mail.gmail.com/#r
  Fixes: 310d2b4124c0 ("usb: interface authorization: SysFS part of USB interface authorization")
  Cc: stable@vger.kernel.org
  Link: https://lore.kernel.org/r/1c37eea1-9f56-4534-b9d8-b443438dc869@rowland.harvard.edu
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: dwc3: Properly set system wakeup | Thinh Nguyen | 2024-04-03 | 4 | -0/+25
  commit f9aa41130ac69d13a53ce2a153ca79c70d43f39c upstream.

  If the device is configured for system wakeup, then make sure that the xHCI driver knows about it and make sure to permit wakeup only at the appropriate time.

  For host mode, if the controller goes through the dwc3 code path, then a child xHCI platform device is created. Make sure the platform device also inherits the wakeup setting for xHCI to enable remote wakeup.

  For device mode, make sure to disable system wakeup if no gadget driver is bound. We may experience unwanted system wakeup due to the wakeup signal from the controller PMU detecting connection/disconnection when in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents the system staying in suspend.

  Cc: stable@vger.kernel.org
  Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
  Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igalia.com/T/#mf00d6669c2eff7b308d1162acd1d66c09f0853c7
  Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support")
  Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
  Tested-by: Sanath S <Sanath.S@amd.com>
  Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
  Link: https://lore.kernel.org/r/667cfda7009b502e08462c8fb3f65841d103cc0a.1709865476.git.Thinh.Nguyen@synopsys.com
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* staging: vc04_services: fix information leak in create_component() | Dan Carpenter | 2024-04-03 | 1 | -0/+1
  commit f37e76abd614b68987abc8e5c22d986013349771 upstream.

  The m.u.component_create.pid field is for debugging and in the mainline kernel it's not used for anything. However, it still needs to be set to something to prevent disclosing uninitialized stack data. Set it to zero.

  Fixes: 7b3ad5abf027 ("staging: Import the BCM2835 MMAL-based V4L2 camera driver.")
  Cc: stable <stable@kernel.org>
  Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
  Link: https://lore.kernel.org/r/2d972847-9ebd-481b-b6f9-af390f5aabd3@moroto.mountain
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
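Why an unused field still has to be written can be sketched as follows. This is a simplified userspace illustration: `struct demo_component_create` and `demo_fill_create()` are hypothetical and only loosely modelled on the message described above, and the 0xAA fill in the usage note simulates stale stack contents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message layout; the real structure has more fields. */
struct demo_component_create {
	unsigned int client_component;
	char name[16];
	unsigned int pid;	/* debugging aid, otherwise unused */
};

/* Fill every field before the message leaves the caller. Any field left
 * untouched would carry whatever bytes were already in that memory. */
static void demo_fill_create(struct demo_component_create *m, const char *name)
{
	size_t n = strlen(name);

	if (n >= sizeof(m->name))
		n = sizeof(m->name) - 1;

	m->client_component = 1;
	memset(m->name, 0, sizeof(m->name));	/* no stale bytes after the name */
	memcpy(m->name, name, n);
	m->pid = 0;				/* the one-line fix: always set it */
}
```

If the structure is pre-filled with `memset(&m, 0xAA, sizeof(m))` and then passed through `demo_fill_create()`, no 0xAA byte survives in any field, which is the property the fix is after.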
* staging: vc04_services: changen strncpy() to strscpy_pad() | Arnd Bergmann | 2024-04-03 | 1 | -2/+2
  commit ef25725b7f8aaffd7756974d3246ec44fae0a5cf upstream.

  gcc-14 warns about this strncpy() that results in a non-terminated string for an overflow:

    In file included from include/linux/string.h:369,
                     from drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:20:
    In function 'strncpy',
        inlined from 'create_component' at drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:940:2:
    include/linux/fortify-string.h:108:33: error: '__builtin_strncpy' specified bound 128 equals destination size [-Werror=stringop-truncation]

  Change it to strscpy_pad(), which produces a properly terminated and zero-padded string.

  Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
  Link: https://lore.kernel.org/r/20240313163712.224585-1-arnd@kernel.org
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
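The behaviour the message attributes to strscpy_pad() (always terminated, remainder zero-filled) can be approximated in userspace. This is a sketch under assumptions: `demo_strscpy_pad()` is a hypothetical helper written for illustration, and it returns -1 on truncation where the kernel helper returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity size), always NUL-terminating and
 * zero-filling the unused tail. Returns the number of characters
 * copied, or -1 if src did not fit and was truncated. */
static long demo_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return -1;

	/* Length of src, capped at size so we never read further than needed. */
	while (len < size && src[len] != '\0')
		len++;

	if (len == size) {
		/* Truncation: keep size - 1 characters plus the terminator. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len);
	memset(dst + len, 0, size - len);	/* terminator and padding */
	return (long)len;
}
```

By contrast, `strncpy(dst, src, sizeof(dst))` leaves `dst` without a terminator whenever `src` is at least as long as the destination, which is the case the compiler warning points at.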
* scsi: core: Fix unremoved procfs host directory regression | Guilherme G. Piccoli | 2024-04-03 | 1 | -3/+4
  commit f23a4d6e07570826fe95023ca1aa96a011fa9f84 upstream.

  Commit fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name} directory earlier") fixed a bug related to modules loading/unloading, by adding a call to scsi_proc_hostdir_rm() on scsi_remove_host(). But that led to a potential duplicate call to the hostdir_rm() routine, since it's also called from scsi_host_dev_release(). That triggered a regression report, which was then fixed by commit be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression"). The fix just dropped the hostdir_rm() call from dev_release().

  But it happens that this proc directory is created on scsi_host_alloc(), and that function "pairs" with scsi_host_dev_release(), while scsi_remove_host() pairs with scsi_add_host(). In other words, it seems the reason for removing the proc directory on dev_release() was meant to cover cases in which a SCSI host structure was allocated, but the call to scsi_add_host() didn't happen. And that pattern happens to exist in some error paths, for example.

  Syzkaller causes that by using USB raw gadget device, error'ing on usb-storage driver, at usb_stor_probe2(). By checking that path, we can see that the BadDevice label leads to a scsi_host_put() after a SCSI host allocation, but there's no call to scsi_add_host() in such path. That leads to messages like this in dmesg (and a leak of the SCSI host proc structure):

    usb-storage 4-1:87.51: USB Mass Storage device detected
    proc_dir_entry 'scsi/usb-storage' already registered
    WARNING: CPU: 1 PID: 3519 at fs/proc/generic.c:377 proc_register+0x347/0x4e0 fs/proc/generic.c:376

  The proper fix seems to still call scsi_proc_hostdir_rm() on dev_release(), but guard that with the state check for SHOST_CREATED; there is even a comment in scsi_host_dev_release() detailing that: such conditional is meant for cases where the SCSI host was allocated but there was no calls to {add,remove}_host(), like the usb-storage case.

  This is what we propose here and with that, the error path of usb-storage does not trigger the warning anymore.

  Reported-by: syzbot+c645abf505ed21f931b5@syzkaller.appspotmail.com
  Fixes: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression")
  Cc: stable@vger.kernel.org
  Cc: Bart Van Assche <bvanassche@acm.org>
  Cc: John Garry <john.g.garry@oracle.com>
  Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
  Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
  Link: https://lore.kernel.org/r/20240313113006.2834799-1-gpiccoli@igalia.com
  Reviewed-by: Bart Van Assche <bvanassche@acm.org>
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
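The pairing argument in this commit message (alloc pairs with release, add pairs with remove, and release cleans up only when add never ran) can be modelled with a small state machine. This is a sketch with hypothetical `demo_*` names; the states mirror the ones named in the text but the code is not the SCSI core.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_state { DEMO_CREATED, DEMO_RUNNING, DEMO_DEL };

struct demo_host {
	enum demo_state state;
	bool proc_dir;		/* true while the proc directory exists */
	int proc_rm_calls;	/* how many times removal ran */
};

static void demo_proc_hostdir_rm(struct demo_host *h)
{
	h->proc_rm_calls++;
	h->proc_dir = false;
}

/* Allocation creates the proc directory; the host is not yet added. */
static void demo_host_alloc(struct demo_host *h)
{
	h->state = DEMO_CREATED;
	h->proc_dir = true;
	h->proc_rm_calls = 0;
}

static void demo_add_host(struct demo_host *h)
{
	h->state = DEMO_RUNNING;
}

/* Normal teardown of an added host removes the directory early. */
static void demo_remove_host(struct demo_host *h)
{
	demo_proc_hostdir_rm(h);
	h->state = DEMO_DEL;
}

/* Final release: only clean up if add/remove never happened, so the
 * directory is removed exactly once on either path. */
static void demo_host_dev_release(struct demo_host *h)
{
	if (h->state == DEMO_CREATED)
		demo_proc_hostdir_rm(h);
}
```

On the error path (alloc followed directly by release) the guard fires and the directory is removed once; on the normal path (alloc, add, remove, release) the guard is skipped and removal still runs exactly once.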
* scsi: sd: Fix TCG OPAL unlock on system resume | Damien Le Moal | 2024-04-03 | 7 | -5/+69
  commit 0c76106cb97548810214def8ee22700bbbb90543 upstream.

  Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") introduced the manage_system_start_stop scsi_device flag to allow libata to indicate to the SCSI disk driver that nothing should be done when resuming a disk on system resume. This change turned the execution of sd_resume() into a no-op for ATA devices on system resume. While this solved deadlock issues during device resume, this change also wrongly removed the execution of opal_unlock_from_suspend(). As a result, devices with TCG OPAL locking enabled remain locked and inaccessible after a system resume from sleep.

  To fix this issue, introduce the SCSI driver resume method and implement it with the sd_resume() function calling opal_unlock_from_suspend(). The former sd_resume() function is renamed to sd_resume_common() and modified to call the new sd_resume() function. For non-ATA devices, this results in no functional changes.

  In order for libata to explicitly execute sd_resume() when a device is resumed during system restart, the function scsi_resume_device() is introduced. libata calls this function from the revalidation work executed on device resume, a state that is indicated with the new device flag ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked on resume, allowing normal operation.

  Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management")
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538
  Cc: stable@vger.kernel.org
  Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
  Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org
  Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* scsi: ufs: qcom: Provide default cycles_in_1us value (Dmitry Baryshkov, 2024-04-03, 1 file, -2/+4)
    commit 81e2c1a0f8d3f62f4c9e80b20270aa3481c40524 upstream.

    The MSM8996 DT doesn't provide frequency limits for the
    core_clk_unipro clock, which results in miscalculation of the
    cycles_in_1us value. Provide the backwards-compatible default to
    support existing MSM8996 DT files.

    Fixes: b4e13e1ae95e ("scsi: ufs: qcom: Add multiple frequency support for MAX_CORE_CLK_1US_CYCLES")
    Cc: Nitin Rawat <quic_nitirawa@quicinc.com>
    Cc: stable@vger.kernel.org # 6.7.x
    Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
    Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
    Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Link: https://lore.kernel.org/r/20240218-msm8996-fix-ufs-v3-1-40aab49899a3@linaro.org
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs (Duoming Zhou, 2024-04-03, 1 file, -3/+14)
    commit 051e0840ffa8ab25554d6b14b62c9ab9e4901457 upstream.

    The dreamcastcard->timer could schedule the spu_dma_work and the
    spu_dma_work could also arm the dreamcastcard->timer.

    When the snd_pcm_substream is closing, the aica_channel will be
    deallocated. But it could still be dereferenced in the worker
    thread. The reason is that del_timer() will return directly
    regardless of whether the timer handler is running or not and
    the worker could be rescheduled in the timer handler. As a result,
    the UAF bug will happen. The racy situation is shown below:

          (Thread 1)                 |      (Thread 2)
    snd_aicapcm_pcm_close()          |
     ...                             |  run_spu_dma() //worker
                                     |    mod_timer()
      flush_work()                   |
      del_timer()                    |  aica_period_elapsed() //timer
      kfree(dreamcastcard->channel)  |    schedule_work()
                                     |  run_spu_dma() //worker
      ...                            |    dreamcastcard->channel-> //USE

    In order to mitigate this bug and other possible corner cases,
    call mod_timer() conditionally in run_spu_dma(), then implement
    PCM sync_stop op to cancel both the timer and worker. The sync_stop
    op will be called from PCM core appropriately when needed.

    Fixes: 198de43d758c ("[ALSA] Add ALSA support for the SEGA Dreamcast PCM device")
    Suggested-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
    Message-ID: <20240326094238.95442-1-duoming@zju.edu.cn>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* vfio/pds: Make sure migration file isn't accessed after reset (Brett Creeley, 2024-04-03, 2 files, -0/+14)
    [ Upstream commit 457f7308254756b6e4b8fc3876cb770dcf0e7cc7 ]

    It's possible the migration file is accessed after reset when it has
    been cleaned up, especially when it's initiated by the device. This
    is because the driver doesn't rip out the filep when cleaning up; it
    only frees the related page structures and sets its local struct
    pds_vfio_lm_file pointer to NULL. This can cause a NULL pointer
    dereference, which is shown in the example below during a restore
    after a device initiated reset:

    BUG: kernel NULL pointer dereference, address: 000000000000000c
    PF: supervisor read access in kernel mode
    PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    RIP: 0010:pds_vfio_get_file_page+0x5d/0xf0 [pds_vfio_pci]
    [...]
    Call Trace:
     <TASK>
     pds_vfio_restore_write+0xf6/0x160 [pds_vfio_pci]
     vfs_write+0xc9/0x3f0
     ? __fget_light+0xc9/0x110
     ksys_write+0xb5/0xf0
     __x64_sys_write+0x1a/0x20
     do_syscall_64+0x38/0x90
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    [...]

    Add a disabled flag to the driver's struct pds_vfio_lm_file that gets
    set during cleanup. Then make sure to check the flag when the
    migration file is accessed via its file_operations. By default this
    flag will be false as the memory for struct pds_vfio_lm_file is
    kzalloc'd, which means the struct pds_vfio_lm_file is enabled and
    accessible. Also, since the file_operations and driver's migration
    file cleanup happen under the protection of the same
    pds_vfio_lm_file.lock, using this flag is thread safe.

    Fixes: 8512ed256334 ("vfio/pds: Always clear the save/restore FDs on reset")
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: Brett Creeley <brett.creeley@amd.com>
    Link: https://lore.kernel.org/r/20240308182149.22036-2-brett.creeley@amd.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm/amdgpu/display: Address kdoc for 'is_psr_su' in 'fill_dc_dirty_rects' (Srinivasan Shanmugam, 2024-04-03, 1 file, -0/+4)
    [ Upstream commit 3651306ae4c7f3f54caa9feb826a93cc69ccebbf ]

    The is_psr_su parameter is a boolean flag indicating whether the
    Panel Self Refresh Selective Update (PSR SU) feature is enabled.
    PSR SU is a power-saving feature that allows only the updated
    regions of the screen to be refreshed, reducing the amount of data
    that needs to be sent to the display.

    Fixes the below with gcc W=1:
    drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5257: warning: Function parameter or member 'is_psr_su' not described in 'fill_dc_dirty_rects'

    Fixes: d16df040c8da ("drm/amdgpu: make damage clips support configurable")
    Cc: stable@vger.kernel.org
    Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
    Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
    Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm/amdgpu: make damage clips support configurable (Hamza Mahfooz, 2024-04-03, 3 files, -0/+21)
    [ Upstream commit fc184dbe9fd99ad2dfb197b6fe18768bae1774b1 ]

    We have observed that there are quite a number of PSR-SU panels on
    the market that are unable to keep up with what user space throws at
    them, resulting in hangs and random black screens. So, make damage
    clips support configurable and disable it by default for PSR-SU
    displays.

    Cc: stable@vger.kernel.org
    Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm/amd/display: set odm_combine_policy based on context in dcn32 resource (Wenjing Liu, 2024-04-03, 3 files, -17/+34)
    [ Upstream commit 0a5fd7811a17af708cefdaab93af86838353002d ]

    [why]
    When populating dml pipes, odm combine policy should be assigned
    based on the pipe topology of the context passed in. DML pipes could
    be repopulated multiple times during a single validate bandwidth
    attempt. We need to make sure that whenever we repopulate the dml
    pipes, the policy is always aligned with the updated context. There
    is a case where DML pipes get repopulated during FPO optimization
    after ODM combine policy is changed. Because the current code
    reinitializes the ODM combine policy, we overwrite it even when the
    current context has ODM combine enabled and the pipes are already
    split. This causes DML to think that MPC combine is used, so we
    mistakenly enable MPC combine because we apply pipe split with the
    ODM combine policy reset. This issue doesn't impact the non-windowed
    MPO with ODM case because the legacy policy has restricted use
    cases. We don't encounter the case where both ODM and FPO
    optimizations are enabled together. So we decide to leave it as is
    because it is about to be replaced anyway.

    Cc: stable@vger.kernel.org # 6.6+
    Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
    Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
    Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
    Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>