summaryrefslogtreecommitdiffstats
path: root/arch
Commit message (Collapse)AuthorAgeFilesLines
...
* KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not interceptedSean Christopherson2023-05-111-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4984563823f0034d3533854c1b50e729f5191089 upstream. Extend VMX's nested intercept logic for emulated instructions to handle "pause" interception, in quotes because KVM's emulator doesn't filter out NOPs when checking for nested intercepts. Failure to allow emulation of NOPs results in KVM injecting a #UD into L2 on any NOP that collides with the emulator's definition of PAUSE, i.e. on all single-byte NOPs. For PAUSE itself, honor L1's PAUSE-exiting control, but ignore PLE to avoid unnecessarily injecting a #UD into L2. Per the SDM, the first execution of PAUSE after VM-Entry is treated as the beginning of a new loop, i.e. will never trigger a PLE VM-Exit, and so L1 can't expect any given execution of PAUSE to deterministically exit. ... the processor considers this execution to be the first execution of PAUSE in a loop. (It also does so for the first execution of PAUSE at CPL 0 after VM entry.) All that said, the PLE side of things is currently a moot point, as KVM doesn't expose PLE to L1. Note, vmx_check_intercept() is still wildly broken when L1 wants to intercept an instruction, as KVM injects a #UD instead of synthesizing a nested VM-Exit. That issue extends far beyond NOP/PAUSE and needs far more effort to fix, i.e. is a problem for the future. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Cc: Mathias Krause <minipli@grsecurity.net> Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230405002359.418138-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* MIPS: fw: Allow firmware to pass a empty envJiaxun Yang2023-05-111-1/+1
| | | | | | | | | | | | | | | | commit ee1809ed7bc456a72dc8410b475b73021a3a68d5 upstream. fw_getenv will use env entry to determine style of env, however it is legal for firmware to just pass a empty list. Check if first entry exist before running strchr to avoid null pointer dereference. Cc: stable@vger.kernel.org Link: https://github.com/clbr/n64bootloader/issues/5 Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: Stash shadow stack pointer in the task struct on interruptArd Biesheuvel2023-05-111-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 59b37fe52f49955791a460752c37145f1afdcad1 upstream. Instead of reloading the shadow call stack pointer from the ordinary stack, which may be vulnerable to the kind of gadget based attacks shadow call stacks were designed to prevent, let's store a task's shadow call stack pointer in the task struct when switching to the shadow IRQ stack. Given that currently, the task_struct::scs_sp field is only used to preserve the shadow call stack pointer while a task is scheduled out or running in user space, reusing this field to preserve and restore it while running off the IRQ stack must be safe, as those occurrences are guaranteed to never overlap. (The stack switching logic only switches stacks when running from the task stack, and so the value being saved here always corresponds to the task mode shadow stack) While at it, fold a mov/add/mov sequence into a single add. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230109174800.3286265-3-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: Always load shadow stack pointer directly from the task structArd Biesheuvel2023-05-113-6/+7
| | | | | | | | | | | | | | | | | | | commit 2198d07c509f1db4a1185d1f65aaada794c6ea59 upstream. All occurrences of the scs_load macro load the value of the shadow call stack pointer from the task which is current at that point. So instead of taking a task struct register argument in the scs_load macro to specify the task struct to load from, let's always reference the current task directly. This should make it much harder to exploit any instruction sequences reloading the shadow call stack pointer register from memory. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230109174800.3286265-2-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/cpu: Add model number for Intel Arrow Lake processorTony Luck2023-05-111-0/+2
| | | | | | | | | | | [ Upstream commit 81515ecf155a38f3532bf5ddef88d651898df6be ] Successor to Lunar Lake. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404174641.426593-1-tony.luck@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/hyperv: Block root partition functionality in a Confidential VMMichael Kelley2023-05-111-4/+8
| | | | | | | | | | | | | | | | [ Upstream commit f8acb24aaf89fc46cd953229462ea8abe31b395f ] Hyper-V should never specify a VM that is a Confidential VM and also running in the root partition. Nonetheless, explicitly block such a combination to guard against a compromised Hyper-V maliciously trying to exploit root partition functionality in a Confidential VM to expose Confidential VM secrets. No known bug is being fixed, but the attack surface for Confidential VMs on Hyper-V is reduced. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1678894453-95392-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* riscv: No need to relocate the dtb as it lies in the fixmap regionAlexandre Ghiti2023-05-011-19/+2
| | | | | | | | | | | | | | | | | | | | commit 1b50f956c8fe9082bdee4a9cfd798149c52f7043 upstream. We used to access the dtb via its linear mapping address but now that the dtb early mapping was moved in the fixmap region, we can keep using this address since it is present in swapper_pg_dir, and remove the dtb relocation. Note that the relocation was wrong anyway since early_memremap() is restricted to 256K whereas the maximum fdt size is 2MB. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230329081932.79831-4-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: Do not set initial_boot_params to the linear address of the dtbAlexandre Ghiti2023-05-011-4/+1
| | | | | | | | | | | | | | commit f1581626071c8e37c58c5e8f0b4126b17172a211 upstream. early_init_dt_verify() is already called in parse_dtb() and since the dtb address does not change anymore (it is now in the fixmap region), no need to reset initial_boot_params by calling early_init_dt_verify() again. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: Move early dtb mapping into the fixmap regionAlexandre Ghiti2023-05-014-16/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ef69d2559fe91f23d27a3d6fd640b5641787d22e upstream. riscv establishes 2 virtual mappings: - early_pg_dir maps the kernel which allows to discover the system memory - swapper_pg_dir installs the final mapping (linear mapping included) We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this mapping was not carried over in swapper_pg_dir. It happens that early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is setup otherwise we could allocate reserved memory defined in the dtb. And this function initializes reserved_mem variable with addresses that lie in the early_pg_dir dtb mapping: when those addresses are reused with swapper_pg_dir, this mapping does not exist and then we trap. The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is set up otherwise we could allocate in reserved memory defined in the dtb. So move the dtb mapping in the fixmap region which is established in early_pg_dir and handed over to swapper_pg_dir. This patch had to be backported because: - the documentation for sv57 is not present here (as sv48/57 are not present) - handling of sv48/57 is not needed (as not present) Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob") Fixes: 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap") Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup") Reported-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg()Dan Carpenter2023-05-011-0/+2
| | | | | | | | | | | | | | | | | | | commit a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 upstream. The KVM_REG_SIZE() comes from the ioctl and it can be a power of two between 0-32768 but if it is more than sizeof(long) this will corrupt memory. Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain Signed-off-by: Oliver Upton <oliver.upton@linux.dev> [will: kvm_arm_set_fw_reg() lives in psci.c not hypercalls.c] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: arm64: Retry fault if vma_lookup() results become invalidDavid Matlack2023-05-011-26/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 13ec9308a85702af7c31f3638a2720863848a7f2 upstream. Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift. Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries). This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race. Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM") Cc: stable@vger.kernel.org Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230313235454.2964067-1-dmatlack@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> [will: Use FSC_PERM instead of ESR_ELx_FSC_PERM. Read 'mmu_notifier_seq' instead of 'mmu_invalidate_seq'. Fix up function references in comment.] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* purgatory: fix disabling debug infoAlyssa Ross2023-04-261-2/+1
| | | | | | | | | | | | | | | | | commit d83806c4c0cccc0d6d3c3581a11983a9c186a138 upstream. Since 32ef9e5054ec, -Wa,-gdwarf-2 is no longer used in KBUILD_AFLAGS. Instead, it includes -g, the appropriate -gdwarf-* flag, and also the -Wa versions of both of those if building with Clang and GNU as. As a result, debug info was being generated for the purgatory objects, even though the intention was that it not be. Fixes: 32ef9e5054ec ("Makefile.debug: re-enable debug info for .S files") Signed-off-by: Alyssa Ross <hi@alyssa.is> Cc: stable@vger.kernel.org Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* MIPS: Define RUNTIME_DISCARD_EXIT in LD scriptJiaxun Yang2023-04-261-0/+2
| | | | | | | | | | | | | | | | commit 6dcbd0a69c84a8ae7a442840a8cf6b1379dc8f16 upstream. MIPS's exit sections are discarded at runtime as well. Fixes link error: `.exit.text' referenced in section `__jump_table' of fs/fuse/inode.o: defined in discarded section `.exit.text' of fs/fuse/inode.o Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/ptrace: fix PTRACE_GET_LAST_BREAK error handlingHeiko Carstens2023-04-261-6/+2
| | | | | | | | | | | | [ Upstream commit f9bbf25e7b2b74b52b2f269216a92657774f239c ] Return -EFAULT if put_user() for the PTRACE_GET_LAST_BREAK request fails, instead of silently ignoring it. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: dts: imx8mm-evk: correct pmic clock sourcePeng Fan2023-04-261-1/+1
| | | | | | | | | | | | [ Upstream commit 85af7ffd24da38e416a14bd6bf207154d94faa83 ] The osc_32k supports #clock-cells as 0, using an id is wrong, drop it. Fixes: a6a355ede574 ("arm64: dts: imx8mm-evk: Add 32.768 kHz clock to PMIC") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: dts: meson-g12-common: specify full DMC rangeMarc Gonzalez2023-04-261-2/+1
| | | | | | | | | | | | | | | | | [ Upstream commit aec4353114a408b3a831a22ba34942d05943e462 ] According to S905X2 Datasheet - Revision 07: DRAM Memory Controller (DMC) register area spans ff638000-ff63a000. According to DeviceTree Specification - Release v0.4-rc1: simple-bus nodes do not require reg property. Fixes: 1499218c80c99a ("arm64: dts: move common G12A & G12B modes to meson-g12-common.dtsi") Signed-off-by: Marc Gonzalez <mgonzalez@freebox.fr> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20230327120932.2158389-2-mgonzalez@freebox.fr Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY nodeDmitry Baryshkov2023-04-261-2/+2
| | | | | | | | | | | | | | | [ Upstream commit 72630ba422b70ea0874fc90d526353cf71c72488 ] Correct PCIe PHY enablement to refer the QMP device nodes rather than PHY device nodes. QMP nodes have 'status = "disabled"' property in the ipq8074.dtsi, while PHY nodes do not correspond to the actual device and do not have the status property. Fixes: e8a7fdc505bb ("arm64: dts: ipq8074: qcom: Re-arrange dts nodes based on address") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230324021651.1799969-1-dmitry.baryshkov@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: dts: rockchip: fix a typo error for rk3288 spdif nodeJianqun Xu2023-04-261-1/+1
| | | | | | | | | | | | | [ Upstream commit 02c84f91adb9a64b75ec97d772675c02a3e65ed7 ] Fix the address in the spdif node name. Fixes: 874e568e500a ("ARM: dts: rockchip: Add SPDIF transceiver for RK3288") Signed-off-by: Jianqun Xu <jay.xu@rock-chips.com> Reviewed-by: Sjoerd Simons <sjoerd@collabora.com> Link: https://lore.kernel.org/r/20230208091411.1603142-1-jay.xu@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/rtc: Remove __init for runtime functionsMatija Glavinic Pecotic2023-04-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 775d3c514c5b2763a50ab7839026d7561795924d ] set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init annotation is wrong. A crash was observed on an x86 platform where CMOS RTC is unused and disabled via device tree. set_rtc_noop() was invoked from ntp: sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock() doesn't honour that. Workqueue: events_power_efficient sync_hw_clock RIP: 0010:set_rtc_noop Call Trace: update_persistent_clock64 sync_hw_clock Fix this by dropping the __init annotation from set/get_rtc_noop(). Fixes: c311ed6183f4 ("x86/init: Allow DT configured systems to disable RTC at boot time") Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* powerpc/papr_scm: Update the NUMA distance table for the target nodeAneesh Kumar K.V2023-04-202-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b277fc793daf258877b4c0744b52f69d6e6ba22e ] Platform device helper routines won't update the NUMA distance table while creating a platform device, even if the device is present on a NUMA node that doesn't have memory or CPU. This is especially true for pmem devices. If the target node of the pmem device is not online, we find the nearest online node to the device and associate the pmem device with that online node. To find the nearest online node, we should have the numa distance table updated correctly. Update the distance information during the device probe. For a papr scm device on NUMA node 3 distance_lookup_table value for distance_ref_points_depth = 2 before and after fix is below: Before fix: node 3 distance depth 0 - 0 node 3 distance depth 1 - 0 node 4 distance depth 0 - 4 node 4 distance depth 1 - 2 node 5 distance depth 0 - 5 node 5 distance depth 1 - 1 After fix node 3 distance depth 0 - 3 node 3 distance depth 1 - 1 node 4 distance depth 0 - 4 node 4 distance depth 1 - 2 node 5 distance depth 0 - 5 node 5 distance depth 1 - 1 Without the fix, the nearest numa node to the pmem device (NUMA node 3) will be picked as 4. After the fix, we get the correct numa node which is 5. Fixes: da1115fdbd6e ("powerpc/nvdimm: Pick nearby online node if the device node is not online") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230404041433.1781804-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hotBasavaraj Natikar2023-04-201-0/+21
| | | | | | | | | | | | | | | | | | | | | | | commit f195fc1e9715ba826c3b62d58038f760f66a4fe9 upstream. The AMD [1022:15b8] USB controller loses some internal functional MSI-X context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot and D3hot->D0 transitions so it can save and restore that internal context, but some firmware in the field can't do this because it fails to clear the AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit. Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller initialization during boot. Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com Reported-by: Thomas Glanzmann <thomas@glanzmann.de> Tested-by: Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: add icache flush for nommu sigreturn trampolineMathis Salmen2023-04-201-1/+8
| | | | | | | | | | | | | | | | | | commit 8d736482749f6d350892ef83a7a11d43cd49981e upstream. In a NOMMU kernel, sigreturn trampolines are generated on the user stack by setup_rt_frame. Currently, these trampolines are not instruction fenced, thus their visibility to ifetch is not guaranteed. This patch adds a flush_icache_range in setup_rt_frame to fix this problem. Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de> Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ARM: 9290/1: uaccess: Fix KASAN false-positivesAndrew Jeffery2023-04-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ceac10c83b330680cc01ceaaab86cd49f4f30d81 ] __copy_to_user_memcpy() and __clear_user_memset() had been calling memcpy() and memset() respectively, leading to false-positive KASAN reports when starting userspace: [ 10.707901] Run /init as init process [ 10.731892] process '/bin/busybox' started with executable stack [ 10.745234] ================================================================== [ 10.745796] BUG: KASAN: user-memory-access in __clear_user_memset+0x258/0x3ac [ 10.747260] Write of size 2687 at addr 000de581 by task init/1 Use __memcpy() and __memset() instead to allow userspace access, which is of course the intent of these functions. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: arm64: PMU: Restore the guest's EL0 event counting after migrationReiji Watanabe2023-04-202-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f9ea835e99bc8d049bf2a3ec8fa5a7cb4fcade23 upstream. Currently, with VHE, KVM enables the EL0 event counting for the guest on vcpu_load() or KVM enables it as a part of the PMU register emulation process, when needed. However, in the migration case (with VHE), the same handling is lacking, as vPMU register values that were restored by userspace haven't been propagated yet (the PMU events haven't been created) at the vcpu load-time on the first KVM_RUN (kvm_vcpu_pmu_restore_guest() called from vcpu_load() on the first KVM_RUN won't do anything as events_{guest,host} of kvm_pmu_events are still zero). So, with VHE, enable the guest's EL0 event counting on the first KVM_RUN (after the migration) when needed. More specifically, have kvm_pmu_handle_pmcr() call kvm_vcpu_pmu_restore_guest() so that kvm_pmu_handle_pmcr() on the first KVM_RUN can take care of it. Fixes: d0c94c49792c ("KVM: arm64: Restore PMU configuration on first run") Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230329023944.2488484-1-reijiw@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: s390: pv: fix external interruption loop not always detectedNico Boehr2023-04-131-8/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 21f27df854008b86349a203bf97fef79bb11f53e ] To determine whether the guest has caused an external interruption loop upon code 20 (external interrupt) intercepts, the ext_new_psw needs to be inspected to see whether external interrupts are enabled. Under non-PV, ext_new_psw can simply be taken from guest lowcore. Under PV, KVM can only access the encrypted guest lowcore and hence the ext_new_psw must not be taken from guest lowcore. handle_external_interrupt() incorrectly did that and hence was not able to reliably tell whether an external interruption loop is happening or not. False negatives cause spurious failures of my kvm-unit-test for extint loops[1] under PV. Since code 20 is only caused under PV if and only if the guest's ext_new_psw is enabled for external interrupts, false positive detection of a external interruption loop can not happen. Fix this issue by instead looking at the guest PSW in the state description. Since the PSW swap for external interrupt is done by the ultravisor before the intercept is caused, this reliably tells whether the guest is enabled for external interrupts in the ext_new_psw. Also update the comments to explain better what is happening. [1] https://lore.kernel.org/kvm/20220812062151.1980937-4-nrb@linux.ibm.com/ Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Fixes: 201ae986ead7 ("KVM: s390: protvirt: Implement interrupt injection") Link: https://lore.kernel.org/r/20230213085520.100756-2-nrb@linux.ibm.com Message-Id: <20230213085520.100756-2-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/PVH: avoid 32-bit build warning when obtaining VGA console infoJan Beulich2023-04-051-1/+1
| | | | | | | | | | | | | | | | | commit aadbd07ff8a75ed342388846da78dfaddb8b106a upstream. In the commit referenced below I failed to pay attention to this code also being buildable as 32-bit. Adjust the type of "ret" - there's no real need for it to be wider than 32 bits. Fixes: 934ef33ee75c ("x86/PVH: obtain VGA console info in Dom0") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/2d2193ff-670b-0a27-e12d-2c5c4c121c79@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
* KVM: x86: Purge "highest ISR" cache when updating APICv stateSean Christopherson2023-04-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | commit 97a71c444a147ae41c7d0ab5b3d855d7f762f3ed upstream. Purge the "highest ISR" cache when updating APICv state on a vCPU. The cache must not be used when APICv is active as hardware may emulate EOIs (and other operations) without exiting to KVM. This fixes a bug where KVM will effectively block IRQs in perpetuity due to the "highest ISR" never getting reset if APICv is activated on a vCPU while an IRQ is in-service. Hardware emulates the EOI and KVM never gets a chance to update its cache. Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function") Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32Sean Christopherson2023-04-051-0/+8
| | | | | | | | | | | | | | | | | | | | commit ab52be1b310bcb39e6745d34a8f0e8475d67381a upstream. Reject attempts to set bits 63:32 for 32-bit x2APIC registers, i.e. all x2APIC registers except ICR. Per Intel's SDM: Non-zero writes (by WRMSR instruction) to reserved bits to these registers will raise a general protection fault exception Opportunistically fix a typo in a nearby comment. Reported-by: Marc Orr <marcorr@google.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: VMX: Move preemption timer <=> hrtimer dance to common x86Sean Christopherson2023-04-052-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 98c25ead5eda5e9d41abe57839ad3e8caf19500c upstream. Handle the switch to/from the hypervisor/software timer when a vCPU is blocking in common x86 instead of in VMX. Even though VMX is the only user of a hypervisor timer, the logic and all functions involved are generic x86 (unless future CPUs do something completely different and implement a hypervisor timer that runs regardless of mode). Handling the switch in common x86 will allow for the elimination of the pre/post_blocks hooks, and also lets KVM switch back to the hypervisor timer if and only if it was in use (without additional params). Add a comment explaining why the switch cannot be deferred to kvm_sched_out() or kvm_vcpu_block(). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ta: Fix conflicts in vmx_pre_block and vmx_post_block as per Paolo's suggestion. Add Reported-by and Link tags.] Reported-by: syzbot+b6a74be92b5063a0f1ff@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=489beb3d76ef14cc6cd18125782dc6f86051a605 Tested-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* s390/uaccess: add missing earlyclobber annotations to __clear_user()Heiko Carstens2023-04-051-1/+1
| | | | | | | | | | | | | | | | | | commit 89aba4c26fae4e459f755a18912845c348ee48f3 upstream. Add missing earlyclobber annotation to size, to, and tmp2 operands of the __clear_user() inline assembly since they are modified or written to before the last usage of all input operands. This can lead to incorrect register allocation for the inline assembly. Fixes: 6c2a9e6df604 ("[S390] Use alternative user-copy operations for new hardware.") Reported-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/all/20230321122514.1743889-3-mark.rutland@arm.com/ Cc: stable@vger.kernel.org Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: arm64: Disable interrupts while walking userspace PTsMarc Zyngier2023-04-051-7/+38
| | | | | | | | | | | | | | | | | | | | | | | | commit e86fc1a3a3e9b4850fe74d738e3cfcf4297d8bba upstream. We walk the userspace PTs to discover what mapping size was used there. However, this can race against the userspace tables being freed, and we end-up in the weeds. Thankfully, the mm code is being generous and will IPI us when doing so. So let's implement our part of the bargain and disable interrupts around the walk. This ensures that nothing terrible happens during that time. We still need to handle the removal of the page tables before the walk. For that, allow get_user_mapping_size() to return an error, and make sure this error can be propagated all the way to the the exit handler. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230316174546.3777507-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* xtensa: fix KASAN report for show_stackMax Filippov2023-04-051-4/+12
| | | | | | | | | | | | | commit 1d3b7a788ca7435156809a6bd5b20c95b2370d45 upstream. show_stack dumps raw stack contents which may trigger an unnecessary KASAN report. Fix it by copying stack contents to a temporary buffer with __memcpy and then printing that buffer instead of passing stack pointer directly to the print_hex_dump. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc: Don't try to copy PPR for task with NULL pt_regsJens Axboe2023-04-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fd7276189450110ed835eb0a334e62d2f1c4e3be upstream. powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which from my (arguably very short) checking is not commonly done for other archs. This is fine, except when PF_IO_WORKER's have been created and the task does something that causes a coredump to be generated. Then we get this crash: Kernel attempted to read user page (160) - exploit attempt? (uid: 1000) BUG: Kernel NULL pointer dereference on read at 0x00000160 Faulting instruction address: 0xc0000000000c3a60 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0 REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8 ... NIP memcpy_power7+0x200/0x7d0 LR ppr_get+0x64/0xb0 Call Trace: ppr_get+0x40/0xb0 (unreliable) __regset_get+0x180/0x1f0 regset_get_alloc+0x64/0x90 elf_core_dump+0xb98/0x1b60 do_coredump+0x1c34/0x24a0 get_signal+0x71c/0x1410 do_notify_resume+0x140/0x6f0 interrupt_exit_user_prepare_main+0x29c/0x320 interrupt_exit_user_prepare+0x6c/0xa0 interrupt_return_srr_user+0x8/0x138 Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL pt_regs. Check for a valid pt_regs in both ppc_get/ppr_set, and return an error if not set. The actual error value doesn't seem to be important here, so just pick -EINVAL. Fixes: fa439810cc1b ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Jens Axboe <axboe@kernel.dk> [mpe: Trim oops in change log, add Fixes & Cc stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mips: bmips: BCM6358: disable RAC flush for TP1Álvaro Fernández Rojas2023-04-052-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ab327f8acdf8d06601fbf058859a539a9422afff ] RAC flush causes kernel panics on BCM6358 with EHCI/OHCI when booting from TP1: [ 3.881739] usb 1-1: new high-speed USB device number 2 using ehci-platform [ 3.895011] Reserved instruction in kernel code[#1]: [ 3.900113] CPU: 0 PID: 1 Comm: init Not tainted 5.10.16 #0 [ 3.905829] $ 0 : 00000000 10008700 00000000 77d94060 [ 3.911238] $ 4 : 7fd1f088 00000000 81431cac 81431ca0 [ 3.916641] $ 8 : 00000000 ffffefff 8075cd34 00000000 [ 3.922043] $12 : 806f8d40 f3e812b7 00000000 000d9aaa [ 3.927446] $16 : 7fd1f068 7fd1f080 7ff559b8 81428470 [ 3.932848] $20 : 00000000 00000000 55590000 77d70000 [ 3.938251] $24 : 00000018 00000010 [ 3.943655] $28 : 81430000 81431e60 81431f28 800157fc [ 3.949058] Hi : 00000000 [ 3.952013] Lo : 00000000 [ 3.955019] epc : 80015808 setup_sigcontext+0x54/0x24c [ 3.960464] ra : 800157fc setup_sigcontext+0x48/0x24c [ 3.965913] Status: 10008703 KERNEL EXL IE [ 3.970216] Cause : 00800028 (ExcCode 0a) [ 3.974340] PrId : 0002a010 (Broadcom BMIPS4350) [ 3.979170] Modules linked in: ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug usbcore nls_base usb_common [ 3.992907] Process init (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=77e22ec8) [ 4.000776] Stack : 81431ef4 7fd1f080 81431f28 81428470 7fd1f068 81431edc 7ff559b8 81428470 [ 4.009467] 81431f28 7fd1f080 55590000 77d70000 77d5498c 80015c70 806f0000 8063ae74 [ 4.018149] 08100002 81431f28 0000000a 08100002 81431f28 0000000a 77d6b418 00000003 [ 4.026831] ffffffff 80016414 80080734 81431ecc 81431ecc 00000001 00000000 04000000 [ 4.035512] 77d54874 00000000 00000000 00000000 00000000 00000012 00000002 00000000 [ 4.044196] ... [ 4.046706] Call Trace: [ 4.049238] [<80015808>] setup_sigcontext+0x54/0x24c [ 4.054356] [<80015c70>] setup_frame+0xdc/0x124 [ 4.059015] [<80016414>] do_notify_resume+0x1dc/0x288 [ 4.064207] [<80011b50>] work_notifysig+0x10/0x18 [ 4.069036] [ 4.070538] Code: 8fc300b4 00001025 26240008 <ac820000> ac830004 3c048063 0c0228aa 24846a00 26240010 [ 4.080686] [ 4.082517] ---[ end trace 22a8edb41f5f983b ]--- [ 4.087374] Kernel panic - not syncing: Fatal exception [ 4.092753] Rebooting in 1 seconds.. Because the bootloader (CFE) is not initializing the Read-ahead cache properly on the second thread (TP1). Since the RAC was not initialized properly, we should avoid flushing it at the risk of corrupting the instruction stream as seen in the trace above. Fixes: d59098a0e9cb ("MIPS: bmips: use generic dma noncoherent ops") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/PVH: obtain VGA console info in Dom0Jan Beulich2023-04-055-8/+22
| | | | | | | | | | | | | | | [ Upstream commit 934ef33ee75c3846f605f18b65048acd147e3918 ] A new platform-op was added to Xen to allow obtaining the same VGA console information PV Dom0 is handed. Invoke the new function and have the output data processed by xen_init_vga(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8f315e92-7bda-c124-71cc-478ab9c5e610@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* riscv: Handle zicsr/zifencei issues between clang and binutilsNathan Chancellor2023-03-302-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e89c2e815e76471cb507bd95728bf26da7976430 upstream. There are two related issues that appear in certain combinations with clang and GNU binutils. The first occurs when a version of clang that supports zicsr or zifencei via '-march=' [1] (i.e, >= 17.x) is used in combination with a version of GNU binutils that do not recognize zicsr and zifencei in the '-march=' value (i.e., < 2.36): riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o The second occurs when a version of clang that does not support zicsr or zifencei via '-march=' (i.e., <= 16.x) is used in combination with a version of GNU as that defaults to a newer ISA base spec, which requires specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >= 2.38): ../arch/riscv/kernel/kexec_relocate.S: Assembler messages: ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required clang-12: error: assembler command failed with exit code 1 (use -v to see invocation) This is the same issue addressed by commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") (see [2] for additional information) but older versions of clang miss out on it because the cc-option check fails: clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' To resolve the first issue, only attempt to add zicsr and zifencei to the march string when using the GNU assembler 2.38 or newer, which is when the default ISA spec was updated, requiring these extensions to be specified explicitly. LLVM implements an older version of the base specification for all currently released versions, so these instructions are available as part of the 'i' extension. If LLVM's implementation is updated in the future, a CONFIG_AS_IS_LLVM condition can be added to CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. To resolve the second issue, use version 2.2 of the base ISA spec when using an older version of clang that does not support zicsr or zifencei via '-march=', as that is the spec version most compatible with the one clang/LLVM implements and avoids the need to specify zicsr and zifencei explicitly due to still being a part of 'i'. [1]: https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e15bf8a16 [2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/ Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1808 Co-developed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: mm: Fix incorrect ASID argument when flushing TLBDylan Jhong2023-03-303-2/+4
| | | | | | | | | | | | | | | | commit 9a801afd3eb95e1a89aba17321062df06fb49d98 upstream. Currently, we pass the CONTEXTID instead of the ASID to the TLB flush function. We should only take the ASID field to prevent from touching the reserved bit field. Fixes: 3f1e782998cd ("riscv: add ASID-based tlbflushing methods") Signed-off-by: Dylan Jhong <dylan@andestech.com> Reviewed-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Link: https://lore.kernel.org/r/20230313034906.2401730-1-dylan@andestech.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: x86: hyper-v: Avoid calling kvm_make_vcpus_request_mask() with ↵Vitaly Kuznetsov2023-03-301-6/+9
| | | | | | | | | | | | | | | | | | | | | | | vcpu_mask==NULL commit 6470accc7ba948b0b3aca22b273fe84ec638a116 upstream. In preparation to making kvm_make_vcpus_request_mask() use for_each_set_bit() switch kvm_hv_flush_tlb() to calling kvm_make_all_cpus_request() for 'all cpus' case. Note: kvm_make_all_cpus_request() (unlike kvm_make_vcpus_request_mask()) currently dynamically allocates cpumask on each call and this is suboptimal. Both kvm_make_all_cpus_request() and kvm_make_vcpus_request_mask() are going to be switched to using pre-allocated per-cpu masks. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210903075141.403071-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* arm64: dts: imx8mm-nitrogen-r2: fix WM8960 clock nameKrzysztof Kozlowski2023-03-301-1/+1
| | | | | | | | | | | | | commit 32f86da7c86b27ebed31c24453a0713f612e43fb upstream. The WM8960 Linux driver expects the clock to be named "mclk". Otherwise the clock will be ignored and not prepared/enabled by the driver. Fixes: 40ba2eda0a7b ("arm64: dts: imx8mm-nitrogen-r2: add audio") Cc: <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sh: sanitize the flags on sigreturnAl Viro2023-03-302-0/+4
| | | | | | | | | | | | | [ Upstream commit 573b22ccb7ce9ab7f0539a2e11a9d3609a8783f5 ] We fetch %SR value from sigframe; it might have been modified by signal handler, so we can't trust it with any bits that are not modifiable in user mode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* m68k: Only force 030 bus error if PC not in exception tableMichael Schmitz2023-03-301-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e36a82bebbf7da814530d5a179bef9df5934b717 ] __get_kernel_nofault() does copy data in supervisor mode when forcing a task backtrace log through /proc/sysrq_trigger. This is expected cause a bus error exception on e.g. NULL pointer dereferencing when logging a kernel task has no workqueue associated. This bus error ought to be ignored. Our 030 bus error handler is ill equipped to deal with this: Whenever ssw indicates a kernel mode access on a data fault, we don't even attempt to handle the fault and instead always send a SEGV signal (or panic). As a result, the check for exception handling at the fault PC (buried in send_sig_fault() which gets called from do_page_fault() eventually) is never used. In contrast, both 040 and 060 access error handlers do not care whether a fault happened on supervisor mode access, and will call do_page_fault() on those, ultimately honoring the exception table. Add a check in bus_error030 to call do_page_fault() in case we do have an entry for the fault PC in our exception table. I had attempted a fix for this earlier in 2019 that did rely on testing pagefault_disabled() (see link below) to achieve the same thing, but this patch should be more generic. Tested on 030 Atari Falcon. Reported-by: Eero Tamminen <oak@helsinkinet.fi> Link: https://lore.kernel.org/r/alpine.LNX.2.21.1904091023540.25@nippy.intranet Link: https://lore.kernel.org/r/63130691-1984-c423-c1f2-73bfd8d3dcd3@gmail.com Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230301021107.26307-1-schmitzmic@gmail.com Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* riscv: Bump COMMAND_LINE_SIZE value to 1024Alexandre Ghiti2023-03-301-0/+8
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 61fc1ee8be26bc192d691932b0a67eabee45d12f ] Increase COMMAND_LINE_SIZE as the current default value is too low for syzbot kernel command line. There has been considerable discussion on this patch that has led to a larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all ports. That's not quite done yet, but it's gotten far enough we're confident this is not a uABI change so this is safe. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr [Palmer: it's not uabi] Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* arm64: dts: imx8mn: specify #sound-dai-cells for SAI nodesMarek Vasut2023-03-301-0/+5
| | | | | | | | | | | | | | [ Upstream commit 62fb54148cd6eb456ff031be8fb447c98cf0bd9b ] Add #sound-dai-cells properties to SAI nodes. Reviewed-by: Adam Ford <aford173@gmail.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Fixes: 9e9860069725 ("arm64: dts: imx8mn: Add SAI nodes") Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: dts: imx6sl: tolino-shine2hd: fix usbotg1 pinctrlPeng Fan2023-03-301-0/+1
| | | | | | | | | | | [ Upstream commit 1cd489e1ada1cffa56bd06fd4609f5a60a985d43 ] usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names' Signed-off-by: Peng Fan <peng.fan@nxp.com> Fixes: 9c7016f1ca6d ("ARM: dts: imx: add devicetree for Tolino Shine 2 HD") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ARM: dts: imx6sll: e60k02: fix usbotg1 pinctrlPeng Fan2023-03-301-0/+1
| | | | | | | | | | | [ Upstream commit 957c04e9784c7c757e8cc293d7fb2a60cdf461b6 ] usb@2184000: 'pinctrl-0' is a dependency of 'pinctrl-names' Signed-off-by: Peng Fan <peng.fan@nxp.com> Fixes: c100ea86e6ab ("ARM: dts: add Netronix E60K02 board common file") Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* x86/resctrl: Clear staged_config[] before and after it is usedShawn Wang2023-03-223-9/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0424a7dfe9129b93f29b277511a60e87f052ac6b upstream. As a temporary storage, staged_config[] in rdt_domain should be cleared before and after it is used. The stale value in staged_config[] could cause an MSR access error. Here is a reproducer on a system with 16 usable CLOSIDs for a 15-way L3 Cache (MBA should be disabled if the number of CLOSIDs for MB is less than 16.) : mount -t resctrl resctrl -o cdp /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..7} umount /sys/fs/resctrl/ mount -t resctrl resctrl /sys/fs/resctrl mkdir /sys/fs/resctrl/p{1..8} An error occurs when creating resource group named p8: unchecked MSR access error: WRMSR to 0xca0 (tried to write 0x00000000000007ff) at rIP: 0xffffffff82249142 (cat_wrmsr+0x32/0x60) Call Trace: <IRQ> __flush_smp_call_function_queue+0x11d/0x170 __sysvec_call_function+0x24/0xd0 sysvec_call_function+0x89/0xc0 </IRQ> <TASK> asm_sysvec_call_function+0x16/0x20 When creating a new resource control group, hardware will be configured by the following process: rdtgroup_mkdir() rdtgroup_mkdir_ctrl_mon() rdtgroup_init_alloc() resctrl_arch_update_domains() resctrl_arch_update_domains() iterates and updates all resctrl_conf_type whose have_new_ctrl is true. Since staged_config[] holds the same values as when CDP was enabled, it will continue to update the CDP_CODE and CDP_DATA configurations. When group p8 is created, get_config_index() called in resctrl_arch_update_domains() will return 16 and 17 as the CLOSIDs for CDP_CODE and CDP_DATA, which will be translated to an invalid register - 0xca0 in this scenario. Fix it by clearing staged_config[] before and after it is used. [reinette: re-order commit tags] Fixes: 75408e43509e ("x86/resctrl: Allow different CODE/DATA configurations to be staged") Suggested-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/2fad13f49fbe89687fc40e9a5a61f23a28d1507a.1673988935.git.reinette.chatre%40intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mm: Fix use of uninitialized buffer in sme_enable()Nikita Zhandarovich2023-03-221-1/+2
| | | | | | | | | | | | | | | | | | | | commit cbebd68f59f03633469f3ecf9bea99cd6cce3854 upstream. cmdline_find_option() may fail before doing any initialization of the buffer array. This may lead to unpredictable results when the same buffer is used later in calls to strncmp() function. Fix the issue by returning early if cmdline_find_option() returns an error. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230306160656.14844-1-n.zhandarovich@fintech.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/mce: Make sure logged MCEs are processed after sysfs updateYazen Ghannam2023-03-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 4783b9cb374af02d49740e00e2da19fd4ed6dec4 upstream. A recent change introduced a flag to queue up errors found during boot-time polling. These errors will be processed during late init once the MCE subsystem is fully set up. A number of sysfs updates call mce_restart() which goes through a subset of the CPU init flow. This includes polling MCA banks and logging any errors found. Since the same function is used as boot-time polling, errors will be queued. However, the system is now past late init, so the errors will remain queued until another error is found and the workqueue is triggered. Call mce_schedule_work() at the end of mce_restart() so that queued errors are processed. Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230301221420.2203184-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* riscv: asid: Fixup stale TLB entry cause application crashGuo Ren2023-03-221-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 82dd33fde0268cc622d3d1ac64971f3f61634142 upstream. After use_asid_allocator is enabled, the userspace application will crash by stale TLB entries. Because only using cpumask_clear_cpu without local_flush_tlb_all couldn't guarantee CPU's TLB entries were fresh. Then set_mm_asid would cause the user space application to get a stale value by stale TLB entry, but set_mm_noasid is okay. Here is the symptom of the bug: unhandled signal 11 code 0x1 (coredump) 0x0000003fd6d22524 <+4>: auipc s0,0x70 0x0000003fd6d22528 <+8>: ld s0,-148(s0) # 0x3fd6d92490 => 0x0000003fd6d2252c <+12>: ld a5,0(s0) (gdb) i r s0 s0 0x8082ed1cc3198b21 0x8082ed1cc3198b21 (gdb) x /2x 0x3fd6d92490 0x3fd6d92490: 0xd80ac8a8 0x0000003f The core dump file shows that register s0 is wrong, but the value in memory is correct. Because 'ld s0, -148(s0)' used a stale mapping entry in TLB and got a wrong result from an incorrect physical address. When the task ran on CPU0, which loaded/speculative-loaded the value of address(0x3fd6d92490), then the first version of the mapping entry was PTWed into CPU0's TLB. When the task switched from CPU0 to CPU1 (No local_tlb_flush_all here by asid), it happened to write a value on the address (0x3fd6d92490). It caused do_page_fault -> wp_page_copy -> ptep_clear_flush -> ptep_get_and_clear & flush_tlb_page. The flush_tlb_page used mm_cpumask(mm) to determine which CPUs need TLB flush, but CPU0 had cleared the CPU0's mm_cpumask in the previous switch_mm. So we only flushed the CPU1 TLB and set the second version mapping of the PTE. When the task switched from CPU1 to CPU0 again, CPU0 still used a stale TLB mapping entry which contained a wrong target physical address. It raised a bug when the task happened to read that value. CPU0 CPU1 - switch 'task' in - read addr (Fill stale mapping entry into TLB) - switch 'task' out (no tlb_flush) - switch 'task' in (no tlb_flush) - write addr cause pagefault do_page_fault() (change to new addr mapping) wp_page_copy() ptep_clear_flush() ptep_get_and_clear() & flush_tlb_page() write new value into addr - switch 'task' out (no tlb_flush) - switch 'task' in (no tlb_flush) - read addr again (Use stale mapping entry in TLB) get wrong value from old phyical addr, BUG! The solution is to keep all CPUs' footmarks of cpumask(mm) in switch_mm, which could guarantee to invalidate all stale TLB entries during TLB flush. Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Zong Li <zong.li@sifive.com> Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Cc: Anup Patel <apatel@ventanamicro.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230226150137.1919750-3-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "riscv: mm: notify remote harts about mmu cache updates"Sergey Matyukevich2023-03-224-41/+17
| | | | | | | | | | | | | | | | | | | | commit e921050022f1f12d5029d1487a7dfc46cde15523 upstream. This reverts the remaining bits of commit 4bd1d80efb5a ("riscv: mm: notify remote harts harts about mmu cache updates"). According to bug reports, suggested approach to fix stale TLB entries is not sufficient. It needs to be replaced by a more robust solution. Fixes: 4bd1d80efb5a ("riscv: mm: notify remote harts about mmu cache updates") Reported-by: Zong Li <zong.li@sifive.com> Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Cc: stable@vger.kernel.org Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230226150137.1919750-2-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>