summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"Jason A. Donenfeld2022-07-182-37/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decision of whether or not to trust RDRAND is controlled by the "random.trust_cpu" boot time parameter or the CONFIG_RANDOM_TRUST_CPU compile time default. The "nordrand" flag was added during the early days of RDRAND, when there were worries that merely using its values could compromise the RNG. However, these days, RDRAND values are not used directly but always go through the RNG's hash function, making "nordrand" no longer useful. Rather, the correct switch is "random.trust_cpu", which not only handles the relevant trust issue directly, but also is general to multiple CPU types, not just x86. However, x86 RDRAND does have a history of being occasionally problematic. Prior, when the kernel would notice something strange, it'd warn in dmesg and suggest enabling "nordrand". We can improve on that by making the test a little bit better and then taking the step of automatically disabling RDRAND if we detect it's problematic. Also disable RDSEED if the RDRAND test fails. Cc: x86@kernel.org Cc: Theodore Ts'o <tytso@mit.edu> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| * | | | | | | | random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld2022-07-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* | | | | | | | | Merge tag 'integrity-v6.0' of ↵Linus Torvalds2022-08-021-19/+1
|\ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Aside from the one EVM cleanup patch, all the other changes are kexec related. On different architectures different keyrings are used to verify the kexec'ed kernel image signature. Here are a number of preparatory cleanup patches and the patches themselves for making the keyrings - builtin_trusted_keyring, .machine, .secondary_trusted_keyring, and .platform - consistent across the different architectures" * tag 'integrity-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification arm64: kexec_file: use more system keyrings to verify kernel image signature kexec, KEYS: make the code in bzImage64_verify_sig generic kexec: clean up arch_kexec_kernel_verify_sig kexec: drop weak attribute from functions kexec_file: drop weak attribute from functions evm: Use IS_ENABLED to initialize .enabled
| * | | | | | | | kexec, KEYS: make the code in bzImage64_verify_sig genericCoiby Xu2022-07-151-19/+1
| | |_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify") adds platform keyring support on x86 kexec but not arm64. The code in bzImage64_verify_sig uses the keys on the .builtin_trusted_keys, .machine, if configured and enabled, .secondary_trusted_keys, also if configured, and .platform keyrings to verify the signed kernel image as PE file. Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Reviewed-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
* | | | | | | | Merge tag 'locking-core-2022-08-01' of ↵Linus Torvalds2022-08-012-16/+0
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "This was a fairly quiet cycle for the locking subsystem: - lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives that can lose the lock_type & cause false reports. No such mishap was observed in the wild. - jump_label improvements: simplify the cross-arch support of initial NOP patching by making it arch-specific code (used on MIPS only), and remove the s390 initial NOP patching that was superfluous" * tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix lockdep_init_map_*() confusion jump_label: make initial NOP patching the special case jump_label: mips: move module NOP patching into arch code jump_label: s390: avoid pointless initial NOP patching
| * | | | | | | | jump_label: make initial NOP patching the special caseArd Biesheuvel2022-06-241-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of defaulting to patching NOP opcodes at init time, and leaving it to the architectures to override this if this is not needed, switch to a model where doing nothing is the default. This is the common case by far, as only MIPS requires NOP patching at init time. On all other architectures, the correct encodings are emitted by the compiler and so no initial patching is needed. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220615154142.1574619-4-ardb@kernel.org
| * | | | | | | | jump_label: mips: move module NOP patching into arch codeArd Biesheuvel2022-06-241-3/+0
| | |_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MIPS is the only remaining architecture that needs to patch jump label NOP encodings to initialize them at load time. So let's move the module patching part of that from generic code into arch/mips, and drop it from the others. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220615154142.1574619-3-ardb@kernel.org
* | | | | | | | Merge tag 'x86_kdump_for_v6.0_rc1' of ↵Linus Torvalds2022-08-013-6/+147
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump updates from Borislav Petkov: - Add the ability to pass early an RNG seed to the kernel from the boot loader - Add the ability to pass the IMA measurement of kernel and bootloader to the kexec-ed kernel * tag 'x86_kdump_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Use rng seeds from setup_data x86/kexec: Carry forward IMA measurement log on kexec
| * | | | | | | | x86/setup: Use rng seeds from setup_dataJason A. Donenfeld2022-07-112-4/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way x86 can get an early boot RNG seed is via EFI, which is generally always used now for physical machines, but is very rarely used in VMs, especially VMs that are optimized for starting "instantaneously", such as Firecracker's MicroVM. For tiny fast booting VMs, EFI is not something you generally need or want. Rather, the image loader or firmware should be able to pass a single random seed, exactly as device tree platforms do with the "rng-seed" property. Additionally, this is something that bootloaders can append, with their own seed file management, which is something every other major OS ecosystem has that Linux does not (yet). Add SETUP_RNG_SEED, similar to the other eight setup_data entries that are parsed at boot. It also takes care to zero out the seed immediately after using, in order to retain forward secrecy. This all takes about 7 trivial lines of code. Then, on kexec_file_load(), a new fresh seed is generated and passed to the next kernel, just as is done on device tree architectures when using kexec. And, importantly, I've tested that QEMU is able to properly pass SETUP_RNG_SEED as well, making this work for every step of the way. This code too is pretty straight forward. Together these measures ensure that VMs and nested kexec()'d kernels always receive a proper boot time RNG seed at the earliest possible stage from their parents: - Host [already has strongly initialized RNG] - QEMU [passes fresh seed in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - ... I've verified in several scenarios that this works quite well from a host kernel to QEMU and down inwards, mixing and matching loaders, with every layer providing a seed to the next. [ bp: Massage commit message. ] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/20220630113300.1892799-1-Jason@zx2c4.com
| * | | | | | | | Merge tag 'v5.19-rc6' into tip:x86/kdumpBorislav Petkov2022-07-113-2/+14
| |\ \ \ \ \ \ \ \ | | | |_|_|_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge rc6 to pick up dependent changes to the bootparam UAPI header. Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | | | | x86/kexec: Carry forward IMA measurement log on kexecJonathan McDowell2022-07-013-5/+106
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On kexec file load, the Integrity Measurement Architecture (IMA) subsystem may verify the IMA signature of the kernel and initramfs, and measure it. The command line parameters passed to the kernel in the kexec call may also be measured by IMA. A remote attestation service can verify a TPM quote based on the TPM event log, the IMA measurement list and the TPM PCR data. This can be achieved only if the IMA measurement log is carried over from the current kernel to the next kernel across the kexec call. PowerPC and ARM64 both achieve this using device tree with a "linux,ima-kexec-buffer" node. x86 platforms generally don't make use of device tree, so use the setup_data mechanism to pass the IMA buffer to the new kernel. Signed-off-by: Jonathan McDowell <noodles@fb.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> # IMA function definitions Link: https://lore.kernel.org/r/YmKyvlF3my1yWTvK@noodles-fedora-PC23Y6EG
* | | | | | | | Merge tag 'x86_build_for_v6.0_rc1' of ↵Linus Torvalds2022-08-011-2/+0
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: - Fix stack protector builds when cross compiling with Clang - Other Kbuild improvements and fixes * tag 'x86_build_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Omit use of bin2c x86/purgatory: Hard-code obj-y in Makefile x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.o x86/Kconfig: Fix CONFIG_CC_HAS_SANE_STACKPROTECTOR when cross compiling with clang
| * | | | | | | | x86/build: Remove unused OBJECT_FILES_NON_STANDARD_test_nx.oMasahiro Yamada2022-07-111-2/+0
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3ad38ceb2769 ("x86/mm: Remove CONFIG_DEBUG_NX_TEST") removed arch/x86/kernel/test_nx.c Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220711041247.119357-1-masahiroy@kernel.org
* | | | | | | | Merge tag 'x86_misc_for_v6.0_rc1' of ↵Linus Torvalds2022-08-012-1/+19
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Add a bunch of PCI IDs for new AMD CPUs and use them in k10temp - Free the pmem platform device on the registration error path * tag 'x86_misc_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hwmon: (k10temp): Add support for new family 17h and 19h models x86/amd_nb: Add AMD PCI IDs for SMN communication x86/pmem: Fix platform-device leak in error path
| * | | | | | | | x86/amd_nb: Add AMD PCI IDs for SMN communicationMario Limonciello2022-07-201-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for SMN communication on family 17h model A0h and family 19h models 60h-70h. [ bp: Merge into a single patch. ] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20220719195256.1516-1-mario.limonciello@amd.com
| * | | | | | | | x86/pmem: Fix platform-device leak in error pathJohan Hovold2022-06-201-1/+6
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to free the platform device in the unlikely event that registration fails. Fixes: 7a67832c7e44 ("libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220620140723.9810-1-johan@kernel.org
* | | | | | | | Merge tag 'x86_cpu_for_v6.0_rc1' of ↵Linus Torvalds2022-08-012-15/+33
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Remove the vendor check when selecting MWAIT as the default idle state - Respect idle=nomwait when supplied on the kernel cmdline - Two small cleanups * tag 'x86_cpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Use MSR_IA32_MISC_ENABLE constants x86: Fix comment for X86_FEATURE_ZEN x86: Remove vendor checks from prefer_mwait_c1_over_halt x86: Handle idle=nomwait cmdline properly for x86_idle
| * | | | | | | | x86/cpu: Use MSR_IA32_MISC_ENABLE constantsPaolo Bonzini2022-07-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of the magic numbers 1<<11 and 1<<12 use the constants from msr-index.h. This makes it obvious where those bits of MSR_IA32_MISC_ENABLE are consumed (and in fact that Linux consumes them at all) to simple minds that grep for MSR_IA32_MISC_ENABLE_.*_UNAVAIL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220719174714.2410374-1-pbonzini@redhat.com
| * | | | | | | | x86: Remove vendor checks from prefer_mwait_c1_over_haltWyes Karny2022-06-081-10/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove vendor checks from prefer_mwait_c1_over_halt function. Restore the decision tree to support MWAIT C1 as the default idle state based on CPUID checks as done by Thomas Gleixner in commit 09fd4b4ef5bc ("x86: use cpuid to check MWAIT support for C1") The decision tree is removed in commit 69fb3676df33 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param") Prefer MWAIT when the following conditions are satisfied: 1. CPUID_Fn00000001_ECX [Monitor] should be set 2. CPUID_Fn00000005 should be supported 3. If CPUID_Fn00000005_ECX [EMX] is set then there should be at least one C1 substate available, indicated by CPUID_Fn00000005_EDX [MWaitC1SubStates] bits. Otherwise use HLT for default_idle function. HPC customers who want to optimize for lower latency are known to disable Global C-States in the BIOS. In fact, some vendors allow choosing a BIOS 'performance' profile which explicitly disables C-States. In this scenario, the cpuidle driver will not be loaded and the kernel will continue with the default idle state chosen at boot time. On AMD systems currently the default idle state is HLT which has a higher exit latency compared to MWAIT. The reason for the choice of HLT over MWAIT on AMD systems is: 1. Families prior to 10h didn't support MWAIT 2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was preferable to use HLT as the default state on these systems. However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And it is preferable to use MWAIT as the default idle state on these systems, as it has lower exit latencies. The below table represents the exit latency for HLT and MWAIT on AMD Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to other CPU and measuring how many clock cycles it took to wakeup. Each iteration measures 10K wakeups by pinning source and destination. HLT: 25.0000th percentile : 1900 ns 50.0000th percentile : 2000 ns 75.0000th percentile : 2300 ns 90.0000th percentile : 2500 ns 95.0000th percentile : 2600 ns 99.0000th percentile : 2800 ns 99.5000th percentile : 3000 ns 99.9000th percentile : 3400 ns 99.9500th percentile : 3600 ns 99.9900th percentile : 5900 ns Min latency : 1700 ns Max latency : 5900 ns Total Samples 9999 MWAIT: 25.0000th percentile : 1400 ns 50.0000th percentile : 1500 ns 75.0000th percentile : 1700 ns 90.0000th percentile : 1800 ns 95.0000th percentile : 1900 ns 99.0000th percentile : 2300 ns 99.5000th percentile : 2500 ns 99.9000th percentile : 3200 ns 99.9500th percentile : 3500 ns 99.9900th percentile : 4600 ns Min latency : 1200 ns Max latency : 4600 ns Total Samples 9997 Improvement (99th percentile): 21.74% Below is another result for context_switch2 micro-benchmark, which brings out the impact of improved wakeup latency through increased context-switches per second. with HLT: ------------------------------- 50.0000th percentile : 190184 75.0000th percentile : 191032 90.0000th percentile : 192314 95.0000th percentile : 192520 99.0000th percentile : 192844 MIN : 190148 MAX : 192852 with MWAIT: ------------------------------- 50.0000th percentile : 277444 75.0000th percentile : 278268 90.0000th percentile : 278888 95.0000th percentile : 279164 99.0000th percentile : 280504 MIN : 273278 MAX : 281410 Improvement(99th percentile): ~ 45.46% Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://ozlabs.org/~anton/junkcode/context_switch2.c Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com
| * | | | | | | | x86: Handle idle=nomwait cmdline properly for x86_idleWyes Karny2022-06-081-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kernel is booted with idle=nomwait do not use MWAIT as the default idle state. If the user boots the kernel with idle=nomwait, it is a clear direction to not use mwait as the default idle state. However, the current code does not take this into consideration while selecting the default idle state on x86. Fix it by checking for the idle=nomwait boot option in prefer_mwait_c1_over_halt(). Also update the documentation around idle=nomwait appropriately. [ dhansen: tweak commit message ] Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com
* | | | | | | | | Merge tag 'x86_fpu_for_v6.0_rc1' of ↵Linus Torvalds2022-08-011-0/+14
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu update from Borislav Petkov: - Add machinery to initialize AMX register state in order for AMX-capable CPUs to be able to enter deeper low-power state * tag 'x86_fpu_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Add a new flag to initialize the AMX state x86/fpu: Add a helper to prepare AMX state for low-power CPU idle
| * | | | | | | | | x86/fpu: Add a helper to prepare AMX state for low-power CPU idleChang S. Bae2022-07-191-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CPU enters an idle state, a non-initialized AMX register state may be the cause of preventing a deeper low-power state. Other extended register states whether initialized or not do not impact the CPU idle state. The new helper can ensure the AMX state is initialized before the CPU is idle, and it will be used by the intel idle driver. Check the AMX_TILE feature bit before using XGETBV1 as a chain of dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
* | | | | | | | | | Merge tag 'x86_vmware_for_v6.0_rc1' of ↵Linus Torvalds2022-08-011-2/+2
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vmware cleanup from Borislav Petkov: - A single statement simplification by using the BIT() macro * tag 'x86_vmware_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Use BIT() macro for shifting
| * | | | | | | | | | x86/vmware: Use BIT() macro for shiftingShreenidhi Shedi2022-06-221-2/+2
| | |_|/ / / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMWARE_CMD_VCPU_RESERVED is bit 31 and that would mean undefined behavior when shifting an int but the kernel is built with -fno-strict-overflow which will wrap around using two's complement. Use the BIT() macro to improve readability and avoid any potential overflow confusion because it uses an unsigned long. [ bp: Clarify commit message. ] Signed-off-by: Shreenidhi Shedi <sshedi@vmware.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Link: https://lore.kernel.org/r/20220601101820.535031-1-sshedi@vmware.com
* | | | | | | | | | Merge tag 'ras_core_for_v6.0_rc1' of ↵Linus Torvalds2022-08-012-1/+48
|\ \ \ \ \ \ \ \ \ \ | |_|_|_|_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS update from Borislav Petkov: "A single RAS change: - Probe whether hardware error injection (direct MSR writes) is possible when injecting errors on AMD platforms. In some cases, the platform could prohibit those" * tag 'ras_core_for_v6.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check whether writes to MCA_STATUS are getting ignored
| * | | | | | | | | x86/mce: Check whether writes to MCA_STATUS are getting ignoredSmita Koralahalli2022-06-282-1/+48
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The platform can sometimes - depending on its settings - cause writes to MCA_STATUS MSRs to get ignored, regardless of HWCR[McStatusWrEn]'s value. For further info see PPR for AMD Family 19h, Model 01h, Revision B1 Processors, doc ID 55898 at https://bugzilla.kernel.org/show_bug.cgi?id=206537. Therefore, probe for ignored writes to MCA_STATUS to determine if hardware error injection is at all possible. [ bp: Heavily massage commit message and patch. ] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220214233640.70510-2-Smita.KoralahalliChannabasappa@amd.com
* | | | | | | | | x86/bugs: Do not enable IBPB at firmware entry when IBPB is not availableThadeu Lima de Souza Cascardo2022-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some cloud hypervisors do not provide IBPB on very recent CPU processors, including AMD processors affected by Retbleed. Using IBPB before firmware calls on such systems would cause a GPF at boot like the one below. Do not enable such calls when IBPB support is not present. EFI Variables Facility v0.08 2004-May-17 general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: efi_rts_wq efi_call_rts RIP: 0010:efi_call_rts Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48 RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049 RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300 RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78 R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0 Call Trace: <TASK> ? __wake_up process_one_work worker_thread ? rescuer_thread kthread ? kthread_complete_and_exit ret_from_fork </TASK> Modules linked in: Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls") Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
* | | | | | | | | Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"Borislav Petkov2022-07-272-25/+17
| |_|_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 007faec014cb5d26983c1f86fd08c6539b41392e. Now that hyperv does its own protocol negotiation: 49d6a3c062a1 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM") revert this exposure of the sev_es_ghcb_hv_call() helper. Cc: Wei Liu <wei.liu@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by:Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com
* | | | | | | | x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS partsPawan Gupta2022-07-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at every kernel entry/exit. On Enhanced IBRS parts setting MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at every kernel entry/exit incur unnecessary performance loss. When Enhanced IBRS feature is present, print a warning about this unnecessary performance loss. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/2a5eaf54583c2bfe0edc4fea64006656256cca17.1657814857.git.pawan.kumar.gupta@linux.intel.com
* | | | | | | | x86/alternative: Report missing return thunk detailsKees Cook2022-07-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Debugging missing return thunks is easier if we can see where they're happening. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Ys66hwtFcGbYmoiZ@hirez.programming.kicks-ass.net/
* | | | | | | | x86/amd: Use IBPB for firmware callsPeter Zijlstra2022-07-181-1/+10
| |/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On AMD IBRS does not prevent Retbleed; as such use IBPB before a firmware call to flush the branch history state. And because in order to do an EFI call, the kernel maps a whole lot of the kernel page table into the EFI page table, do an IBPB just in case in order to prevent the scenario of poisoning the BTB and causing an EFI call using the unprotected RET there. [ bp: Massage. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
* | | | | | | Merge tag 'x86_urgent_for_v5.19_rc7' of ↵Linus Torvalds2022-07-172-2/+3
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Improve the check whether the kernel supports WP mappings so that it can accomodate a XenPV guest due to how the latter is setting up the PAT machinery - Now that the retbleed nightmare is public, here's the first round of fallout fixes: * Fix a build failure on 32-bit due to missing include * Remove an untraining point in espfix64 return path * other small cleanups * tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Remove apostrophe typo um: Add missing apply_returns() x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt x86/bugs: Mark retbleed_strings static x86/pat: Fix x86_has_pat_wp() x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
| * | | | | | | x86/bugs: Remove apostrophe typoKim Phillips2022-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a superfluous ' in the mitigation string. Fixes: e8ec1b6e08a2 ("x86/bugs: Enable STIBP for JMP2RET") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | | | x86/bugs: Mark retbleed_strings staticJiapeng Chong2022-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This symbol is not used outside of bugs.c, so mark it static. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220714072939.71162-1-jiapeng.chong@linux.alibaba.com
| * | | | | | | x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bitJiri Slaby2022-07-131-0/+1
| | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The build on x86_32 currently fails after commit 9bb2ec608a20 (objtool: Update Retpoline validation) with: arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe' ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is missing this include. Fix this. Fixes: 9bb2ec608a20 ("objtool: Update Retpoline validation") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org
* | | | | | | Merge tag 'acpi-5.19-rc7' of ↵Linus Torvalds2022-07-161-0/+6
|\ \ \ \ \ \ \ | |/ / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix more fallout from recent changes of the ACPI CPPC handling on AMD platforms (Mario Limonciello)" * tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
| * | | | | | ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memoryMario Limonciello2022-07-131-0/+6
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When commit 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") was introduced, we found collateral damage that a number of AMD systems that supported CPPC but didn't advertise support in _OSC stopped having a functional amd-pstate driver. The _OSC was only enforced on Intel systems at that time. This was fixed for the MSR based designs by commit 8b356e536e69f ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported") but some shared memory based designs also support CPPC but haven't advertised support in the _OSC. Add support for those designs as well by hardcoding the list of systems. Fixes: 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") Fixes: 8b356e536e69f ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported") Link: https://lore.kernel.org/all/3559249.JlDtxWtqDm@natalenko.name/ Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Reported-and-tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | | | Merge tag 'x86_bugs_retbleed' of ↵Linus Torvalds2022-07-121-5/+8
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull lockdep fix for x86 retbleed from Borislav Petkov: - Fix lockdep complaint for __static_call_fixup() * tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/static_call: Serialize __static_call_fixup() properly
| * | | | | | x86/static_call: Serialize __static_call_fixup() properlyThomas Gleixner2022-07-121-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __static_call_fixup() invokes __static_call_transform() without holding text_mutex, which causes lockdep to complain in text_poke_bp(). Adding the proper locking cures that, but as this is either used during early boot or during module finalizing, it's not required to use text_poke_bp(). Add an argument to __static_call_transform() which tells it to use text_poke_early() for it. Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de>
* | | | | | | Merge tag 'x86_bugs_retbleed' of ↵Linus Torvalds2022-07-1116-117/+668
|\ \ \ \ \ \ \ | |_|/ / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 retbleed fixes from Borislav Petkov: "Just when you thought that all the speculation bugs were addressed and solved and the nightmare is complete, here's the next one: speculating after RET instructions and leaking privileged information using the now pretty much classical covert channels. It is called RETBleed and the mitigation effort and controlling functionality has been modelled similar to what already existing mitigations provide" * tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) x86/speculation: Disable RRSBA behavior x86/kexec: Disable RET on kexec x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry x86/bugs: Add Cannon lake to RETBleed affected CPU list x86/retbleed: Add fine grained Kconfig knobs x86/cpu/amd: Enumerate BTC_NO x86/common: Stamp out the stepping madness KVM: VMX: Prevent RSB underflow before vmenter x86/speculation: Fill RSB on vmexit for IBRS KVM: VMX: Fix IBRS handling after vmexit KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS KVM: VMX: Convert launched argument to flags KVM: VMX: Flatten __vmx_vcpu_run() objtool: Re-add UNWIND_HINT_{SAVE_RESTORE} x86/speculation: Remove x86_spec_ctrl_mask x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit x86/speculation: Fix SPEC_CTRL write on SMT state change x86/speculation: Fix firmware entry SPEC_CTRL handling x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n ...
| * | | | | | x86/speculation: Disable RRSBA behaviorPawan Gupta2022-07-092-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | | x86/kexec: Disable RET on kexecKonrad Rzeszutek Wilk2022-07-092-12/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the invocations unroll to __x86_return_thunk and this file must be PIC independent. This fixes kexec on 64-bit AMD boxes. [ bp: Fix 32-bit build. ] Reported-by: Edward Tran <edward.tran@oracle.com> Reported-by: Awais Tanveer <awais.tanveer@oracle.com> Suggested-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | | x86/bugs: Do not enable IBPB-on-entry when IBPB is not supportedThadeu Lima de Souza Cascardo2022-07-081-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some VM configurations which have Skylake model but do not support IBPB. In those cases, when using retbleed=ibpb, userspace is going to be killed and kernel is going to panic. If the CPU does not support IBPB, warn and proceed with the auto option. Also, do not fallback to IBPB on AMD/Hygon systems if it is not supported. Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | | x86/bugs: Add Cannon lake to RETBleed affected CPU listPawan Gupta2022-07-071-0/+1
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cannon lake is also affected by RETBleed, add it to the list. Fixes: 6ad0ad2bf8a6 ("x86/bugs: Report Intel retbleed vulnerability") Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | x86/retbleed: Add fine grained Kconfig knobsPeter Zijlstra2022-06-294-16/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | x86/cpu/amd: Enumerate BTC_NOAndrew Cooper2022-06-272-8/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | x86/common: Stamp out the stepping madnessPeter Zijlstra2022-06-271-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole MMIO/RETBLEED enumeration went overboard on steppings. Get rid of all that and simply use ANY. If a future stepping of these models would not be affected, it had better set the relevant ARCH_CAP_$FOO_NO bit in IA32_ARCH_CAPABILITIES. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | KVM: VMX: Prevent RSB underflow before vmenterJosh Poimboeuf2022-06-272-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On VMX, there are some balanced returns between the time the guest's SPEC_CTRL value is written, and the vmenter. Balanced returns (matched by a preceding call) are usually ok, but it's at least theoretically possible an NMI with a deep call stack could empty the RSB before one of the returns. For maximum paranoia, don't allow *any* returns (balanced or otherwise) between the SPEC_CTRL write and the vmenter. [ bp: Fix 32-bit build. ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | x86/speculation: Fill RSB on vmexit for IBRSJosh Poimboeuf2022-06-271-5/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | | | KVM: VMX: Prevent guest RSB poisoning attacks with eIBRSJosh Poimboeuf2022-06-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>