summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'x86_urgent_for_v5.19_rc6' of ↵Linus Torvalds2022-07-107-14/+28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Prepare for and clear .brk early in order to address XenPV guests failures where the hypervisor verifies page tables and uninitialized data in that range leads to bogus failures in those checks - Add any potential setup_data entries supplied at boot to the identity pagetable mappings to prevent kexec kernel boot failures. Usually, this is not a problem for the normal kernel as those mappings are part of the initially mapped 2M pages but if kexec gets to allocate the second kernel somewhere else, those setup_data entries need to be mapped there too. - Fix objtool not to discard text references from the __tracepoints section so that ENDBR validation still works - Correct the setup_data types limit as it is user-visible, before 5.19 releases * tag 'x86_urgent_for_v5.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix the setup data types max limit x86/ibt, objtool: Don't discard text references from tracepoint section x86/compressed/64: Add identity mappings for setup_data entries x86: Fix .brk attribute in linker script x86: Clear .brk area at early boot x86/xen: Use clear_bss() for Xen PV guests
| * x86/boot: Fix the setup data types max limitBorislav Petkov2022-07-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which contains the highest valid setup data type. Correct that. Fixes: 5ea98e01ab52 ("x86/boot: Add Confidential Computing type to setup_data") Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
| * x86/compressed/64: Add identity mappings for setup_data entriesMichael Roth2022-07-061-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decompressed kernel initially relies on the identity map set up by the boot/compressed kernel for accessing things like boot_params. With the recent introduction of SEV-SNP support, the decompressed kernel also needs to access the setup_data entries pointed to by boot_params->hdr.setup_data. This can lead to a crash in the kexec kernel during early boot due to these entries not currently being included in the initial identity map, see thread at Link below. Include mappings for the setup_data entries in the initial identity map. [ bp: Massage commit message and use a helper var for better readability. ] Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup") Reported-by: Jun'ichi Nomura <junichi.nomura@nec.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/TYCPR01MB694815CD815E98945F63C99183B49@TYCPR01MB6948.jpnprd01.prod.outlook.com
| * x86: Fix .brk attribute in linker scriptJuergen Gross2022-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit in Fixes added the "NOLOAD" attribute to the .brk section as a "failsafe" measure. Unfortunately, this leads to the linker no longer covering the .brk section in a program header, resulting in the kernel loader not knowing that the memory for the .brk section must be reserved. This has led to crashes when loading the kernel as PV dom0 under Xen, but other scenarios could be hit by the same problem (e.g. in case an uncompressed kernel is used and the initrd is placed directly behind it). So drop the "NOLOAD" attribute. This has been verified to correctly cover the .brk section by a program header of the resulting ELF file. Fixes: e32683c6f7d2 ("x86/mm: Fix RESERVE_BRK() for older binutils") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20220630071441.28576-4-jgross@suse.com
| * x86: Clear .brk area at early bootJuergen Gross2022-07-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .brk section has the same properties as .bss: it is an alloc-only section and should be cleared before being used. Not doing so is especially a problem for Xen PV guests, as the hypervisor will validate page tables (check for writable page tables and hypervisor private bits) before accepting them to be used. Make sure .brk is initially zero by letting clear_bss() clear the brk area, too. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.com
| * x86/xen: Use clear_bss() for Xen PV guestsJuergen Gross2022-07-014-12/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires to pass the start_info address as parameter to xen_start_kernel() in order to avoid the xen_start_info being zeroed again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
* | ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supportedMario Limonciello2022-07-051-0/+10
|/ | | | | | | | | | | | | | | | | | | | | | commit 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") added support for claiming to support CPPC in _OSC on non-Intel platforms. This unfortunately caused a regression on a vartiety of AMD platforms in the field because a number of AMD platforms don't set the `_OSC` bit 5 or 6 to indicate CPPC or CPPC v2 support. As these AMD platforms already claim CPPC support via a dedicated MSR from `X86_FEATURE_CPPC`, use this enable this feature rather than requiring the `_OSC` on platforms with a dedicated MSR. If there is additional breakage on the shared memory designs also missing this _OSC, additional follow up changes may be needed. Fixes: 72f2ecb7ece7 ("Set CPPC _OSC bits for all and when CPPC_LIB is supported") Reported-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-06-243-35/+50
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix a regression with pKVM when kmemleak is enabled - Add Oliver Upton as an official KVM/arm64 reviewer selftests: - deal with compiler optimizations around hypervisor exits x86: - MAINTAINERS reorganization - Two SEV fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SEV: Init target VMCBs in sev_migrate_from KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user() MAINTAINERS: Reorganize KVM/x86 maintainership selftests: KVM: Handle compiler optimizations in ucall KVM: arm64: Add Oliver as a reviewer KVM: arm64: Prevent kmemleak from accessing pKVM memory tools/kvm_stat: fix display of error when multiple processes are found
| * KVM: SEV: Init target VMCBs in sev_migrate_fromPeter Gonda2022-06-243-33/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The target VMCBs during an intra-host migration need to correctly setup for running SEV and SEV-ES guests. Add sev_init_vmcb() function and make sev_es_init_vmcb() static. sev_init_vmcb() uses the now private function to init SEV-ES guests VMCBs when needed. Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration") Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") Signed-off-by: Peter Gonda <pgonda@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20220623173406.744645-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()Mingwei Zhang2022-06-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | Adding the accounting flag when allocating pages within the SEV function, since these memory pages should belong to individual VM. No functional change intended. Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220623171858.2083637-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'net-5.19-rc4' of ↵Linus Torvalds2022-06-231-1/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit Current release - new code bugs: - bpf: ftrace: keep address offset in ftrace_lookup_symbols - bpf: force cookies array to follow symbols sorting Previous releases - regressions: - ipv4: ping: fix bind address validity check - tipc: fix use-after-free read in tipc_named_reinit - eth: veth: add updating of trans_start Previous releases - always broken: - sock: redo the psock vs ULP protection check - netfilter: nf_dup_netdev: fix skb_under_panic - bpf: fix request_sock leak in sk lookup helpers - eth: igb: fix a use-after-free issue in igb_clean_tx_ring - eth: ice: prohibit improper channel config for DCB - eth: at803x: fix null pointer dereference on AR9331 phy - eth: virtio_net: fix xdp_rxq_info bug after suspend/resume Misc: - eth: hinic: replace memcpy() with direct assignment" * tag 'net-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net: openvswitch: fix parsing of nw_proto for IPv6 fragments sock: redo the psock vs ULP protection check Revert "net/tls: fix tls_sk_proto_close executed repeatedly" virtio_net: fix xdp_rxq_info bug after suspend/resume igb: Make DMA faster when CPU is active on the PCIe link net: dsa: qca8k: reduce mgmt ethernet timeout net: dsa: qca8k: reset cpu port on MTU change MAINTAINERS: Add a maintainer for OCP Time Card hinic: Replace memcpy() with direct assignment Revert "drivers/net/ethernet/neterion/vxge: Fix a use-after-free bug in vxge-main.c" net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode ice: ethtool: Prohibit improper channel config for DCB ice: ethtool: advertise 1000M speeds properly ice: Fix switchdev rules book keeping ice: ignore protocol field in GTP offload netfilter: nf_dup_netdev: add and use recursion counter netfilter: nf_dup_netdev: do not push mac header a second time selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh net/tls: fix tls_sk_proto_close executed repeatedly erspan: do not assume transport header is always set ...
| * \ Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski2022-06-171-1/+2
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2022-06-17 We've added 12 non-merge commits during the last 4 day(s) which contain a total of 14 files changed, 305 insertions(+), 107 deletions(-). The main changes are: 1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki. 2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa. 3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi. 4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu. 5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell. 6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: rethook: Reject getting a rethook if RCU is not watching fprobe, samples: Add use_trace option and show hit/missed counter bpf, docs: Update some of the JIT/maintenance entries selftest/bpf: Fix kprobe_multi bench test bpf: Force cookies array to follow symbols sorting ftrace: Keep address offset in ftrace_lookup_symbols selftests/bpf: Shuffle cookies symbols in kprobe multi test selftests/bpf: Test tail call counting with bpf2bpf and data on stack bpf, x86: Fix tail call count offset calculation on bpf2bpf call bpf: Limit maximum modifier chain length in btf_check_type_tags bpf: Fix request_sock leak in sk lookup helpers xsk: Fix generic transmit when completion queue reservation fails ==================== Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| | * bpf, x86: Fix tail call count offset calculation on bpf2bpf callJakub Sitnicki2022-06-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On x86-64 the tail call count is passed from one BPF function to another through %rax. Additionally, on function entry, the tail call count value is stored on stack right after the BPF program stack, due to register shortage. The stored count is later loaded from stack either when performing a tail call - to check if we have not reached the tail call limit - or before calling another BPF function call in order to pass it via %rax. In the latter case, we miscalculate the offset at which the tail call count was stored on function entry. The JIT does not take into account that the allocated BPF program stack is always a multiple of 8 on x86, while the actual stack depth does not have to be. This leads to a load from an offset that belongs to the BPF stack, as shown in the example below: SEC("tc") int entry(struct __sk_buff *skb) { /* Have data on stack which size is not a multiple of 8 */ volatile char arr[1] = {}; return subprog_tail(skb); } int entry(struct __sk_buff * skb): 0: (b4) w2 = 0 1: (73) *(u8 *)(r10 -1) = r2 2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F 3: (95) exit int entry(struct __sk_buff * skb): 0xffffffffa0201788: nop DWORD PTR [rax+rax*1+0x0] 0xffffffffa020178d: xor eax,eax 0xffffffffa020178f: push rbp 0xffffffffa0201790: mov rbp,rsp 0xffffffffa0201793: sub rsp,0x8 0xffffffffa020179a: push rax 0xffffffffa020179b: xor esi,esi 0xffffffffa020179d: mov BYTE PTR [rbp-0x1],sil 0xffffffffa02017a1: mov rax,QWORD PTR [rbp-0x9] !!! tail call count 0xffffffffa02017a8: call 0xffffffffa02017d8 !!! is at rbp-0x10 0xffffffffa02017ad: leave 0xffffffffa02017ae: ret Fix it by rounding up the BPF stack depth to a multiple of 8, when calculating the tail call count offset on stack. Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220616162037.535469-2-jakub@cloudflare.com
* | | Merge tag 'efi-urgent-for-v5.19-1' of ↵Linus Torvalds2022-06-211-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - remove pointless include of asm/efi.h, which does not exist on ia64 - fix DXE service marshalling prototype for mixed mode * tag 'efi-urgent-for-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/x86: libstub: Fix typo in __efi64_argmap* name efi: sysfb_efi: remove unnecessary <asm/efi.h> include
| * | | efi/x86: libstub: Fix typo in __efi64_argmap* nameEvgeniy Baskov2022-06-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The actual name of the DXE services function used is set_memory_space_attributes(), not set_memory_space_descriptor(). Change EFI mixed mode helper macro name to match the function name. Fixes: 31f1a0edff78 ("efi/x86: libstub: Make DXE calls mixed mode safe") Signed-off-by: Evgeniy Baskov <baskov@ispras.ru> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
* | | | Merge tag 'x86-urgent-2022-06-19' of ↵Linus Torvalds2022-06-194-75/+159
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Make RESERVE_BRK() work again with older binutils. The recent 'simplification' broke that. - Make early #VE handling increment RIP when successful. - Make the #VE code consistent vs. the RIP adjustments and add comments. - Handle load_unaligned_zeropad() across page boundaries correctly in #VE when the second page is shared. * tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page x86/tdx: Clarify RIP adjustments in #VE handler x86/tdx: Fix early #VE handling x86/mm: Fix RESERVE_BRK() for older binutils
| * | | | x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared pageKirill A. Shutemov2022-06-171-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | load_unaligned_zeropad() can lead to unwanted loads across page boundaries. The unwanted loads are typically harmless. But, they might be made to totally unrelated or even unmapped memory. load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now #VE) to recover from these unwanted loads. In TDX guests, the second page can be shared page and a VMM may configure it to trigger #VE. The kernel assumes that #VE on a shared page is an MMIO access and tries to decode instruction to handle it. In case of load_unaligned_zeropad() it may result in confusion as it is not MMIO access. Fix it by detecting split page MMIO accesses and failing them. load_unaligned_zeropad() will recover using exception fixups. The issue was discovered by analysis and reproduced artificially. It was not triggered during testing. [ dhansen: fix up changelogs and comments for grammar and clarity, plus incorporate Kirill's off-by-one fix] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
| * | | | x86/tdx: Clarify RIP adjustments in #VE handlerKirill A. Shutemov2022-06-151-55/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After successful #VE handling, tdx_handle_virt_exception() has to move RIP to the next instruction. The handler needs to know the length of the instruction. If the #VE happened due to instruction execution, the GET_VEINFO TDX module call provides info on the instruction in R10, including its length. For #VE due to EPT violation, the info in R10 is not populand and the kernel must decode the instruction manually to find out its length. Restructure the code to make it explicit that the instruction length depends on the type of #VE. Make individual #VE handlers return the instruction length on success or -errno on failure. [ dhansen: fix up changelog and comments ] Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-3-kirill.shutemov@linux.intel.com
| * | | | x86/tdx: Fix early #VE handlingKirill A. Shutemov2022-06-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tdx_early_handle_ve() does not increment RIP after successfully handling the exception. That leads to infinite loop of exceptions. Move RIP when exceptions are successfully handled. [ dhansen: make problem statement more clear ] Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://lkml.kernel.org/r/20220614120135.14812-2-kirill.shutemov@linux.intel.com
| * | | | x86/mm: Fix RESERVE_BRK() for older binutilsJosh Poimboeuf2022-06-133-24/+23
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With binutils 2.26, RESERVE_BRK() causes a build failure: /tmp/ccnGOKZ5.s: Assembler messages: /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: missing ')' /tmp/ccnGOKZ5.s:98: Error: junk at end of line, first unrecognized character is `U' The problem is this line: RESERVE_BRK(early_pgt_alloc, INIT_PGT_BUF_SIZE) Specifically, the INIT_PGT_BUF_SIZE macro which (via PAGE_SIZE's use _AC()) has a "1UL", which makes older versions of the assembler unhappy. Unfortunately the _AC() macro doesn't work for inline asm. Inline asm was only needed here to convince the toolchain to add the STT_NOBITS flag. However, if a C variable is placed in a section whose name is prefixed with ".bss", GCC and Clang automatically set STT_NOBITS. In fact, ".bss..page_aligned" already relies on this trick. So fix the build failure (and simplify the macro) by allocating the variable in C. Also, add NOLOAD to the ".brk" output section clause in the linker script. This is a failsafe in case the ".bss" prefix magic trick ever stops working somehow. If there's a section type mismatch, the GNU linker will force the ".brk" output section to be STT_NOBITS. The LLVM linker will fail with a "section type mismatch" error. Note this also changes the name of the variable from .brk.##name to __brk_##name. The variable names aren't actually used anywhere, so it's harmless. Fixes: a1e2c031ec39 ("x86/mm: Simplify RESERVE_BRK()") Reported-by: Joe Damato <jdamato@fastly.com> Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Joe Damato <jdamato@fastly.com> Link: https://lore.kernel.org/r/22d07a44c80d8e8e1e82b9a806ddc8c6bbb2606e.1654759036.git.jpoimboe@kernel.org
* | | | Merge tag 'objtool-urgent-2022-06-19' of ↵Linus Torvalds2022-06-192-7/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull build tooling updates from Thomas Gleixner: - Remove obsolete CONFIG_X86_SMAP reference from objtool - Fix overlapping text section failures in faddr2line for real - Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it with finegrained annotations so objtool can validate that code correctly. * tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage faddr2line: Fix overlapping text section failures, the sequel objtool: Fix obsolete reference to CONFIG_X86_SMAP
| * | | | x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usageJosh Poimboeuf2022-06-062-7/+8
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The file-wide OBJECT_FILES_NON_STANDARD annotation is used with CONFIG_FRAME_POINTER to tell objtool to skip the entire file when frame pointers are enabled. However that annotation is now deprecated because it doesn't work with IBT, where objtool runs on vmlinux.o instead of individual translation units. Instead, use more fine-grained function-specific annotations: - The 'save_mcount_regs' macro does funny things with the frame pointer. Use STACK_FRAME_NON_STANDARD_FP to tell objtool to ignore the functions using it. - The return_to_handler() "function" isn't actually a callable function. Instead of being called, it's returned to. The real return address isn't on the stack, so unwinding is already doomed no matter which unwinder is used. So just remove the STT_FUNC annotation, telling objtool to ignore it. That also removes the implicit ANNOTATE_NOENDBR, which now needs to be made explicit. Fixes the following warning: vmlinux.o: warning: objtool: __fentry__+0x16: return with modified stack frame Fixes: ed53a0d97192 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/b7a7a42fe306aca37826043dac89e113a1acdbac.1654268610.git.jpoimboe@kernel.org
* | | | Merge tag 'pci-v5.19-fixes-2' of ↵Linus Torvalds2022-06-174-17/+18
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: "Revert clipping of PCI host bridge windows to avoid E820 regions, which broke several machines by forcing unnecessary BAR reassignments (Hans de Goede)" * tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
| * | | | x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"Hans de Goede2022-06-174-17/+18
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 4c5e242d3e93. Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions"), E820 regions did not affect PCI host bridge windows. We only looked at E820 regions and avoided them when allocating new MMIO space. If firmware PCI bridge window and BAR assignments used E820 regions, we left them alone. After 4c5e242d3e93, we removed E820 regions from the PCI host bridge windows before looking at BARs, so firmware assignments in E820 regions looked like errors, and we moved things around to fit in the space left (if any) after removing the E820 regions. This unnecessary BAR reassignment broke several machines. Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We clipped the window that contained most 32-bit BARs: BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff] which forced us to reassign all those BARs, for example, this NVMe BAR: pci 0000:00:01.2: PCI bridge to [bus 01] pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff] pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit] pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff] pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit] All the reassignments were successful, so the devices should have been functional at the new addresses, but some were not. Andy reported a similar failure on an Intel MID platform. Benjamin reported a similar failure on a VMWare Fusion VM. Note: this is not a clean revert; this revert keeps the later change to make the clipping dependent on a new pci_use_e820 bool, moving the checking of this bool to arch_remove_reservations(). [bhelgaas: commit log, add more reporters and testers] BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reported-by: Benjamin Coddington <bcodding@redhat.com> Reported-by: Jongman Heo <jongman.heo@gmail.com> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | Merge tag 'hyperv-fixes-signed-20220617' of ↵Linus Torvalds2022-06-173-6/+88
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix hv_init_clocksource annotation (Masahiro Yamada) - Two bug fixes for vmbus driver (Saurabh Sengar) - Fix SEV negotiation (Tianyu Lan) - Fix comments in code (Xiang Wang) - One minor fix to HID driver (Michael Kelley) * tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM Drivers: hv: vmbus: Release cpu lock in error case HID: hyperv: Correctly access fields declared as __le16 clocksource: hyper-v: unexport __init-annotated hv_init_clocksource() Drivers: hv: Fix syntax errors in comments Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
| * | | x86/Hyper-V: Add SEV negotiate protocol support in Isolation VMTianyu Lan2022-06-153-6/+88
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyper-V Isolation VM current code uses sev_es_ghcb_hv_call() to read/write MSR via GHCB page and depends on the sev code. This may cause regression when sev code changes interface design. The latest SEV-ES code requires to negotiate GHCB version before reading/writing MSR via GHCB page and sev_es_ghcb_hv_call() doesn't work for Hyper-V Isolation VM. Add Hyper-V ghcb related implementation to decouple SEV and Hyper-V code. Negotiate GHCB version in the hyperv_init() and use the version to communicate with Hyper-V in the ghcb hv call function. Fixes: 2ea29c5abbc2 ("x86/sev: Save the negotiated GHCB version") Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220614014553.1915929-1-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-06-149-127/+197
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "While last week's pull request contained miscellaneous fixes for x86, this one covers other architectures, selftests changes, and a bigger series for APIC virtualization bugs that were discovered during 5.20 development. The idea is to base 5.20 development for KVM on top of this tag. ARM64: - Properly reset the SVE/SME flags on vcpu load - Fix a vgic-v2 regression regarding accessing the pending state of a HW interrupt from userspace (and make the code common with vgic-v3) - Fix access to the idreg range for protected guests - Ignore 'kvm-arm.mode=protected' when using VHE - Return an error from kvm_arch_init_vm() on allocation failure - A bunch of small cleanups (comments, annotations, indentation) RISC-V: - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry x86-64: - Fix error in page tables with MKTME enabled - Dirty page tracking performance test extended to running a nested guest - Disable APICv/AVIC in cases that it cannot implement correctly" [ This merge also fixes a misplaced end parenthesis bug introduced in commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") pointed out by Sean Christopherson ] Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/ * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: selftests: Restrict test region to 48-bit physical addresses when using nested KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 KVM: selftests: Clean up LIBKVM files in Makefile KVM: selftests: Link selftests directly with lib object files KVM: selftests: Drop unnecessary rule for STATIC_LIBS KVM: selftests: Add a helper to check EPT/VPID capabilities KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h KVM: selftests: Refactor nested_map() to specify target level KVM: selftests: Drop stale function parameter comment for nested_map() KVM: selftests: Add option to create 2M and 1G EPT mappings KVM: selftests: Replace x86_page_size with PG_LEVEL_XX KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking KVM: x86: disable preemption while updating apicv inhibition KVM: x86: SVM: fix avic_kick_target_vcpus_fast KVM: x86: SVM: remove avic's broken code that updated APIC ID KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base KVM: x86: document AVIC/APICv inhibit reasons KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs ...
| * | | KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini2022-06-092-20/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky2022-06-093-27/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: disable preemption while updating apicv inhibitionMaxim Levitsky2022-06-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky2022-06-091-36/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky2022-06-091-35/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky2022-06-094-6/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky2022-06-091-2/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606180829.102503-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRsYuan Yao2022-06-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assign shadow_me_value, not shadow_me_mask, to PAE root entries, a.k.a. shadow PDPTRs, when host memory encryption is supported. The "mask" is the set of all possible memory encryption bits, e.g. MKTME KeyIDs, whereas "value" holds the actual value that needs to be stuffed into host page tables. Using shadow_me_mask results in a failed VM-Entry due to setting reserved PA bits in the PDPTRs, and ultimately causes an OOPS due to physical addresses with non-zero MKTME bits sending to_shadow_page() into the weeds: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state. BUG: unable to handle page fault for address: ffd43f00063049e8 PGD 86dfd8067 P4D 0 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:mmu_free_root_page+0x3c/0x90 [kvm] kvm_mmu_free_roots+0xd1/0x200 [kvm] __kvm_mmu_unload+0x29/0x70 [kvm] kvm_mmu_unload+0x13/0x20 [kvm] kvm_arch_destroy_vm+0x8a/0x190 [kvm] kvm_put_kvm+0x197/0x2d0 [kvm] kvm_vm_release+0x21/0x30 [kvm] __fput+0x8e/0x260 ____fput+0xe/0x10 task_work_run+0x6f/0xb0 do_exit+0x327/0xa90 do_group_exit+0x35/0xa0 get_signal+0x911/0x930 arch_do_signal_or_restart+0x37/0x720 exit_to_user_mode_prepare+0xb2/0x140 syscall_exit_to_user_mode+0x16/0x30 do_syscall_64+0x4e/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e54f1ff244ac ("KVM: x86/mmu: Add shadow_me_value and repurpose shadow_me_mask") Signed-off-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20220608012015.19566-1-yuan.yao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | Merge tag 'kvm-riscv-fixes-5.19-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini2022-06-09263-3668/+8570
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into HEAD KVM/riscv fixes for 5.19, take #1 - Typo fix in arch/riscv/kvm/vmid.c - Remove broken reference pattern from MAINTAINERS entry
* | | | Merge tag 'x86-bugs-2022-06-01' of ↵Linus Torvalds2022-06-148-39/+353
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MMIO stale data fixes from Thomas Gleixner: "Yet another hw vulnerability with a software mitigation: Processor MMIO Stale Data. They are a class of MMIO-related weaknesses which can expose stale data by propagating it into core fill buffers. Data which can then be leaked using the usual speculative execution methods. Mitigations include this set along with microcode updates and are similar to MDS and TAA vulnerabilities: VERW now clears those buffers too" * tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation/mmio: Print SMT warning KVM: x86/speculation: Disable Fill buffer clear within guests x86/speculation/mmio: Reuse SRBDS mitigation for SBDS x86/speculation/srbds: Update SRBDS mitigation selection x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data x86/speculation/mmio: Enable CPU Fill buffer clearing on idle x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data x86/speculation: Add a common function for MD_CLEAR mitigation update x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug Documentation: Add documentation for Processor MMIO Stale Data
| * | | x86/speculation/mmio: Print SMT warningJosh Poimboeuf2022-06-011-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to MDS and TAA, print a warning if SMT is enabled for the MMIO Stale Data vulnerability. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | KVM: x86/speculation: Disable Fill buffer clear within guestsPawan Gupta2022-05-214-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an accurate indicator on all CPUs of whether the VERW instruction will overwrite fill buffers. FB_CLEAR enumeration in IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not vulnerable to MDS/TAA, indicating that microcode does overwrite fill buffers. Guests running in VMM environments may not be aware of all the capabilities/vulnerabilities of the host CPU. Specifically, a guest may apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable to MDS/TAA even when the physical CPU is not. On CPUs that enumerate FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS during VMENTER and resetting on VMEXIT. For guests that enumerate FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM will not use FB_CLEAR_DIS. Irrespective of guest state, host overwrites CPU buffers before VMENTER to protect itself from an MMIO capable guest, as part of mitigation for MMIO Stale Data vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Reuse SRBDS mitigation for SBDSPawan Gupta2022-05-211-7/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Shared Buffers Data Sampling (SBDS) variant of Processor MMIO Stale Data vulnerabilities may expose RDRAND, RDSEED and SGX EGETKEY data. Mitigation for this is added by a microcode update. As some of the implications of SBDS are similar to SRBDS, SRBDS mitigation infrastructure can be leveraged by SBDS. Set X86_BUG_SRBDS and use SRBDS mitigation. Mitigation is enabled by default; use srbds=off to opt-out. Mitigation status can be checked from below file: /sys/devices/system/cpu/vulnerabilities/srbds Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/srbds: Update SRBDS mitigation selectionPawan Gupta2022-05-211-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, Linux disables SRBDS mitigation on CPUs not affected by MDS and have the TSX feature disabled. On such CPUs, secrets cannot be extracted from CPU fill buffers using MDS or TAA. Without SRBDS mitigation, Processor MMIO Stale Data vulnerabilities can be used to extract RDRAND, RDSEED, and EGETKEY data. Do not disable SRBDS mitigation by default when CPU is also affected by Processor MMIO Stale Data vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale DataPawan Gupta2022-05-211-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the sysfs reporting file for Processor MMIO Stale Data vulnerability. It exposes the vulnerability and mitigation state similar to the existing files for the other hardware vulnerabilities. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Enable CPU Fill buffer clearing on idlePawan Gupta2022-05-211-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the CPU is affected by Processor MMIO Stale Data vulnerabilities, Fill Buffer Stale Data Propagator (FBSDP) can propagate stale data out of Fill buffer to uncore buffer when CPU goes idle. Stale data can then be exploited with other variants using MMIO operations. Mitigate it by clearing the Fill buffer before entering idle state. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigationsPawan Gupta2022-05-211-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MDS, TAA and Processor MMIO Stale Data mitigations rely on clearing CPU buffers. Moreover, status of these mitigations affects each other. During boot, it is important to maintain the order in which these mitigations are selected. This is especially true for md_clear_update_mitigation() that needs to be called after MDS, TAA and Processor MMIO Stale Data mitigation selection is done. Introduce md_clear_select_mitigation(), and select all these mitigations from there. This reflects relationships between these mitigations and ensures proper ordering. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Add mitigation for Processor MMIO Stale DataPawan Gupta2022-05-213-4/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst. These vulnerabilities are broadly categorized as: Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction. Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer. Shared Buffers Data Read (SBDR): It is similar to Shared Buffer Data Sampling (SBDS) except that the data is directly read into the architectural software-visible state. An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest. On CPUs not affected by MDS and TAA, user application cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with VERW instruction to clear fill buffers before VMENTER for MMIO capable guests. Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation: Add a common function for MD_CLEAR mitigation updatePawan Gupta2022-05-211-26/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processor MMIO Stale Data mitigation uses similar mitigation as MDS and TAA. In preparation for adding its mitigation, add a common function to update all mitigations that depend on MD_CLEAR. [ bp: Add a newline in md_clear_update_mitigation() to separate statements better. ] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| * | | x86/speculation/mmio: Enumerate Processor MMIO Stale Data bugPawan Gupta2022-05-213-2/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For more details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst Add the Processor MMIO Stale Data bug enumeration. A microcode update adds new bits to the MSR IA32_ARCH_CAPABILITIES, define them. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
* | | | Merge tag 'for-linus-5.19a-rc2-tag' of ↵Linus Torvalds2022-06-105-8/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup removing "export" of an __init function - a small series adding a new infrastructure for platform flags - a series adding generic virtio support for Xen guests (frontend side) * tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: unexport __init-annotated xen_xlate_map_ballooned_pages() arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices xen/grant-dma-iommu: Introduce stub IOMMU driver dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops xen/virtio: Enable restricted memory access using Xen grant mappings xen/grant-dma-ops: Add option to restrict memory access under Xen xen/grants: support allocating consecutive grants arm/xen: Introduce xen_setup_dma_ops() virtio: replace arch_has_restricted_virtio_memory_access() kernel: add platform_has() infrastructure
| * | | | xen/virtio: Enable restricted memory access using Xen grant mappingsJuergen Gross2022-06-062-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support virtio in Xen guests add a config option XEN_VIRTIO enabling the user to specify whether in all Xen guests virtio should be able to access memory via Xen grant mappings only on the host side. Also set PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS feature from the guest initialization code on Arm and x86 if CONFIG_XEN_VIRTIO is enabled. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/1654197833-25362-5-git-send-email-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
| * | | | virtio: replace arch_has_restricted_virtio_memory_access()Juergen Gross2022-06-063-8/+4
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using arch_has_restricted_virtio_memory_access() together with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those with platform_has() and a new platform feature PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov <bp@suse.de>