summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2018-12-211-1/+3
|\ | | | | | | | | | | | | | | | | Pull kvm fix from Paolo Bonzini: "A simple patch for a pretty bad bug: Unbreak AMD nested virtualization." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: nSVM: fix switch to guest mmu
| * KVM: x86: nSVM: fix switch to guest mmuVitaly Kuznetsov2018-12-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent optimizations in MMU code broke nested SVM with NPT in L1 completely: when we do nested_svm_{,un}init_mmu_context() we want to switch from TDP MMU to shadow MMU, both init_kvm_tdp_mmu() and kvm_init_shadow_mmu() check if re-configuration is needed by looking at cache source data. The data, however, doesn't change - it's only the type of the MMU which changes. We end up not re-initializing guest MMU as shadow and everything goes off the rails. The issue could have been fixed by putting MMU type into extended MMU role but this is not really needed. We can just split root and guest MMUs the exact same way we did for nVMX, their types never change in the lifetime of a vCPU. There is still room for improvement: currently, we reset all MMU roots when switching from L1 to L2 and back and this is not needed. Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2018-12-2122-333/+365
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "The biggest part is a series of reverts for the macro based GCC inlining workarounds. It caused regressions in distro build and other kernel tooling environments, and the GCC project was very receptive to fixing the underlying inliner weaknesses - so as time ran out we decided to do a reasonably straightforward revert of the patches. The plan is to rely on the 'asm inline' GCC 9 feature, which might be backported to GCC 8 and could thus become reasonably widely available on modern distros. Other than those reverts, there's misc fixes from all around the place. I wish our final x86 pull request for v4.20 was smaller..." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs" Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" Revert "x86/refcount: Work around GCC inlining bug" Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs" Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs" Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops" Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs" Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" x86/mtrr: Don't copy uninitialized gentry fields back to userspace x86/fsgsbase/64: Fix the base write helper functions x86/mm/cpa: Fix cpa_flush_array() TLB invalidation x86/vdso: Pass --eh-frame-hdr to the linker x86/mm: Fix decoy address handling vs 32-bit builds x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence x86/dump_pagetables: Fix LDT remap address marker x86/mm: Fix guard hole handling
| * Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to ↵Ingo Molnar2018-12-192-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | work around asm() related GCC inlining bugs" This reverts commit 77b0bf55bc675233d22cd5df97605d516d64525e. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Conflicts: arch/x86/Makefile Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/objtool: Use asm macros to work around GCC inlining bugs"Ingo Molnar2018-12-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c06c4d8090513f2974dfdbed2ac98634357ac475. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/refcount: Work around GCC inlining bug"Ingo Molnar2018-12-192-49/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9e1725b410594911cc5981b6c7b4cea4ec054ca8. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") The conflict resolution for interaction with: 288e4521f0f6: ("x86/asm: 'Simplify' GEN_*_RMWcc() macros") was provided by Masahiro Yamada. Conflicts: arch/x86/include/asm/refcount.h Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining ↵Ingo Molnar2018-12-193-17/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bugs" This reverts commit 77f48ec28e4ccff94d2e5f4260a83ac27a7f3099. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC ↵Ingo Molnar2018-12-192-57/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inlining bugs" This reverts commit f81f8ad56fd1c7b99b2ed1c314527f7d9ac447c6. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops"Ingo Molnar2018-12-192-28/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 494b5168f2de009eb80f198f668da374295098dd. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/extable: Macrofy inline assembly code to work around GCC ↵Ingo Molnar2018-12-192-21/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inlining bugs" This reverts commit 0474d5d9d2f7f3b11262f7bf87d0e7314ead9200. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC ↵Ingo Molnar2018-12-192-48/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inlining bugs" This reverts commit d5a581d84ae6b8a4a740464b80d8d9cf1e7947b2. See this commit for details about the revert: e769742d3584 ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC ↵Ingo Molnar2018-12-193-20/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inlining bugs" This reverts commit 5bdcd510c2ac9efaf55c4cbd8d46421d8e2320cd. The macro based workarounds for GCC's inlining bugs caused regressions: distcc and other distro build setups broke, and the fixes are not easy nor will they solve regressions on already existing installations. So we are reverting this patch and the 8 followup patches. What makes this revert easier is that GCC9 will likely include the new 'asm inline' syntax that makes inlining of assembly blocks a lot more robust. This is a superior method to any macro based hackeries - and might even be backported to GCC8, which would make all modern distros get the inlining fixes as well. Many thanks to Masahiro Yamada and others for helping sort out these problems. Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mtrr: Don't copy uninitialized gentry fields back to userspaceColin Ian King2018-12-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
| * x86/fsgsbase/64: Fix the base write helper functionsChang S. Bae2018-12-183-52/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andy spotted a regression in the fs/gs base helpers after the patch series was committed. The helper functions which write fs/gs base are not just writing the base, they are also changing the index. That's wrong and needs to be separated because writing the base has not to modify the index. While the regression is not causing any harm right now because the only caller depends on that behaviour, it's a guarantee for subtle breakage down the road. Make the index explicitly changed from the caller, instead of including the code in the helpers. Subsequently, the task write helpers do not handle for the current task anymore. The range check for a base value is also factored out, to minimize code redundancy from the caller. Fixes: b1378a561fd1 ("x86/fsgsbase/64: Introduce FS/GS base helper functions") Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com
| * x86/mm/cpa: Fix cpa_flush_array() TLB invalidationPeter Zijlstra2018-12-171-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()") I misread the CAP array code and incorrectly used tlb_flush_kernel_range(), resulting in missing TLB flushes and consequent failures. Instead do a full invalidate in this case -- for now. Reported-by: StDenis, Tom <Tom.StDenis@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Fixes: a7295fd53c39 ("x86/mm/cpa: Use flush_tlb_kernel_range()") Link: http://lkml.kernel.org/r/20181203171043.089868285@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/vdso: Pass --eh-frame-hdr to the linkerAlistair Strachan2018-12-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. See relevant bug reports for additional info: https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: Carlos O'Donell <carlos@redhat.com> Reported-by: "H. J. Lu" <hjl.tools@gmail.com> Signed-off-by: Alistair Strachan <astrachan@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: kernel-team@android.com Cc: Laura Abbott <labbott@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
| * x86/mm: Fix decoy address handling vs 32-bit buildsDan Williams2018-12-111-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A decoy address is used by set_mce_nospec() to update the cache attributes for a page that may contain poison (multi-bit ECC error) while attempting to minimize the possibility of triggering a speculative access to that page. When reserve_memtype() is handling a decoy address it needs to convert it to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK, is broken for a 32-bit physical mask and reserve_memtype() is passed the last physical page. Gert reports triggering the: BUG_ON(start >= end); ...assertion when running a 32-bit non-PAE build on a platform that has a driver resource at the top of physical memory: BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved Given that the decoy address scheme is only targeted at 64-bit builds and assumes that the top of physical address space is free for use as a decoy address range, simply bypass address sanitization in the 32-bit case. Lastly, there was no need to crash the system when this failure occurred, and no need to crash future systems if the assumptions of decoy addresses are ever violated. Change the BUG_ON() to a WARN() with an error return. Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...") Reported-by: Gert Robben <t2@gert.gr> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Gert Robben <t2@gert.gr> Cc: stable@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: platform-driver-x86@vger.kernel.org Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
| * x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking ↵Reinette Chatre2018-12-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sequence The user triggers the creation of a pseudo-locked region when writing the requested schemata to the schemata resctrl file. The pseudo-locking of a region is required to be done on a CPU that is associated with the cache on which the pseudo-locked region will reside. In order to run the locking code on a specific CPU, the needed CPU has to be selected and ensured to remain online during the entire locking sequence. At this time, the cpu_hotplug_lock is not taken during the pseudo-lock region creation and it is thus possible for a CPU to be selected to run the pseudo-locking code and then that CPU to go offline before the thread is able to run on it. Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on which code has to run needs to be controlled. Since the cpu_hotplug_lock is always taken before rdtgroup_mutex the lock order is maintained. Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
| * x86/dump_pagetables: Fix LDT remap address markerKirill A. Shutemov2018-12-111-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LDT remap placement has been changed. It's now placed before the direct mapping in the kernel virtual address space for both paging modes. Change address markers order accordingly. Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: hans.van.kranenburg@mendix.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com
| * x86/mm: Fix guard hole handlingKirill A. Shutemov2018-12-113-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitely, it is calculated relative to other entities: direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: LDT remap range is now mapped before direct mapping and makes the calculation invalid. The breakage leads to crash on Xen dom0 boot[1]. Define the reserved range explicitely. It's part of kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
* | kvm: x86: Add AMD's EX_CFG to the list of ignored MSRsEduardo Habkost2018-12-182-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: X86: Fix NULL deref in vcpu_scan_ioapicWanpeng Li2018-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Fix UAF in nested posted interrupt processingCfir Cohen2018-12-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer, however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps: 1. Call vmlaunch with valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs() which fails the first vmlaunch. 2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs. 3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which dereferences the dangling pointer. Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: 5e2f30b756a37 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2018-12-093-5/+10
|\| | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact fix and an LLVM workaround" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Drop implicit common-page-size linker flag x86/build: Fix compiler support check for CONFIG_RETPOLINE x86/boot: Clear RSDP address in boot_params for broken loaders
| * x86/vdso: Drop implicit common-page-size linker flagNick Desaulniers2018-12-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/entry/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it. Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Reported-by: Dmitry Golovin <dima@golovin.in> Reported-by: Bill Wendling <morbo@google.com> Suggested-by: Dmitry Golovin <dima@golovin.in> Suggested-by: Rui Ueyama <ruiu@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Fangrui Song <maskray@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=38774 Link: https://github.com/ClangBuiltLinux/linux/issues/31
| * x86/build: Fix compiler support check for CONFIG_RETPOLINEMasahiro Yamada2018-12-051-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is troublesome to add a diagnostic like this to the Makefile parse stage because the top-level Makefile could be parsed with a stale include/config/auto.conf. Once you are hit by the error about non-retpoline compiler, the compilation still breaks even after disabling CONFIG_RETPOLINE. The easiest fix is to move this check to the "archprepare" like this commit did: 829fe4aa9ac1 ("x86: Allow generating user-space headers without a compiler") Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support") Link: http://lkml.kernel.org/r/1543991239-18476-1-git-send-email-yamada.masahiro@socionext.com Link: https://lkml.org/lkml/2018/12/4/206 Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/boot: Clear RSDP address in boot_params for broken loadersJuergen Gross2018-12-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gunnar Krueger reported a systemd-boot failure and bisected it down to: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available") In case a broken boot loader doesn't clear its 'struct boot_params', clear rsdp_addr in sanitize_boot_params(). Reported-by: Gunnar Krueger <taijian@posteo.de> Tested-by: Gunnar Krueger <taijian@posteo.de> Signed-off-by: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: sstabellini@kernel.org Fixes: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available") Link: http://lkml.kernel.org/r/20181203103811.17056-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2018-12-092-1/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kprobes fixes from Ingo Molnar: "Two kprobes fixes: a blacklist fix and an instruction patching related corruption fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Blacklist non-attachable interrupt functions kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction
| * | kprobes/x86: Blacklist non-attachable interrupt functionsAndrea Righi2018-12-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These interrupt functions are already non-attachable by kprobes. Blacklist them explicitly so that they can show up in /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this additional information. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | kprobes/x86: Fix instruction patching corruption when copying more than one ↵Masami Hiramatsu2018-12-041-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RIP-relative instruction After copy_optimized_instructions() copies several instructions to the working buffer it tries to fix up the real RIP address, but it adjusts the RIP-relative instruction with an incorrect RIP address for the 2nd and subsequent instructions due to a bug in the logic. This will break the kernel pretty badly (with likely outcomes such as a kernel freeze, a crash, or worse) because probed instructions can refer to the wrong data. For example putting kprobes on cpumask_next() typically hits this bug. cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y (in this case nr_cpumask_bits is an alias of nr_cpu_ids): <cpumask_next>: 48 89 f0 mov %rsi,%rax 8b 35 7b fb e2 00 mov 0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids> 55 push %rbp ... If we put a kprobe on it and it gets jump-optimized, it gets patched by the kprobes code like this: <cpumask_next>: e9 95 7d 07 1e jmpq 0xffffffffa000207a 7b fb jnp 0xffffffff81f8a2e2 <cpumask_next+2> e2 00 loop 0xffffffff81f8a2e9 <cpumask_next+9> 55 push %rbp This shows that the first two MOV instructions were copied to a trampoline buffer at 0xffffffffa000207a. Here is the disassembled result of the trampoline, skipping the optprobe template instructions: # Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea: 54 push %rsp ... 48 83 c4 08 add $0x8,%rsp 9d popfq 48 89 f0 mov %rsi,%rax 8b 35 82 7d db e2 mov -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3> This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of the original *nr_cpu_ids. This leads to a kernel freeze because cpumask_next() always returns 0 and for_each_cpu() never ends. Fix this by adding 'len' correctly to the real RIP address while copying. [ mingo: Improved the changelog. ] Reported-by: Michael Rodin <michael@rodin.online> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v4.15+ Fixes: 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()") Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds2018-12-092-25/+42
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Two fixes: a large-system fix and an earlyprintk fix with certain resolutions" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/earlyprintk/efi: Fix infinite loop on some screen widths x86/efi: Allocate e820 buffer before calling efi_exit_boot_service
| * x86/earlyprintk/efi: Fix infinite loop on some screen widthsYiFei Zhu2018-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An affected screen resolution is 1366 x 768, which width is not divisible by 8, the default font width. On such screens, when longer lines are earlyprintk'ed, overflow-to-next-line can never trigger, due to the left-most x-coordinate of the next character always less than the screen width. Earlyprintk will infinite loop in trying to print the rest of the string but unable to, due to the line being full. This patch makes the trigger consider the right-most x-coordinate, instead of left-most, as the value to compare against the screen width threshold. Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arend van Spriel <arend.vanspriel@broadcom.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Snowberg <eric.snowberg@oracle.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181129171230.18699-12-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/efi: Allocate e820 buffer before calling efi_exit_boot_serviceEric Snowberg2018-11-301-24/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: d64934019f6c ("x86/efi: Use efi_exit_boot_services()") introduced a regression on systems with large memory maps causing them to hang on boot. The first "goto get_map" that was removed from exit_boot() ensured there was enough room for the memory map when efi_call_early(exit_boot_services) was called. This happens when (nr_desc > ARRAY_SIZE(params->e820_table). Chain of events: exit_boot() efi_exit_boot_services() efi_get_memory_map <- at this point the mm can't grow over 8 desc priv_func() exit_boot_func() allocate_e820ext() <- new mm grows over 8 desc from e820 alloc efi_call_early(exit_boot_services) <- mm key doesn't match so retry efi_call_early(get_memory_map) <- not enough room for new mm system hangs This patch allocates the e820 buffer before calling efi_exit_boot_services() and fixes the regression. [ mingo: minor cleanliness edits. ] Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arend van Spriel <arend.vanspriel@broadcom.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: Julien Thierry <julien.thierry@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: YiFei Zhu <zhuyifei1999@gmail.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge tag 'for-linus-4.20a-rc5-tag' of ↵Linus Torvalds2018-12-024-101/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A revert of a previous commit as it is no longer necessary and has shown to cause problems in some memory hotplug cases. - Some small fixes and a minor cleanup. - A patch for adding better diagnostic data in a very rare failure case. * tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: pvcalls-front: fixes incorrect error handling Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" xen: xlate_mmu: add missing header to fix 'W=1' warning xen/x86: add diagnostic printout to xen_mc_flush() in case of error x86/xen: cleanup includes in arch/x86/xen/spinlock.c
| * | Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"Igor Druzhinin2018-11-292-80/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b3cf8528bb21febb650a7ecbf080d0647be40b9f. That commit unintentionally broke Xen balloon memory hotplug with "hotplug_unpopulated" set to 1. As long as "System RAM" resource got assigned under a new "Unusable memory" resource in IO/Mem tree any attempt to online this memory would fail due to general kernel restrictions on having "System RAM" resources as 1st level only. The original issue that commit has tried to workaround fa564ad96366 ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") also got amended by the following 03a551734 ("x86/PCI: Move and shrink AMD 64-bit window to avoid conflict") which made the original fix to Xen ballooning unnecessary. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
| * | xen/x86: add diagnostic printout to xen_mc_flush() in case of errorJuergen Gross2018-11-291-15/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Failure of an element of a Xen multicall is signalled via a WARN() only if the kernel is compiled with MC_DEBUG. It is impossible to know which element failed and why it did so. Change that by printing the related information even without MC_DEBUG, even if maybe in some limited form (e.g. without information which caller produced the failing element). Move the printing out of the switch statement in order to have the same information for a single call. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
| * | x86/xen: cleanup includes in arch/x86/xen/spinlock.cJuergen Gross2018-11-221-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | arch/x86/xen/spinlock.c includes several headers which are not needed. Remove the #includes. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2018-12-0114-234/+665
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull STIBP fallout fixes from Thomas Gleixner: "The performance destruction department finally got it's act together and came up with a cure for the STIPB regression: - Provide a command line option to control the spectre v2 user space mitigations. Default is either seccomp or prctl (if seccomp is disabled in Kconfig). prctl allows mitigation opt-in, seccomp enables the migitation for sandboxed processes. - Rework the code to handle the conditional STIBP/IBPB control and remove the now unused ptrace_may_access_sched() optimization attempt - Disable STIBP automatically when SMT is disabled - Optimize the switch_to() logic to avoid MSR writes and invocations of __switch_to_xtra(). - Make the asynchronous speculation TIF updates synchronous to prevent stale mitigation state. As a general cleanup this also makes retpoline directly depend on compiler support and removes the 'minimal retpoline' option which just pretended to provide some form of security while providing none" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/speculation: Provide IBPB always command line options x86/speculation: Add seccomp Spectre v2 user space protection mode x86/speculation: Enable prctl mode for spectre_v2_user x86/speculation: Add prctl() control for indirect branch speculation x86/speculation: Prepare arch_smt_update() for PRCTL mode x86/speculation: Prevent stale SPEC_CTRL msr content x86/speculation: Split out TIF update ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS x86/speculation: Prepare for conditional IBPB in switch_mm() x86/speculation: Avoid __switch_to_xtra() calls x86/process: Consolidate and simplify switch_to_xtra() code x86/speculation: Prepare for per task indirect branch speculation control x86/speculation: Add command line control for indirect branch speculation x86/speculation: Unify conditional spectre v2 print functions x86/speculataion: Mark command line parser data __initdata x86/speculation: Mark string arrays const correctly x86/speculation: Reorder the spec_v2 code x86/l1tf: Show actual SMT state x86/speculation: Rework SMT state change sched/smt: Expose sched_smt_present static key ...
| * | | x86/speculation: Provide IBPB always command line optionsThomas Gleixner2018-11-281-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide the possibility to enable IBPB always in combination with 'prctl' and 'seccomp'. Add the extra command line options and rework the IBPB selection to evaluate the command instead of the mode selected by the STIPB switch case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
| * | | x86/speculation: Add seccomp Spectre v2 user space protection modeThomas Gleixner2018-11-282-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If 'prctl' mode of user space protection from spectre v2 is selected on the kernel command-line, STIBP and IBPB are applied on tasks which restrict their indirect branch speculation via prctl. SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it makes sense to prevent spectre v2 user space to user space attacks as well. The Intel mitigation guide documents how STIPB works: Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core. Ergo setting STIBP protects the task itself from being attacked from a task running on a different hyper-thread and protects the tasks running on different hyper-threads from being attacked. While the document suggests that the branch predictors are shielded between the logical processors, the observed performance regressions suggest that STIBP simply disables the branch predictor more or less completely. Of course the document wording is vague, but the fact that there is also no requirement for issuing IBPB when STIBP is used points clearly in that direction. The kernel still issues IBPB even when STIBP is used until Intel clarifies the whole mechanism. IBPB is issued when the task switches out, so malicious sandbox code cannot mistrain the branch predictor for the next user space task on the same logical processor. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
| * | | x86/speculation: Enable prctl mode for spectre_v2_userThomas Gleixner2018-11-281-9/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all prerequisites are in place: - Add the prctl command line option - Default the 'auto' mode to 'prctl' - When SMT state changes, update the static key which controls the conditional STIBP evaluation on context switch. - At init update the static key which controls the conditional IBPB evaluation on context switch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
| * | | x86/speculation: Add prctl() control for indirect branch speculationThomas Gleixner2018-11-283-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of indirect branch speculation via STIBP and IBPB. Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); See Documentation/userspace-api/spec_ctrl.rst. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de
| * | | x86/speculation: Prepare arch_smt_update() for PRCTL modeThomas Gleixner2018-11-281-21/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The upcoming fine grained per task STIBP control needs to be updated on CPU hotplug as well. Split out the code which controls the strict mode so the prctl control code can be added later. Mark the SMP function call argument __unused while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.de
| * | | x86/speculation: Prevent stale SPEC_CTRL msr contentThomas Gleixner2018-11-284-18/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The seccomp speculation control operates on all tasks of a process, but only the current task of a process can update the MSR immediately. For the other threads the update is deferred to the next context switch. This creates the following situation with Process A and B: Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2 does not have the speculation control TIF bit set. Process B task 1 has the speculation control TIF bit set. CPU0 CPU1 MSR bit is set ProcB.T1 schedules out ProcA.T2 schedules in MSR bit is cleared ProcA.T1 seccomp_update() set TIF bit on ProcA.T2 ProcB.T1 schedules in MSR is not updated <-- FAIL This happens because the context switch code tries to avoid the MSR update if the speculation control TIF bits of the incoming and the outgoing task are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks scheduling back and forth on CPU1, which keeps the MSR stale forever. In theory this could be remedied by IPIs, but chasing the remote task which could be migrated is complex and full of races. The straight forward solution is to avoid the asychronous update of the TIF bit and defer it to the next context switch. The speculation control state is stored in task_struct::atomic_flags by the prctl and seccomp updates already. Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the atomic_flags. Check the bit on context switch and force a synchronous update of the speculation control if set. Use the same mechanism for updating the current task. Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.de
| * | | x86/speculation: Split out TIF updateThomas Gleixner2018-11-281-12/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The update of the TIF_SSBD flag and the conditional speculation control MSR update is done in the ssb_prctl_set() function directly. The upcoming prctl support for controlling indirect branch speculation via STIBP needs the same mechanism. Split the code out and make it reusable. Reword the comment about updates for other tasks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.de
| * | | x86/speculation: Prepare for conditional IBPB in switch_mm()Thomas Gleixner2018-11-284-36/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IBPB speculation barrier is issued from switch_mm() when the kernel switches to a user space task with a different mm than the user space task which ran last on the same CPU. An additional optimization is to avoid IBPB when the incoming task can be ptraced by the outgoing task. This optimization only works when switching directly between two user space tasks. When switching from a kernel task to a user space task the optimization fails because the previous task cannot be accessed anymore. So for quite some scenarios the optimization is just adding overhead. The upcoming conditional IBPB support will issue IBPB only for user space tasks which have the TIF_SPEC_IB bit set. This requires to handle the following cases: 1) Switch from a user space task (potential attacker) which has TIF_SPEC_IB set to a user space task (potential victim) which has TIF_SPEC_IB not set. 2) Switch from a user space task (potential attacker) which has TIF_SPEC_IB not set to a user space task (potential victim) which has TIF_SPEC_IB set. This needs to be optimized for the case where the IBPB can be avoided when only kernel threads ran in between user space tasks which belong to the same process. The current check whether two tasks belong to the same context is using the tasks context id. While correct, it's simpler to use the mm pointer because it allows to mangle the TIF_SPEC_IB bit into it. The context id based mechanism requires extra storage, which creates worse code. When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into the per CPU storage which is used to track the last user space mm which was running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of the incoming task to make the decision whether IBPB needs to be issued or not to cover the two cases above. As conditional IBPB is going to be the default, remove the dubious ptrace check for the IBPB always case and simply issue IBPB always when the process changes. Move the storage to a different place in the struct as the original one created a hole. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.de
| * | | x86/speculation: Avoid __switch_to_xtra() callsThomas Gleixner2018-11-282-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke __switch_to_xtra() when: - CONFIG_SMP is disabled - The conditional STIPB mode is disabled The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the only set bit in the work masks. Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at run time when the static key controlling the conditional STIBP mode is disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.de
| * | | x86/process: Consolidate and simplify switch_to_xtra() codeThomas Gleixner2018-11-285-22/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the conditional invocation of __switch_to_xtra() into an inline function so the logic can be shared between 32 and 64 bit. Remove the handthrough of the TSS pointer and retrieve the pointer directly in the bitmap handling function. Use this_cpu_ptr() instead of the per_cpu() indirection. This is a preparatory change so integration of conditional indirect branch speculation optimization happens only in one place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.de
| * | | x86/speculation: Prepare for per task indirect branch speculation controlTim Chen2018-11-285-5/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid the overhead of STIBP always on, it's necessary to allow per task control of STIBP. Add a new task flag TIF_SPEC_IB and evaluate it during context switch if SMT is active and flag evaluation is enabled by the speculation control code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the guest/host switch works properly. This has no effect because TIF_SPEC_IB cannot be set yet and the static key which controls evaluation is off. Preparatory patch for adding the control code. [ tglx: Simplify the context switch logic and make the TIF evaluation depend on SMP=y and on the static key controlling the conditional update. Rename it to TIF_SPEC_IB because it controls both STIBP and IBPB ] Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de
| * | | x86/speculation: Add command line control for indirect branch speculationThomas Gleixner2018-11-282-17/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add command line control for user space indirect branch speculation mitigations. The new option is: spectre_v2_user= The initial options are: - on: Unconditionally enabled - off: Unconditionally disabled -auto: Kernel selects mitigation (default off for now) When the spectre_v2= command line argument is either 'on' or 'off' this implies that the application to application control follows that state even if a contradicting spectre_v2_user= argument is supplied. Originally-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Casey Schaufler <casey.schaufler@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jon Masters <jcm@redhat.com> Cc: Waiman Long <longman9394@gmail.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dave Stewart <david.c.stewart@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de