summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
...
| * x86/platform/intel-mid: Correct MSI IRQ line for watchdog deviceAndy Shevchenko2017-03-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interrupt line used for the watchdog is 12, according to the official Intel Edison BSP code. And indeed after fixing it we start getting an interrupt and thus the watchdog starts working again: [ 191.699951] Kernel panic - not syncing: Kernel Watchdog Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Cohen <david.a.cohen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 78a3bb9e408b ("x86: intel-mid: add watchdog platform code for Merrifield") Link: http://lkml.kernel.org/r/20170312150744.45493-1-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'x86-acpi-for-linus' of ↵Linus Torvalds2017-03-172-21/+14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
| * | x86/acpi: Restore the order of CPU IDsDou Liyang2017-03-112-20/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commits: f7c28833c2 ("x86/acpi: Enable acpi to register all possible cpus at boot time") and 8f54969dc8 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") ... registered all the possible CPUs at boot time via ACPI tables to make the mapping of cpuid <-> apicid fixed. Both enabled and disabled CPUs could have a logical CPU ID after boot time. But, ACPI tables are unreliable. the number amd order of Local APIC entries which depends on the firmware is often inconsistent with the physical devices. Even if they are consistent, The disabled CPUs which take up some logical CPU IDs will also make the order discontinuous. Revert the part of disabled CPUs registration, keep the allocation logic of logical CPU IDs and also keep some code location changes. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-4-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"Dou Liyang2017-03-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert: dc6db24d2476 ("x86/acpi: Set persistent cpuid <-> nodeid mapping when booting") The mapping of "cpuid <-> nodeid" is established at boot time via ACPI tables to keep associations of workqueues and other node related items consistent across cpu hotplug. But, ACPI tables are unreliable and failures with that boot time mapping have been reported on machines where the ACPI table and the physical information which is retrieved at actual hotplug is inconsistent. Revert the mapping implementation so it can be replaced with a less error prone approach. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: guzheng1@huawei.com Cc: izumi.taku@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1488528147-2279-2-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2017-03-171-2/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A set of perf related fixes: - fix a CR4.PCE propagation issue caused by usage of mm instead of active_mm and therefore propagated the wrong value. - perf core fixes, which plug a use-after-free issue and make the event inheritance on fork more robust. - a tooling fix for symbol handling" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf symbols: Fix symbols__fixup_end heuristic for corner cases x86/perf: Clarify why x86_pmu_event_mapped() isn't racy x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm perf/core: Better explain the inherit magic perf/core: Simplify perf_event_free_task() perf/core: Fix event inheritance on fork() perf/core: Fix use-after-free in perf_release()
| * | | x86/perf: Clarify why x86_pmu_event_mapped() isn't racyAndy Lutomirski2017-03-171-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Naively, it looks racy, but ->mmap_sem saves it. Add a comment and a lockdep assertion. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/03a1e629063899168dfc4707f3bb6e581e21f5c6.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/perf: Fix CR4.PCE propagation to use active_mm instead of mmAndy Lutomirski2017-03-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If one thread mmaps a perf event while another thread in the same mm is in some context where active_mm != mm (which can happen in the scheduler, for example), refresh_pce() would write the wrong value to CR4.PCE. This broke some PAPI tests. Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | mm, x86: fix native_pud_clear build errorArnd Bergmann2017-03-162-4/+1
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still get a build error in random configurations, after this has been modified a few times: In file included from include/linux/mm.h:68:0, from include/linux/suspend.h:8, from arch/x86/kernel/asm-offsets.c:12: arch/x86/include/asm/pgtable.h:66:26: error: redefinition of 'native_pud_clear' #define pud_clear(pud) native_pud_clear(pud) My interpretation is that the build error comes from a typo in __PAGETABLE_PUD_FOLDED, so fix that typo now, and remove the incorrect #ifdef around the native_pud_clear definition. Fixes: 3e761a42e19c ("mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()") Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Link: http://lkml.kernel.org/r/20170314121330.182155-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Ackedy-by: Dave Jiang <dave.jiang@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-03-128-33/+46
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - a fix for the kexec/purgatory regression which was introduced in the merge window via an innocent sparse fix. We could have reverted that commit, but on deeper inspection it turned out that the whole machinery is neither documented nor robust. So a proper cleanup was done instead - the fix for the TLB flush issue which was discovered recently - a simple typo fix for a reboot quirk * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix tlb flushing when lguest clears PGE kexec, x86/purgatory: Unbreak it and clean it up x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
| * | | x86/tlb: Fix tlb flushing when lguest clears PGEDaniel Borkmann2017-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | kexec, x86/purgatory: Unbreak it and clean it upThomas Gleixner2017-03-106-31/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and cleanup the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewn reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: 72042a8c7b01 ("x86/purgatory: Make functions and variables static") Reported-by: Mike Galbraith <<efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirkMatjaz Hegedic2017-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: 90b28ded88dd ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-03-111-22/+8
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Radim Krčmář: "ARM updates from Marc Zyngier: - vgic updates: - Honour disabling the ITS - Don't deadlock when deactivating own interrupts via MMIO - Correctly expose the lact of IRQ/FIQ bypass on GICv3 - I/O virtualization: - Make KVM_CAP_NR_MEMSLOTS big enough for large guests with many PCIe devices - General bug fixes: - Gracefully handle exception generated with syndroms that the host doesn't understand - Properly invalidate TLBs on VHE systems x86: - improvements in emulation of VMCLEAR, VMX MSR bitmaps, and VCPU reset * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: do not warn when MSR bitmap address is not backed KVM: arm64: Increase number of user memslots to 512 KVM: arm/arm64: Remove KVM_PRIVATE_MEM_SLOTS definition that are unused KVM: arm/arm64: Enable KVM_CAP_NR_MEMSLOTS on arm/arm64 KVM: Add documentation for KVM_CAP_NR_MEMSLOTS KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled arm64: KVM: Survive unknown traps from guests arm: KVM: Survive unknown traps from guests KVM: arm/arm64: Let vcpu thread modify its own active state KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset kvm: nVMX: VMCLEAR should not cause the vCPU to shut down KVM: arm/arm64: vgic-v3: Don't pretend to support IRQ/FIQ bypass arm64: KVM: VHE: Clear HCR_TGE when invalidating guest TLBs
| * | | KVM: nVMX: do not warn when MSR bitmap address is not backedRadim Krčmář2017-03-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before trying to do nested_get_page() in nested_vmx_merge_msr_bitmap(), we have already checked that the MSR bitmap address is valid (4k aligned and within physical limits). SDM doesn't specify what happens if the there is no memory mapped at the valid address, but Intel CPUs treat the situation as if the bitmap was configured to trap all MSRs. KVM already does that by returning false and a correct handling doesn't need the guest-trigerrable warning that was reported by syzkaller: (The warning was originally there to catch some possible bugs in nVMX.) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709 nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline] WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709 nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 7832 Comm: syz-executor1 Not tainted 4.10.0+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 panic+0x1fb/0x412 kernel/panic.c:179 __warn+0x1c4/0x1e0 kernel/panic.c:540 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline] nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640 enter_vmx_non_root_mode arch/x86/kvm/vmx.c:10471 [inline] nested_vmx_run+0x6186/0xaab0 arch/x86/kvm/vmx.c:10561 handle_vmlaunch+0x1a/0x20 arch/x86/kvm/vmx.c:7312 vmx_handle_exit+0xfc0/0x3f00 arch/x86/kvm/vmx.c:8526 vcpu_enter_guest arch/x86/kvm/x86.c:6982 [inline] vcpu_run arch/x86/kvm/x86.c:7044 [inline] kvm_arch_vcpu_ioctl_run+0x1418/0x4840 arch/x86/kvm/x86.c:7205 kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2570 Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> [Jim Mattson explained the bare metal behavior: "I believe this behavior would be documented in the chipset data sheet rather than the SDM, since the chipset returns all 1s for an unclaimed read."] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | KVM: nVMX: reset nested_run_pending if the vCPU is going to be resetWanpeng Li2017-03-071-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported by syzkaller: WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029 nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 panic+0x1fb/0x412 kernel/panic.c:179 __warn+0x1c4/0x1e0 kernel/panic.c:540 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029 vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline] vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324 kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099 do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128 __msr_io arch/x86/kvm/x86.c:2577 [inline] msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614 kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497 kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721 vfs_ioctl fs/ioctl.c:43 [inline] do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683 SYSC_ioctl fs/ioctl.c:698 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689 entry_SYSCALL_64_fastpath+0x1f/0xc2 The syzkaller folks reported a nested_run_pending warning during userspace clear VMX capability which is exposed to L1 before. The warning gets thrown while doing (*(uint32_t*)0x20aecfe8 = (uint32_t)0x1); (*(uint32_t*)0x20aecfec = (uint32_t)0x0); (*(uint32_t*)0x20aecff0 = (uint32_t)0x3a); (*(uint32_t*)0x20aecff4 = (uint32_t)0x0); (*(uint64_t*)0x20aecff8 = (uint64_t)0x0); r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul, 0x20aecfe8ul, 0, 0, 0, 0, 0, 0); i.e. KVM_SET_MSR ioctl with struct kvm_msrs { .nmsrs = 1, .pad = 0, .entries = { {.index = MSR_IA32_FEATURE_CONTROL, .reserved = 0, .data = 0} } } The VMLANCH/VMRESUME emulation should be stopped since the CPU is going to reset here. This patch resets the nested_run_pending since the CPU is going to be reset hence there should be nothing pending. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | | kvm: nVMX: VMCLEAR should not cause the vCPU to shut downJim Mattson2017-03-061-18/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMCLEAR should silently ignore a failure to clear the launch state of the VMCS referenced by the operand. Signed-off-by: Jim Mattson <jmattson@google.com> [Changed "kvm_write_guest(vcpu->kvm" to "kvm_vcpu_write_guest(vcpu".] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | | | Merge branch 'prep-for-5level'Linus Torvalds2017-03-102-1/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge 5-level page table prep from Kirill Shutemov: "Here's relatively low-risk part of 5-level paging patchset. Merging it now will make x86 5-level paging enabling in v4.12 easier. The first patch is actually x86-specific: detect 5-level paging support. It boils down to single define. The rest of patchset converts Linux MMU abstraction from 4- to 5-level paging. Enabling of new abstraction in most cases requires adding single line of code in arch-specific code. The rest is taken care by asm-generic/. Changes to mm/ code are mostly mechanical: add support for new page table level -- p4d_t -- where we deal with pud_t now. v2: - fix build on microblaze (Michal); - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow(); - acks from Michal" * emailed patches from Kirill A Shutemov <kirill.shutemov@linux.intel.com>: mm: introduce __p4d_alloc() mm: convert generic code to 5-level paging asm-generic: introduce <asm-generic/pgtable-nop4d.h> arch, mm: convert all architectures to use 5level-fixup.h asm-generic: introduce __ARCH_USE_5LEVEL_HACK asm-generic: introduce 5level-fixup.h x86/cpufeature: Add 5-level paging detection
| * | | | arch, mm: convert all architectures to use 5level-fixup.hKirill A. Shutemov2017-03-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an architecture uses 4level-fixup.h we don't need to do anything as it includes 5level-fixup.h. If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK before inclusion of the header. It makes asm-generic code to use 5level-fixup.h. If an architecture has 4-level paging or folds levels on its own, include 5level-fixup.h directly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | x86/cpufeature: Add 5-level paging detectionKirill A. Shutemov2017-03-091-1/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Look for 'la57' in /proc/cpuinfo to see if your machine supports 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2017-03-102-17/+22
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "26 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) userfaultfd: remove wrong comment from userfaultfd_ctx_get() fat: fix using uninitialized fields of fat_inode/fsinfo_inode sh: cayman: IDE support fix kasan: fix races in quarantine_remove_cache() kasan: resched in quarantine_remove_cache() mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() thp: fix another corner case of munlock() vs. THPs rmap: fix NULL-pointer dereference on THP munlocking mm/memblock.c: fix memblock_next_valid_pfn() userfaultfd: selftest: vm: allow to build in vm/ directory userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED userfaultfd: non-cooperative: fix fork fctx->new memleak mm/cgroup: avoid panic when init with low memory drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h mm/vmstats: add thp_split_pud event for clarity include/linux/fs.h: fix unsigned enum warning with gcc-4.2 userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete userfaultfd: non-cooperative: robustness check userfaultfd: non-cooperative: rollback userfaultfd_exit x86, mm: unify exit paths in gup_pte_range() ...
| * | | | x86, mm: unify exit paths in gup_pte_range()Dan Williams2017-03-091-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All exit paths from gup_pte_range() require pte_unmap() of the original pte page before returning. Refactor the code to have a single exit point to do the unmap. This mirrors the flow of the generic gup_pte_range() in mm/gup.c. Link: http://lkml.kernel.org/r/148804251828.36605.14910389618497006945.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | x86, mm: fix gup_pte_range() vs DAX mappingsDan Williams2017-03-091-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gup_pte_range() fails to check pte_allows_gup() before translating a DAX pte entry, pte_devmap(), to a page. This allows writes to read-only mappings, and bypasses the DAX cacheline dirty tracking due to missed 'mkwrite' faults. The gup_huge_pmd() path and the gup_huge_pud() path correctly check pte_allows_gup() before checking for _devmap() entries. Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Link: http://lkml.kernel.org/r/148804251312.36605.12665024794196605053.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | scripts/spelling.txt: add "disble(d)" pattern and fix typo instancesMasahiro Yamada2017-03-091-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typos and add the following to the scripts/spelling.txt: disble||disable disbled||disabled I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c untouched. The macro is not referenced at all, but this commit is touching only comment blocks just in case. Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'for-linus-4.11-rc1-tag' of ↵Linus Torvalds2017-03-091-16/+7
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix and cleanup from Juergen Gross: "This contains one fix for MSIX handling under Xen and a trivial cleanup patch" * tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xenbus: Remove duplicate inclusion of linux/init.h xen: do not re-use pirq number cached in pci device msi msg data
| * | | xen: do not re-use pirq number cached in pci device msi msg dataDan Streetman2017-02-221-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert the main part of commit: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests") That commit introduced reading the pci device's msi message data to see if a pirq was previously configured for the device's msi/msix, and re-use that pirq. At the time, that was the correct behavior. However, a later change to Qemu caused it to call into the Xen hypervisor to unmap all pirqs for a pci device, when the pci device disables its MSI/MSIX vectors; specifically the Qemu commit: c976437c7dba9c7444fb41df45468968aaa326ad ("qemu-xen: free all the pirqs for msi/msix when driver unload") Once Qemu added this pirq unmapping, it was no longer correct for the kernel to re-use the pirq number cached in the pci device msi message data. All Qemu releases since 2.1.0 contain the patch that unmaps the pirqs when the pci device disables its MSI/MSIX vectors. This bug is causing failures to initialize multiple NVMe controllers under Xen, because the NVMe driver sets up a single MSIX vector for each controller (concurrently), and then after using that to talk to the controller for some configuration data, it disables the single MSIX vector and re-configures all the MSIX vectors it needs. So the MSIX setup code tries to re-use the cached pirq from the first vector for each controller, but the hypervisor has already given away that pirq to another controller, and its initialization fails. This is discussed in more detail at: https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM guests") Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
* | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-03-0719-40/+67
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes and minor updates all over the place: - an SGI/UV fix - a defconfig update - a build warning fix - move the boot_params file to the arch location in debugfs - a pkeys fix - selftests fix - boot message fixes - sparse fixes - a resume warning fix - ioapic hotplug fixes - reboot quirks ... plus various minor cleanups" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build/x86_64_defconfig: Enable CONFIG_R8169 x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirk x86/hpet: Prevent might sleep splat on resume x86/boot: Correct setup_header.start_sys name x86/purgatory: Fix sparse warning, symbol not declared x86/purgatory: Make functions and variables static x86/events: Remove last remnants of old filenames x86/pkeys: Check against max pkey to avoid overflows x86/ioapic: Split IOAPIC hot-removal into two steps x86/PCI: Implement pcibios_release_device to release IRQ from IOAPIC x86/intel_rdt: Remove duplicate inclusion of linux/cpu.h x86/vmware: Remove duplicate inclusion of asm/timer.h x86/hyperv: Hide unused label x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register x86/selftests: Add clobbers for int80 on x86_64 x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR() x86/apic: Fix a warning message in logical CPU IDs allocation x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/
| * | | | x86/build/x86_64_defconfig: Enable CONFIG_R8169Andy Shevchenko2017-03-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Very common PCIe ethernet card. Already enabled in i386_defconfig. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Link: http://lkml.kernel.org/r/20170306085748.85957-1-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/reboot/quirks: Add ASUS EeeBook X205TA/W reboot quirkMatjaz Hegedic2017-03-061-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without the parameter reboot=a, ASUS EeeBook X205TA/W will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1488737804-20681-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/hpet: Prevent might sleep splat on resumeThomas Gleixner2017-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sergey reported a might sleep warning triggered from the hpet resume path. It's caused by the call to disable_irq() from interrupt disabled context. The problem with the low level resume code is that it is not accounted as a special system_state like we do during the boot process. Calling the same code during system boot would not trigger the warning. That's inconsistent at best. In this particular case it's trivial to replace the disable_irq() with disable_hardirq() because this particular code path is solely used from system resume and the involved hpet interrupts can never be force threaded. Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703012108460.3684@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/boot: Correct setup_header.start_sys nameBorislav Petkov2017-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is called start_sys_seg elsewhere so rename it to that. It is an obsolete field so we could just as well directly call it __u16 __pad... No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170221183639.16554-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/purgatory: Fix sparse warning, symbol not declaredTobin C. Harding2017-03-013-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sparse emits warning, 'symbol not declared' for a function that has neither file scope nor a forward declaration. The functions only call site is an ASM file. Add a header file with the function declaration. Include the header file in the C source file defining the function in order to fix the sparse warning. Include the header file in ASM file containing the call site to document the usage. Signed-off-by: Tobin C. Harding <me@tobin.cc> Link: http://lkml.kernel.org/r/1487545956-2547-3-git-send-email-me@tobin.cc Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/purgatory: Make functions and variables staticTobin C. Harding2017-03-011-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sparse emits several 'symbol not declared' warnings for various functions and variables. Add static keyword to functions and variables which have file scope only. Signed-off-by: Tobin C. Harding <me@tobin.cc> Link: http://lkml.kernel.org/r/1487545956-2547-2-git-send-email-me@tobin.cc Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/events: Remove last remnants of old filenamesBorislav Petkov2017-03-014-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update to the new file paths, remove them from introductory comments. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170218113140.8051-1-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/pkeys: Check against max pkey to avoid overflowsDave Hansen2017-03-011-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kirill reported a warning from UBSAN about undefined behavior when using protection keys. He is running on hardware that actually has support for it, which is not widely available. The warning triggers because of very large shifts of integers when doing a pkey_free() of a large, invalid value. This happens because we never check that the pkey "fits" into the mm_pkey_allocation_map(). I do not believe there is any danger here of anything bad happening other than some aliasing issues where somebody could do: pkey_free(35); and the kernel would effectively execute: pkey_free(8); While this might be confusing to an app that was doing something stupid, it has to do something stupid and the effects are limited to the app shooting itself in the foot. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: shuah@kernel.org Cc: kirill.shutemov@linux.intel.com Link: http://lkml.kernel.org/r/20170223222603.A022ED65@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/PCI: Implement pcibios_release_device to release IRQ from IOAPICRui Wang2017-03-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The revert of 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") causes a problem for IOAPIC hotplug. The problem is that IRQs are allocated and freed in pci_enable_device() and pci_disable_device(). But there are some drivers which don't call pci_disable_device(), and they have good reasons not calling it, so if they're using IOAPIC their IRQs won't have a chance to be released from the IOAPIC. When this happens IOAPIC hot-removal fails with a kernel stack dump and an error message like this: [149335.697989] pin16 on IOAPIC2 is still in use. It turns out that we can fix it in a different way without moving IRQ allocation into pcibios_alloc_irq(), thus avoiding the regression of 991de2e59090. We can keep the allocation and freeing of IRQs as is within pci_enable_device()/pci_disable_device(), without breaking any previous assumption of the rest of the system, keeping compatibility with both the legacy and the modern drivers. We can accomplish this by implementing the existing __weak hook of pcibios_release_device() thus when a pci device is about to be deleted we get notified in the hook and take the chance to release its IRQ, if any, from the IOAPIC. Implement pcibios_release_device() for x86 to release any IRQ not released by the driver. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Cc: tony.luck@intel.com Cc: linux-pci@vger.kernel.org Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: fengguang.wu@intel.com Cc: helgaas@kernel.org Cc: kbuild-all@01.org Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1488288869-31290-2-git-send-email-rui.y.wang@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/intel_rdt: Remove duplicate inclusion of linux/cpu.hMasanari Iida2017-03-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Masanari Iida <standby24x7@gmail.com> Cc: fenghua.yu@intel.com Link: http://lkml.kernel.org/r/20170227130703.26968-1-standby24x7@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/vmware: Remove duplicate inclusion of asm/timer.hMasanari Iida2017-03-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Masanari Iida <standby24x7@gmail.com> Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20170227122922.26230-1-standby24x7@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/hyperv: Hide unused labelArnd Bergmann2017-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new 32-bit warning just showed up: arch/x86/hyperv/hv_init.c: In function 'hyperv_init': arch/x86/hyperv/hv_init.c:167:1: error: label 'register_msr_cs' defined but not used [-Werror=unused-label] The easiest solution is to move the label up into the existing #ifdef that has the goto. Fixes: dee863b571b0 ("hv: export current Hyper-V clocksource") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Stephen Hemminger <sthemmin@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: devel@linuxdriverproject.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170214211736.2641241-1-arnd@arndb.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirkMatjaz Hegedic2017-03-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without the parameter reboot=a, ASUS EeeBook X205TA will hang when it should reboot. This adds the appropriate quirk, thus fixing the problem. Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack registerAndrew Banman2017-03-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writing to the software acknowledge clear register when there are no pending messages causes a HUB error to assert. The original intent of this write was to clear the pending bits before start of operation, but this is an incorrect method and has been determined to be unnecessary. Signed-off-by: Andrew Banman <abanman@hpe.com> Acked-by: Mike Travis <mike.travis@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: rja@hpe.com Cc: sivanich@hpe.com Link: http://lkml.kernel.org/r/1487351269-181133-1-git-send-email-abanman@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/apic: Simplify enable_IR_x2apic(), remove try_to_enable_IR()Dou Liyang2017-03-011-13/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 2e63ad4bd5dd ("x86/apic: Do not init irq remapping if ioapic is disabled") ... added a check for skipped IO-APIC setup to enable_IR_x2apic(), but this check is also duplicated in try_to_enable_IR() - and it will never succeed in calling irq_remapping_enable(). Remove the whole irq_remapping_enable() complication: if the IO-APIC is disabled we cannot enable IRQ remapping. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: nicstange@gmail.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1487841401-1543-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/apic: Fix a warning message in logical CPU IDs allocationDou Liyang2017-03-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current warning message in allocate_logical_cpuid() is somewhat confusing: Only 1 processors supported.Processor 2/0x2 and the rest are ignored. As it might imply that there's only one CPU in the system - while what we ran into here is a kernel limitation. Fix the warning message to clarify all that: APIC: NR_CPUS/possible_cpus limit of 2 reached. Processor 2/0x2 and the rest are ignored. ( Also update the error return from -1 to -EINVAL, which is the more canonical return value. ) Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: nicstange@gmail.com Cc: wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1488261052-25753-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/kdebugfs: Move boot params hierarchy under (debugfs)/x86/Borislav Petkov2017-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... since this is all x86-specific data and it makes sense to have it under x86/ logically instead in the toplevel debugfs dir. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170227225058.27289-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2017-03-077-28/+23
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A fix for KVM's scheduler clock which (erroneously) was always marked unstable, a fix for RT/DL load balancing, plus latency fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface sched/core: Fix pick_next_task() for RT,DL sched/fair: Make select_idle_cpu() more aggressive
| * | | | | sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interfacePeter Zijlstra2017-03-027-28/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wanpeng Li reported that since the following commit: acb04058de49 ("sched/clock: Fix hotplug crash") ... KVM always runs with unstable sched-clock even though KVM's kvm_clock _is_ stable. The problem is that we've tied clear_sched_clock_stable() to the TSC state, and overlooked that sched_clock() is a paravirt function. Solve this by doing two things: - tie the sched_clock() stable state more clearly to the TSC stable state for the normal (!paravirt) case. - only call clear_sched_clock_stable() when we mark TSC unstable when we use native_sched_clock(). The first means we can actually run with stable sched_clock in more situations then before, which is good. And since commit: 12907fbb1a69 ("sched/clock, clocksource: Add optional cs::mark_unstable() method") ... this should be reliable. Since any detection of TSC fail now results in marking the TSC unstable. Reported-by: Wanpeng Li <kernellwp@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Fixes: acb04058de49 ("sched/clock: Fix hotplug crash") Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2017-03-073-5/+5
|\ \ \ \ \ \ | |_|_|_|/ / |/| | | | / | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This includes a fix for a crash if certain special addresses are kprobed, plus does a rename of two Kconfig variables that were a minor misnomer" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Rename CONFIG_[UK]PROBE_EVENT to CONFIG_[UK]PROBE_EVENTS kprobes/x86: Fix kernel panic when certain exception-handling addresses are probed
| * | | | Merge branch 'linus' into perf/urgent, to resolve conflictIngo Molnar2017-03-021-2/+0
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/powerpc/configs/85xx/kmp204x_defconfig Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | kprobes/x86: Fix kernel panic when certain exception-handling addresses are ↵Masami Hiramatsu2017-03-013-5/+5
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probed Fix to the exception table entry check by using probed address instead of the address of copied instruction. This bug may cause unexpected kernel panic if user probe an address where an exception can happen which should be fixup by __ex_table (e.g. copy_from_user.) Unless user puts a kprobe on such address, this doesn't cause any problem. This bug has been introduced years ago, by commit: 464846888d9a ("x86/kprobes: Fix a bug which can modify kernel code permanently"). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 464846888d9a ("x86/kprobes: Fix a bug which can modify kernel code permanently") Link: http://lkml.kernel.org/r/148829899399.28855.12581062400757221722.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-03-045-17/+37
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more KVM updates from Radim Krčmář: "Second batch of KVM changes for the 4.11 merge window: PPC: - correct assumption about ASDR on POWER9 - fix MMIO emulation on POWER9 x86: - add a simple test for ioperm - cleanup TSS (going through KVM tree as the whole undertaking was caused by VMX's use of TSS) - fix nVMX interrupt delivery - fix some performance counters in the guest ... and two cleanup patches" * tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: nVMX: Fix pending events injection x86/kvm/vmx: remove unused variable in segment_base() selftests/x86: Add a basic selftest for ioperm x86/asm: Tidy up TSS limit code kvm: convert kvm.users_count from atomic_t to refcount_t KVM: x86: never specify a sample period for virtualized in_tx_cp counters KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9 KVM: PPC: Book3S HV: Fix software walk of guest process page tables
| * | | | KVM: nVMX: Fix pending events injectionWanpeng Li2017-03-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | L2 fails to boot on a non-APICv box dues to 'commit 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running")' KVM internal error. Suberror: 3 extra data[0]: 800000ef extra data[1]: 1 RAX=0000000000000000 RBX=ffffffff81f36140 RCX=0000000000000000 RDX=0000000000000000 RSI=0000000000000000 RDI=0000000000000000 RBP=ffff88007c92fe90 RSP=ffff88007c92fe90 R8 =ffff88007fccdca0 R9 =0000000000000000 R10=00000000fffedb3d R11=0000000000000000 R12=0000000000000003 R13=0000000000000000 R14=0000000000000000 R15=ffff88007c92c000 RIP=ffffffff810645e6 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0000 0000000000000000 ffffffff 00c00000 CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA] SS =0000 0000000000000000 ffffffff 00c00000 DS =0000 0000000000000000 ffffffff 00c00000 FS =0000 0000000000000000 ffffffff 00c00000 GS =0000 ffff88007fcc0000 ffffffff 00c00000 LDT=0000 0000000000000000 ffffffff 00c00000 TR =0040 ffff88007fcd4200 00002087 00008b00 DPL=0 TSS64-busy GDT= ffff88007fcc9000 0000007f IDT= ffffffffff578000 00000fff CR0=80050033 CR2=00000000ffffffff CR3=0000000001e0a000 CR4=003406e0 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000fffe0ff0 DR7=0000000000000400 EFER=0000000000000d01 We should try to reinject previous events if any before trying to inject new event if pending. If vmexit is triggered by L2 guest and L0 interested in, we should reinject IDT-vectoring info to L2 through vmcs02 if any, otherwise, we can consider new IRQs/NMIs which can be injected and call nested events callback to switch from L2 to L1 if needed and inject the proper vmexit events. However, 'commit 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running")' results in the handle events order reversely on non-APICv box. This patch fixes it by bailing out for pending events and not consider new events in this scenario. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Fixes: 0ad3bed6c5ec ("kvm: nVMX: move nested events check to kvm_vcpu_running") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>