summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/vmx.c
Commit message (Collapse)AuthorAgeFilesLines
...
* | | x86/kvm: Rename VMX's segment access rights definesAndy Lutomirski2015-08-151-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMX encodes access rights differently from LAR, and the latter is most likely what x86 people think of when they think of "access rights". Rename them to avoid confusion. Cc: kvm@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: VMX: drop ept misconfig checkXiao Guangrong2015-08-051-72/+2
| | | | | | | | | | | | | | | | | | | | | | | | The logic used to check ept misconfig is completely contained in common reserved bits check for sptes, so it can be removed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | kvm/x86: add support for MONITOR_TRAP_FLAGMihai Donțu2015-07-231-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | Allow a nested hypervisor to single step its guests. Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com> [Fix overlong line. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptionsEugene Korenevsky2015-07-231-16/+61
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to Intel SDM several checks must be applied for memory operands of VMX instructions. Long mode: #GP(0) or #SS(0) depending on the segment must be thrown if the memory address is in a non-canonical form. Protected mode, checks in chronological order: - The segment type must be checked with access type (read or write) taken into account. For write access: #GP(0) must be generated if the destination operand is located in a read-only data segment or any code segment. For read access: #GP(0) must be generated if if the source operand is located in an execute-only code segment. - Usability of the segment must be checked. #GP(0) or #SS(0) depending on the segment must be thrown if the segment is unusable. - Limit check. #GP(0) or #SS(0) depending on the segment must be thrown if the memory operand effective address is outside the segment limit. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: rename quirk constants to KVM_X86_QUIRK_*Paolo Bonzini2015-07-231-1/+1
| | | | | | | | | | | | | | Make them clearly architecture-dependent; the capability is valid for all architectures, but the argument is not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: vmx: obey KVM_QUIRK_CD_NW_CLEAREDXiao Guangrong2015-07-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | OVMF depends on WB to boot fast, because it only clears caches after it has set up MTRRs---which is too late. Let's do writeback if CR0.CD is set to make it happy, similar to what SVM is already doing. Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: apply guest MTRR virtualization on host reserved pagesPaolo Bonzini2015-07-101-8/+3
|/ | | | | | | | | | | | | Currently guest MTRR is avoided if kvm_is_reserved_pfn returns true. However, the guest could prefer a different page type than UC for such pages. A good example is that pass-throughed VGA frame buffer is not always UC as host expected. This patch enables full use of virtual guest MTRRs. Suggested-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Tested-by: Joerg Roedel <jroedel@suse.de> (on AMD) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'trace-v4.2' of ↵Linus Torvalds2015-06-261-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This patch series contains several clean ups and even a new trace clock "monitonic raw". Also some enhancements to make the ring buffer even faster. But the biggest and most noticeable change is the renaming of the ftrace* files, structures and variables that have to deal with trace events. Over the years I've had several developers tell me about their confusion with what ftrace is compared to events. Technically, "ftrace" is the infrastructure to do the function hooks, which include tracing and also helps with live kernel patching. But the trace events are a separate entity altogether, and the files that affect the trace events should not be named "ftrace". These include: include/trace/ftrace.h -> include/trace/trace_events.h include/linux/ftrace_event.h -> include/linux/trace_events.h Also, functions that are specific for trace events have also been renamed: ftrace_print_*() -> trace_print_*() (un)register_ftrace_event() -> (un)register_trace_event() ftrace_event_name() -> trace_event_name() ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled() ftrace_define_fields_##call() -> trace_define_fields_##call() ftrace_get_offsets_##call() -> trace_get_offsets_##call() Structures have been renamed: ftrace_event_file -> trace_event_file ftrace_event_{call,class} -> trace_event_{call,class} ftrace_event_buffer -> trace_event_buffer ftrace_subsystem_dir -> trace_subsystem_dir ftrace_event_raw_##call -> trace_event_raw_##call ftrace_event_data_offset_##call-> trace_event_data_offset_##call ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call And a few various variables and flags have also been updated. This has been sitting in linux-next for some time, and I have not heard a single complaint about this rename breaking anything. Mostly because these functions, variables and structures are mostly internal to the tracing system and are seldom (if ever) used by anything external to that" * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) ring_buffer: Allow to exit the ring buffer benchmark immediately ring-buffer-benchmark: Fix the wrong type ring-buffer-benchmark: Fix the wrong param in module_param ring-buffer: Add enum names for the context levels ring-buffer: Remove useless unused tracing_off_permanent() ring-buffer: Give NMIs a chance to lock the reader_lock ring-buffer: Add trace_recursive checks to ring_buffer_write() ring-buffer: Allways do the trace_recursive checks ring-buffer: Move recursive check to per_cpu descriptor ring-buffer: Add unlikelys to make fast path the default tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call() tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call() tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled() tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_* tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir tracing: Rename ftrace_event_name() to trace_event_name() tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX ...
| * tracing: Rename ftrace_event.h to trace_events.hSteven Rostedt (Red Hat)2015-05-131-1/+1
| | | | | | | | | | | | | | | | | | The term "ftrace" is really the infrastructure of the function hooks, and not the trace events. Rename ftrace_event.h to trace_events.h to represent the trace_event infrastructure and decouple the term ftrace from it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-06-241-98/+265
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull first batch of KVM updates from Paolo Bonzini: "The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. Details: - ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. - s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. - x86: * host and guest support for marking kvmclock as a stable scheduler clock. * support for write combining. * support for system management mode, needed for secure boot in guests. * a bunch of cleanups required for the above * support for virtualized performance counters on AMD * legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. - Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) KVM: s390: clear floating interrupt bitmap and parameters KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs KVM: x86/vPMU: Implement AMD vPMU code for KVM KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc KVM: x86/vPMU: reorder PMU functions KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU KVM: x86/vPMU: introduce pmu.h header KVM: x86/vPMU: rename a few PMU functions KVM: MTRR: do not map huge page for non-consistent range KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type KVM: MTRR: introduce mtrr_for_each_mem_type KVM: MTRR: introduce fixed_mtrr_addr_* functions KVM: MTRR: sort variable MTRRs KVM: MTRR: introduce var_mtrr_range KVM: MTRR: introduce fixed_mtrr_segment table KVM: MTRR: improve kvm_mtrr_get_guest_memory_type KVM: MTRR: do not split 64 bits MSR content KVM: MTRR: clean up mtrr default type ...
| * | KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatchWei Huang2015-06-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch defines a new function pointer struct (kvm_pmu_ops) to support vPMU for both Intel and AMD. The functions pointers defined in this new struct will be linked with Intel and AMD functions later. In the meanwhile the struct that maps from event_sel bits to PERF_TYPE_HARDWARE events is renamed and moved from Intel specific code to kvm_host.h as a common struct. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: move MTRR related code to a separate fileXiao Guangrong2015-06-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | MTRR code locates in x86.c and mmu.c so that move them to a separate file to make the organization more clearer and it will be the place where we fully implement vMTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: fix CR0.CD virtualizationXiao Guangrong2015-06-191-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, CR0.CD is not checked when we virtualize memory cache type for noncoherent_dma guests, this patch fixes it by : - setting UC for all memory if CR0.CD = 1 - zapping all the last sptes in MMU if CR0.CD is changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: advertise KVM_CAP_X86_SMMPaolo Bonzini2015-06-051-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and we're done. :) Because SMBASE is usually relocated above 1M on modern chipsets, and SMM handlers might indeed rely on 4G segment limits, we only expose it if KVM is able to run the guest in big real mode. This includes any of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: work on all available address spacesPaolo Bonzini2015-06-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has no semantic change, but it prepares for the introduction of a second address space for system management mode. A new function x86_set_memory_region (and the "slots_lock taken" counterpart __x86_set_memory_region) is introduced in order to operate on all address spaces when adding or deleting private memory slots. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: use vcpu-specific functions to read/write/translate GFNsPaolo Bonzini2015-06-051-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to check whether the VCPU is in system management mode and use a different set of memslots. Switch from kvm_* to the newly-introduced kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: pass host_initiated to functions that read MSRsPaolo Bonzini2015-06-041-32/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SMBASE is only readable from SMM for the VCPU, but it must be always accessible if userspace is accessing it. Thus, all functions that read MSRs are changed to accept a struct msr_data; the host_initiated and index fields are pre-initialized, while the data field is filled on return. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Merge branch 'kvm-master' into kvm-nextPaolo Bonzini2015-05-201-0/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU deactivation patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: Fix DR7 mask on task-switch while debuggingNadav Amit2015-05-191-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the host sets hardware breakpoints to debug the guest, and a task-switch occurs in the guest, the architectural DR7 will not be updated. The effective DR7 would be updated instead. This fix puts the DR7 update during task-switch emulation, so it now uses the standard DR setting mechanism instead of the one that was previously used. As a bonus, the update of DR7 will now be effective for AMD as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: nVMX: Fix host crash when loading MSRs with userspace irqchipJan Kiszka2015-05-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vcpu->arch.apic is NULL when a userspace irqchip is active. But instead of letting the test incorrectly depend on in-kernel irqchip mode, open-code it to catch also userspace x2APICs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: dump VMCS on invalid entryPaolo Bonzini2015-05-071-0/+153
| | | | | | | | | | | | | | | | | | | | | | | | Code and format roughly based on Xen's vmcs_dump_vcpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: fix initial PAT valueRadim Krčmář2015-05-071-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: INIT and reset sequences are differentNadav Amit2015-05-071-22/+29
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP. References (from Intel SDM): "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions] [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT] "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION] "The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge branch 'linus' into x86/fpuIngo Molnar2015-05-251-0/+1
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | Resolve semantic conflict in arch/x86/kvm/cpuid.c with: c447e76b4cab ("kvm/fpu: Enable eager restore kvm FPU for MPX") By removing the FPU internal include files. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | Revert "KVM: x86: drop fpu_activate hook"Paolo Bonzini2015-05-201-0/+1
| |/ | | | | | | | | | | | | | | This reverts commit 4473b570a7ebb502f63f292ccfba7df622e5fdd3. We'll use the hook again. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | x86/fpu: Rename user_has_fpu() to fpregs_active()Ingo Molnar2015-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename this function in line with the new FPU nomenclature. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | x86/fpu: Move asm/xcr.h to asm/fpu/internal.hIngo Molnar2015-05-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that all FPU internals using drivers are converted to public APIs, move xcr.h's definitions into fpu/internal.h and remove xcr.h. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | x86/fpu: Move various internal function prototypes to fpu/internal.hIngo Molnar2015-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a number of FPU internal function prototypes and an inline function in fpu/api.h, mostly placed so historically as the code grew over the years. Move them over into fpu/internal.h where they belong. (Add sched.h include to stackprotector.h which incorrectly relied on getting it from fpu/api.h.) fpu/api.h is now a pure file that only contains FPU APIs intended for driver use. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | x86/fpu: Rename i387.h to fpu/api.hIngo Molnar2015-05-191-1/+1
|/ | | | | | | | | | | | | | | | | | We already have fpu/types.h, move i387.h to fpu/api.h. The file name has become a misnomer anyway: it offers generic FPU APIs, but is not limited to i387 functionality. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* KVM: VMX: Preserve host CR4.MCE value while in guest mode.Ben Serebrin2015-04-211-2/+10
| | | | | | | | | | | | | | | | | | | | | The host's decision to enable machine check exceptions should remain in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset and passed a slightly-modified 0 to the vmcs.guest_cr4 value. Tested: Built. On earlier version, tested by injecting machine check while a guest is spinning. Before the change, if guest CR4.MCE==0, then the machine check is escalated to Catastrophic Error (CATERR) and the machine dies. If guest CR4.MCE==1, then the machine check causes VMEXIT and is handled normally by host Linux. After the change, injecting a machine check causes normal Linux machine check handling. Signed-off-by: Ben Serebrin <serebrin@google.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-04-131-70/+76
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...
| * KVM: x86: BSP in MSR_IA32_APICBASE is writableNadav Amit2015-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After reset, the CPU can change the BSP, which will be used upon INIT. Reset should return the BSP which QEMU asked for, and therefore handled accordingly. To quote: "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: nVMX: remove unnecessary double caching of MAXPHYADDREugene Korenevsky2015-04-081-8/+6
| | | | | | | | | | | | | | | | | | | | | | After speed-up of cpuid_maxphyaddr() it can be called frequently: instead of heavyweight enumeration of CPUID entries it returns a cached pre-computed value. It is also inlined now. So caching its result became unnecessary and can be removed. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205644.GA1258@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entryEugene Korenevsky2015-04-081-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | On each VM-entry CPU should check the following VMCS fields for zero bits beyond physical address width: - APIC-access address - virtual-APIC address - posted-interrupt descriptor address This patch adds these checks required by Intel SDM. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205627.GA1244@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: vmx: pass error code with internal error #2Radim Krčmář2015-04-081-1/+2
| | | | | | | | | | | | | | | | | | Exposing the on-stack error code with internal error is cheap and potentially useful. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428001865-32280-1-git-send-email-rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Merge tag 'kvm-arm-for-4.1' of ↵Paolo Bonzini2015-04-071-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups
| | * KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.Nikolay Nikolaev2015-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | x86: Use bool function return values of true/false not 1/0Joe Perches2015-03-311-36/+36
| | | | | | | | | | | | | | | | | | | | | | | | Use the normal return values for bool functions Signed-off-by: Joe Perches <joe@perches.com> Message-Id: <9f593eb2f43b456851cd73f7ed09654ca58fb570.1427759009.git.joe@perches.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Remove redundant definitionsNadav Amit2015-03-301-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some constants are redfined in emulate.c. Avoid it. s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK s/SELECTOR_TI_MASK/SEGMENT_TI_MASK No functional change. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: nVMX: Add support for rdtscpJan Kiszka2015-03-261-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the guest CPU is supposed to support rdtscp and the host has rdtscp enabled in the secondary execution controls, we can also expose this feature to L1. Just extend nested_vmx_exit_handled to properly route EXIT_REASON_RDTSCP. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: nVMX: Do not emulate #UD while in guest modeJan Kiszka2015-03-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | While in L2, leave all #UD to L2 and do not try to emulate it. If L1 is interested in doing this, it reports its interest via the exception bitmap, and we never get into handle_exception of L0 anyway. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | kvm: x86: make kvm_emulate_* consistantJoel Schopp2015-03-101-6/+3
| |/ | | | | | | | | | | | | | | | | | | Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes don't. The end reult is the caller ends up doing the skip themselves. Let's make them consistant. Signed-off-by: Joel Schopp <joel.schopp@amd.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: nVMX: mask unrestricted_guest if disabled on L0Radim Krčmář2015-03-171-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If EPT was enabled, unrestricted_guest was allowed in L1 regardless of L0. L1 triple faulted when running L2 guest that required emulation. Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)' in L0's dmesg: WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] () Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when the host doesn't have it enabled. Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0") Cc: stable@vger.kernel.org Tested-By: Kashyap Chamarthy <kchamart@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: VMX: Set msr bitmap correctly if vcpu is in guest modeWincy Van2015-03-131-4/+7
|/ | | | | | | | | | | | In commit 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap"), we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This is not enough since the field will be modified by following vmx_set_efer. Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is in guest mode. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: fix build without CONFIG_SMPRadim Krčmář2015-02-231-9/+14
| | | | | | | | | | 'apic' is not defined if !CONFIG_X86_64 && !CONFIG_X86_LOCAL_APIC. Posted interrupt makes no sense without CONFIG_SMP, and CONFIG_X86_LOCAL_APIC will be set with it. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-02-161-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation
| * x86: Store a per-cpu shadow copy of CR4Andy Lutomirski2015-02-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86: Clean up cr4 manipulationAndy Lutomirski2015-02-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | KVM: x86: fix build with !CONFIG_SMPRadim Krčmář2015-02-101-0/+1
| | | | | | | | | | | | | | | | | | <asm/apic.h> isn't included directly and without CONFIG_SMP, an option that automagically pulls it can't be enabled. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: nVMX: Enable nested posted interrupt processingWincy Van2015-02-031-4/+150
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>