summaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAgeFilesLines
* KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann2008-06-241-13/+62
| | | | | | | | This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity2008-06-241-8/+11
| | | | | | | | | | | | | Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity2008-06-241-6/+0
| | | | | | | | | | | | | | | | KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti2008-06-241-5/+7
| | | | | | | | | | | | | | kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti2008-06-241-0/+1
| | | | | | | | rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: close timer injection race window in __vcpu_runMarcelo Tosatti2008-06-243-3/+8
| | | | | | | | | | | | | | If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti2008-06-241-12/+3
| | | | | | | | | | | | | | | A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix is_empty_shadow_page() checkAvi Kivity2008-06-061-1/+1
| | | | | | The check is only looking at one of two possible empty ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix printk() format stringAvi Kivity2008-06-061-1/+1
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: reschedule during shadow teardownAvi Kivity2008-06-061-0/+1
| | | | | | | Shadows for large guests can take a long time to tear down, so reschedule occasionally to avoid softlockup warnings. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Clear CR4.VMXE in hardware_disableEli Collins2008-06-061-0/+1
| | | | | | | | | | | | | Clear CR4.VMXE in hardware_disable. There's no reason to leave it set after doing a VMXOFF. VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is in VMX mode, so leaving VMXE set means we'll refuse to power on. With this change the user can power on after unloading the kvm-intel module. I tested on kvm-67 and kvm-69. Signed-off-by: Eli Collins <ecollins@vmware.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: migrate PIT timerMarcelo Tosatti2008-06-066-4/+24
| | | | | | | | | Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on, similarly to what is done for the LAPIC timers, otherwise PIT interrupts will be delayed until an unrelated event causes an exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix hypercall return value on AMDAvi Kivity2008-06-061-1/+2
| | | | | | | | | | | | | | | | | | The hypercall instructions on Intel and AMD are different. KVM allows the guest to choose one or the other (the default is Intel), and if the guest chooses incorrectly, KVM will patch it at runtime to select the correct instruction. This allows live migration between Intel and AMD machines. This patching occurs in the x86 emulator. The current code also executes the hypercall. Unfortunately, the tail end of the x86 emulator code also executes, overwriting the return value of the hypercall with the original contents of rax (which happens to be the hypercall number). Fix not by executing the hypercall in the emulator context; instead let the guest reissue the patched instruction and execute the hypercall via the normal path. Signed-off-by: Avi Kivity <avi@qumranet.com>
* namespacecheck: automated fixesIngo Molnar2008-05-231-1/+1
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* KVM: LAPIC: ignore pending timers if LVTT is disabledMarcelo Tosatti2008-05-181-1/+1
| | | | | | | | | | | Only use the APIC pending timers count to break out of HLT emulation if the timer vector is enabled. Certain configurations of Windows simply mask out the vector without disabling the timer. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: PIT: take inject_pending into account when emulating hltMarcelo Tosatti2008-05-181-1/+1
| | | | | | | Otherwise hlt emulation fails if PIT is not injecting IRQ's. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix writes to registers with modrm encodingsAvi Kivity2008-05-181-2/+5
| | | | | | | | | A register destination encoded with a mod=3 encoding left dst.ptr NULL. Normally we don't trap writes to registers, but in the case of smsw, we do. Fix by pointing dst.ptr at the destination register. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Allow more than PAGES_PER_HPAGE write protections per large pageAvi Kivity2008-05-041-1/+0
| | | | | | | | | nonpae guests can call rmap_write_protect twice per page (for page tables) or four times per page (for page directories), triggering a bogus warning. Remove the warning. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: avoid fx_init() schedule in atomicAndrea Arcangeli2008-05-041-1/+10
| | | | | | | | | | This make sure not to schedule in atomic during fx_init. I also changed the name of fpu_init to fx_finit to avoid duplicating the name with fpu_init that is already used in the kernel, this makes grep simpler if nothing else. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid spurious execeptions after setting registersJan Kiszka2008-05-041-0/+2
| | | | | | | | | Clear pending exceptions when setting new register values. This avoids spurious exceptions after restoring a vcpu state or after reset-on-triple-fault. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: PIT: support mode 4Marcelo Tosatti2008-05-041-0/+2
| | | | | | | | | | | | | | | The in-kernel PIT emulation ignores pending timers if operating under mode 4, which for example DragonFlyBSD uses (and Plan9 too, apparently). Mode 4 seems to be similar to one-shot mode, other than the fact that it starts counting after the next CLK pulse once programmed, while mode 1 starts counting immediately, so add a FIXME to enhance precision. Fixes sourceforge bug 1952988. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: disable writeback on lmswAvi Kivity2008-05-041-0/+1
| | | | | | | | | | The recent changes allowing memory operands with lmsw and smsw left lmsw with writeback enabled. Since lmsw has no oridinary destination operand, the dst pointer was not initialized, resulting in an oops. Close the hole by disabling writeback for lmsw. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86: task switch: fix wrong bit setting for the busy flagIzik Eidus2008-05-041-2/+2
| | | | | | | The busy bit is bit 1 of the type field, not bit 8. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Enable EPT feature for KVMSheng Yang2008-05-043-10/+233
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Prepare an identity page table for EPT in real modeSheng Yang2008-05-043-3/+81
| | | | | | | | [aliguory: plug leak] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove #ifdef CONFIG_X86_64 to support 4 level EPTSheng Yang2008-05-041-4/+0
| | | | | | | | Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef for alloc root_hpa and free root_hpa to support EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Add EPT supportSheng Yang2008-05-042-10/+36
| | | | | | | Enable kvm_set_spte() to generate EPT entries. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm_x86_ops get_tdp_level()Sheng Yang2008-05-045-8/+19
| | | | | | | | The function get_tdp_level() provided the number of tdp level for EPT and NPT rather than the NPT specific macro. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move some definitions to a header fileSheng Yang2008-05-042-34/+33
| | | | | | | | Move some definitions to mmu.h in order to allow building common table entries between EPT and non-EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: EPT Feature DetectionSheng Yang2008-05-042-5/+83
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* rename div64_64 to div64_u64Roman Zippel2008-05-012-6/+6
| | | | | | | | | | | | | | | | | | | | | Rename div64_64 to div64_u64 to make it consistent with the other divide functions, so it clearly includes the type of the divide. Move its definition to math64.h as currently no architecture overrides the generic implementation. They can still override it of course, but the duplicated declarations are avoided. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Avi Kivity <avi@qumranet.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: MMU: kvm_pv_mmu_op should not take mmap_semMarcelo Tosatti2008-04-271-3/+0
| | | | | | | | | | | kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: remove selective CR0 commentJoerg Roedel2008-04-271-11/+0
| | | | | | | | | | | There is not selective cr0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed. So its the right behavior that there is no intercept on this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: remove now obsolete FIXME commentJoerg Roedel2008-04-271-7/+0
| | | | | | | With the usage of the V_TPR field this comment is now obsolete. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: disable CR8 intercept when tpr is not masking interruptsJoerg Roedel2008-04-271-4/+27
| | | | | | | | | This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabledJoerg Roedel2008-04-271-0/+12
| | | | | | | | If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be synced with the TPR field in the local apic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: export kvm_lapic_set_tpr() to modulesJoerg Roedel2008-04-271-0/+1
| | | | | | | | This patch exports the kvm_lapic_set_tpr() function from the lapic code to modules. It is required in the kvm-amd module to optimize CR8 intercepts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: sync TPR value to V_TPR field in the VMCBJoerg Roedel2008-04-271-2/+16
| | | | | | | | This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB. With this change we can safely remove the CR8 read intercept. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix lea to really get the effective addressAvi Kivity2008-04-271-1/+1
| | | | | | We never hit this, since there is currently no reason to emulate lea. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix smsw and lmsw with a memory operandAvi Kivity2008-04-271-12/+17
| | | | | | | | lmsw and smsw were implemented only with a register operand. Extend them to support a memory operand as well. Fixes Windows running some display compatibility test on AMD hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: initialize src.val and dst.val for register operandsAvi Kivity2008-04-271-0/+2
| | | | | | This lets us treat the case where mod == 3 in the same manner as other cases. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: force a new asid when initializing the vmcbAvi Kivity2008-04-271-1/+1
| | | | | | | Shutdown interception clears the vmcb, leaving the asid at zero (which is illegal. so force a new asid on vmcb initialization. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: fix kvm_vcpu_kick vs __vcpu_run raceMarcelo Tosatti2008-04-271-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is a window open between testing of pending IRQ's and assignment of guest_mode in __vcpu_run. Injection of IRQ's can race with __vcpu_run as follows: CPU0 CPU1 kvm_x86_ops->run() vcpu->guest_mode = 0 SET_IRQ_LINE ioctl .. kvm_x86_ops->inject_pending_irq kvm_cpu_has_interrupt() apic_test_and_set_irr() kvm_vcpu_kick if (vcpu->guest_mode) send_ipi() vcpu->guest_mode = 1 So move guest_mode=1 assignment before ->inject_pending_irq, and make sure that it won't reorder after it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add ioctls to save/store mpstateMarcelo Tosatti2008-04-271-0/+19
| | | | | | | | | | | | | So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*Avi Kivity2008-04-273-18/+18
| | | | | | We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: hlt emulation should take in-kernel APIC/PIT timers into accountMarcelo Tosatti2008-04-274-0/+38
| | | | | | | | | | | | | | | | | | Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: do not intercept task switch with NPTJoerg Roedel2008-04-271-0/+1
| | | | | | | | When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm trace userspace interfaceFeng(Eric) Liu2008-04-272-0/+14
| | | | | | | | This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add trace markersFeng (Eric) Liu2008-04-272-1/+60
| | | | | | | | Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: add intercept for machine check exceptionJoerg Roedel2008-04-271-1/+16
| | | | | | | | | To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>