path: root/arch/x86
Commit message  Author  Age  Files  Lines
* Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm  Linus Torvalds  2008-10-16  19  -921/+1906
|\
| | * 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits)
| |   KVM: ia64: Add intel iommu support for guests.
| |   KVM: ia64: add directed mmio range support for kvm guests
| |   KVM: ia64: Make pmt table be able to hold physical mmio entries.
| |   KVM: Move irqchip_in_kernel() from ioapic.h to irq.h
| |   KVM: Separate irq ack notification out of arch/x86/kvm/irq.c
| |   KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs
| |   KVM: Move device assignment logic to common code
| |   KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/
| |   KVM: VMX: enable invlpg exiting if EPT is disabled
| |   KVM: x86: Silence various LAPIC-related host kernel messages
| |   KVM: Device Assignment: Map mmio pages into VT-d page table
| |   KVM: PIC: enhance IPI avoidance
| |   KVM: MMU: add "oos_shadow" parameter to disable oos
| |   KVM: MMU: speed up mmu_unsync_walk
| |   KVM: MMU: out of sync shadow core
| |   KVM: MMU: mmu_convert_notrap helper
| |   KVM: MMU: awareness of new kvm_mmu_zap_page behaviour
| |   KVM: MMU: mmu_parent_walk
| |   KVM: x86: trap invlpg
| |   KVM: MMU: sync roots on mmu reload
| |   ...
| * KVM: Separate irq ack notification out of arch/x86/kvm/irq.c  Xiantao Zhang  2008-10-15  3  -42/+1
| | Move the irq ack notification logic to common code and share it with the ia64 side.
| |
| | Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Move device assignment logic to common code  Xiantao Zhang  2008-10-15  1  -255/+0
| | To share with other archs, this patch moves device assignment logic to common parts.
| |
| | Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/  Zhang xiantao  2008-10-15  2  -194/+3
| | Preparation for kvm/ia64 VT-d support.
| |
| | Signed-off-by: Zhang xiantao <xiantao.zhang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: enable invlpg exiting if EPT is disabled  Marcelo Tosatti  2008-10-15  1  -1/+2
| | Manually disabling EPT via module option fails to re-enable INVLPG exiting.
| |
| | Reported-by: Gleb Natapov <gleb@redhat.com>
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86: Silence various LAPIC-related host kernel messages  Jan Kiszka  2008-10-15  2  -11/+9
| | KVM-x86 dumps a lot of debug messages that have no meaning for normal operation:
| |  - INIT de-assertion is ignored
| |  - SIPIs are sent and received
| |  - APIC writes are unaligned or < 4 byte long
| |    (Windows Server 2003 triggers this on SMP)
| |
| | Degrade them to true debug messages, keeping the host kernel log clean for real problems.
| |
| | Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Device Assignment: Map mmio pages into VT-d page table  Weidong Han  2008-10-15  1  -18/+11
| | An assigned device could DMA to mmio pages, so mmio pages also need to be mapped into the VT-d page table.
| |
| | Signed-off-by: Weidong Han <weidong.han@intel.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PIC: enhance IPI avoidance  Marcelo Tosatti  2008-10-15  3  -2/+18
| | The PIC code makes little effort to avoid kvm_vcpu_kick(), resulting in unnecessary guest exits in some conditions.
| |
| | For example, if the timer interrupt is routed through the IOAPIC, IRR for IRQ 0 will get set but not cleared, since the APIC is handling the acks.
| |
| | This means that every time an interrupt < 16 is triggered, the priority logic will find IRQ0 pending and send an IPI to vcpu0 (in case IRQ0 is not masked, which is Linux's case).
| |
| | Introduce a new variable isr_ack to represent the IRQs for which the guest has been signalled / cleared the ISR. Use it to avoid more than one IPI per trigger-ack cycle, in addition to the avoidance when ISR is set in get_priority().
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
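The "one IPI per trigger-ack cycle" idea described above can be illustrated with a toy model. This is only a sketch: the struct, the function names, and the exact point where the bit is cleared are assumptions made for illustration, not the kernel's PIC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 8259 PIC; names are illustrative, not the kernel's. */
struct toy_pic {
    uint8_t irr;      /* interrupt request register: pending lines */
    uint8_t imr;      /* interrupt mask register: masked lines */
    uint8_t isr_ack;  /* lines for which the guest may be signalled again */
};

static void toy_pic_init(struct toy_pic *p)
{
    p->irr = 0;
    p->imr = 0;
    p->isr_ack = 0xff;  /* every line starts out eligible for a kick */
}

/* Raise an IRQ line; returns true if the vcpu should be kicked (IPI). */
static bool toy_pic_raise(struct toy_pic *p, unsigned irq)
{
    uint8_t bit = (uint8_t)(1u << (irq & 7));

    p->irr |= bit;
    if (p->imr & bit)
        return false;             /* masked: nothing to deliver */
    if (!(p->isr_ack & bit))
        return false;             /* already signalled, not yet acked */
    p->isr_ack &= (uint8_t)~bit;  /* at most one kick until the next ack */
    return true;
}

/* The guest acknowledged the interrupt: allow the next kick. */
static void toy_pic_ack(struct toy_pic *p, unsigned irq)
{
    uint8_t bit = (uint8_t)(1u << (irq & 7));

    p->irr &= (uint8_t)~bit;
    p->isr_ack |= bit;
}
```

In this model, raising IRQ 0 repeatedly without an intervening ack produces a single kick, which is the behaviour the commit message asks for.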
| * KVM: MMU: add "oos_shadow" parameter to disable oos  Marcelo Tosatti  2008-10-15  1  -1/+4
| | Subject says it all.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: speed up mmu_unsync_walk  Marcelo Tosatti  2008-10-15  1  -12/+60
| | Cache the unsynced children information in a per-page bitmap.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
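A per-page bitmap of this kind might look like the sketch below. The 512-entry size matches an x86-64 page table, but the struct and helper names are hypothetical and the kernel's actual data layout may differ.

```c
#include <limits.h>

#define ENTRIES_PER_PAGE 512u  /* entries in one x86-64 page table */
#define BITS_PER_WORD    ((unsigned)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_WORDS     ((ENTRIES_PER_PAGE + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical shadow page: one bit per entry with an unsynced child. */
struct toy_shadow_page {
    unsigned long unsync_bitmap[BITMAP_WORDS];
};

static void mark_unsync_child(struct toy_shadow_page *sp, unsigned idx)
{
    sp->unsync_bitmap[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
}

static void clear_unsync_child(struct toy_shadow_page *sp, unsigned idx)
{
    sp->unsync_bitmap[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}

/* Next marked index >= start, or ENTRIES_PER_PAGE if there is none. */
static unsigned next_unsync_child(const struct toy_shadow_page *sp, unsigned start)
{
    unsigned i = start;

    while (i < ENTRIES_PER_PAGE) {
        unsigned long word = sp->unsync_bitmap[i / BITS_PER_WORD];

        if (!word) {
            /* Whole word is clean: skip all of its entries at once. */
            i = (i / BITS_PER_WORD + 1) * BITS_PER_WORD;
            continue;
        }
        if (word & (1UL << (i % BITS_PER_WORD)))
            return i;
        i++;
    }
    return ENTRIES_PER_PAGE;
}
```

The walk only has to look at entries whose bit is set, and empty words let it skip whole runs of entries without examining the children at all.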
| * KVM: MMU: out of sync shadow core  Marcelo Tosatti  2008-10-15  3  -18/+197
| | Allow guest pagetables to go out of sync. Instead of emulating write accesses to guest pagetables, or unshadowing them, we un-write-protect the page table and allow the guest to modify it at will. We rely on invlpg executions to synchronize individual ptes, and will synchronize the entire pagetable on tlb flushes.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: mmu_convert_notrap helper  Marcelo Tosatti  2008-10-15  1  -0/+14
| | Need to convert shadow_notrap_nonpresent -> shadow_trap_nonpresent when unsyncing pages.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: awareness of new kvm_mmu_zap_page behaviour  Marcelo Tosatti  2008-10-15  1  -4/+9
| | kvm_mmu_zap_page will soon zap the unsynced children of a page. Restart the list walk in that case.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: mmu_parent_walk  Marcelo Tosatti  2008-10-15  1  -0/+27
| | Introduce a function to walk all parents of a given page, invoking a handler.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
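A walk over all parents with a caller-supplied handler can be sketched as follows. The page structure, the fixed-size parent array, and the example handler are assumptions for illustration; the kernel tracks parents differently.

```c
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical page with back-pointers to the pages that reference it. */
struct toy_page {
    struct toy_page *parents[MAX_PARENTS];
    size_t nr_parents;
    int unsync_children;  /* set by the example handler below */
};

typedef void (*parent_handler)(struct toy_page *parent, void *data);

/* Invoke fn on every ancestor of page, nearest parents first. */
static void walk_parents(struct toy_page *page, parent_handler fn, void *data)
{
    for (size_t i = 0; i < page->nr_parents; i++) {
        struct toy_page *parent = page->parents[i];

        fn(parent, data);
        walk_parents(parent, fn, data);
    }
}

/* Example handler: record that something below this page is unsynced. */
static void mark_unsync_children(struct toy_page *parent, void *data)
{
    (void)data;
    parent->unsync_children = 1;
}
```

Note that in this simple form an ancestor reachable through several paths is visited once per path; a handler that is not idempotent would need to guard against that.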
| * KVM: x86: trap invlpg  Marcelo Tosatti  2008-10-15  5  -5/+71
| | With pages out of sync invlpg needs to be trapped. For now simply nuke the entry.
| |
| | Untested on AMD.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: sync roots on mmu reload  Marcelo Tosatti  2008-10-15  2  -0/+37
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: mode specific sync_page  Marcelo Tosatti  2008-10-15  2  -0/+64
| | Examine guest pagetable and bring the shadow back in sync. Caller is responsible for local TLB flush before re-entering guest mode.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: do not write-protect large mappings  Marcelo Tosatti  2008-10-15  1  -2/+8
| | There is not much point in write protecting large mappings. This can only happen when a page is shadowed during the window between is_largepage_backed and mmu_lock acquisition. Zap the entry instead, so the next pagefault will find a shadowed page via is_largepage_backed and fall back to 4k translations.
| |
| | Simplifies out of sync shadow.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: move local TLB flush to mmu_set_spte  Marcelo Tosatti  2008-10-15  1  -4/+4
| | Since the sync page path can collapse flushes.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: split mmu_set_spte  Marcelo Tosatti  2008-10-15  1  -44/+57
| | Split the spte entry creation code into a new set_spte function.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: MMU: flush remote TLBs on large->normal entry overwrite  Marcelo Tosatti  2008-10-15  1  -1/+4
| | It is necessary to flush all TLBs when a large spte entry is overwritten with a normal page directory pointer.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * x86: pvclock: fix shadowed variable warning  Harvey Harrison  2008-10-15  1  -5/+5
| | arch/x86/kernel/pvclock.c:102:6: warning: symbol 'tsc_khz' shadows an earlier one
| | include/asm/tsc.h:18:21: originally declared here
| |
| | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
| | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: don't enter guest after SIPI was received by a CPU  Gleb Natapov  2008-10-15  1  -1/+1
| | The vcpu should process a pending SIPI message before entering guest mode again. kvm_arch_vcpu_runnable() returns true if the vcpu is in SIPI state, so we can't call it here.
| |
| | Signed-off-by: Gleb Natapov <gleb@qumranet.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: x86.c make kvm_load_realmode_segment static  Harvey Harrison  2008-10-15  1  -1/+1
| | Noticed by sparse:
| | arch/x86/kvm/x86.c:3591:5: warning: symbol 'kvm_load_realmode_segment' was not declared. Should it be static?
| |
| | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: switch to get_user_pages_fast  Marcelo Tosatti  2008-10-15  4  -31/+10
| | Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless pagetable lookups on x86. Kernel compilation on a 4-way guest is 3.7% faster on VMX.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Device Assignment: Free device structures if IRQ allocation fails  Amit Shah  2008-10-15  1  -41/+45
| | When an IRQ allocation fails, we free up the device structures and disable the device so that we can unregister the device in userspace and not expose it to the guest at all.
| |
| | Signed-off-by: Amit Shah <amit.shah@redhat.com>
| | Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Device Assignment with VT-d  Ben-Ami Yassour  2008-10-15  3  -0/+215
| | Based on a patch by: Kay, Allen M <allen.m.kay@intel.com>
| |
| | This patch enables PCI device assignment based on VT-d support. When a device is assigned to the guest, the guest memory is pinned and the mapping is updated in the VT-d IOMMU.
| |
| | [Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present and also control enable/disable from userspace]
| |
| | Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
| | Signed-off-by: Weidong Han <weidong.han@intel.com>
| | Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
| | Signed-off-by: Amit Shah <amit.shah@qumranet.com>
| | Acked-by: Mark Gross <mgross@linux.intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Use DstAcc for 'and'  Guillaume Thouvenin  2008-10-15  1  -19/+2
| | For the instruction 'and al,imm' we use DstAcc instead of doing the emulation directly in the instruction's opcode handler.
| |
| | Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Add cmp al, imm and cmp ax, imm instructions (opcodes 3c, 3d)  Guillaume Thouvenin  2008-10-15  1  -1/+2
| | Add decode entries for these opcodes; execution is already implemented.
| |
| | Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Add DstAcc operand type  Guillaume Thouvenin  2008-10-15  1  -16/+34
| | Add DstAcc operand type. That means that there are 4 bits now for DstMask.
| |
| | "In the good old days cpus would have only one register that was able to fully participate in arithmetic operations, typically called A for Accumulator. The x86 retains this tradition by having special, shorter encodings for the A register (like the cmp opcode), and even some instructions that only operate on A (like mul).
| |
| | SrcAcc and DstAcc would accommodate these instructions by decoding A into the corresponding 'struct operand'."
| | -- Avi Kivity
| |
| | Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
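Decoding the accumulator into an operand structure, as the quoted explanation describes, could look roughly like this. The struct layout and function name are hypothetical stand-ins for the emulator's 'struct operand', and the flag encoding of DstAcc itself is deliberately left out because it is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulator's operand descriptor. */
struct toy_operand {
    unsigned bytes;  /* operand width in bytes */
    uint64_t *ptr;   /* where a result would be written back */
    uint64_t val;    /* current value, truncated to the operand width */
};

/*
 * Describe the accumulator as a destination operand: AL for byte
 * operations, otherwise AX/EAX/RAX depending on the operand size.
 */
static void decode_dst_acc(struct toy_operand *op, uint64_t *rax,
                           bool byte_op, unsigned op_bytes)
{
    op->bytes = byte_op ? 1 : op_bytes;
    op->ptr = rax;
    if (op->bytes >= 8)
        op->val = *rax;
    else
        op->val = *rax & ((1ULL << (8 * op->bytes)) - 1);
}
```

With a helper like this, an instruction such as 'cmp al, imm8' (opcode 0x3c, low bit clear, so a byte operation) needs no special-case code to find its destination; the generic execution path can simply use the decoded operand.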
| * x86: Move FEATURE_CONTROL bits to msr-index.h  Sheng Yang  2008-10-15  1  -3/+0
| | MSR_IA32_FEATURE_CONTROL is already defined there.
| |
| | Signed-off-by: Sheng Yang <sheng.yang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: Rename IA32_FEATURE_CONTROL bits  Sheng Yang  2008-10-15  2  -11/+11
| | Signed-off-by: Sheng Yang <sheng.yang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: fix jmp r/m64 instruction  Avi Kivity  2008-10-15  1  -1/+1
| | jmp r/m64 doesn't require the rex.w prefix to indicate the operand size is 64 bits. Set the Stack attribute (even though it doesn't involve the stack, really) to indicate this.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: Cleanup stalled INTR_INFO read  Jan Kiszka  2008-10-15  1  -3/+0
| | Commit 1c0f4f5011829dac96347b5f84ba37c2252e1e08 left a useless access of VM_ENTRY_INTR_INFO_FIELD in vmx_intr_assist behind. Clean this up.
| |
| | Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86: unhalt vcpu0 on reset  Marcelo Tosatti  2008-10-15  1  -0/+6
| | Since "KVM: x86: do not execute halted vcpus", HLT by vcpu0 before system reset by the IO thread will hang the guest.
| |
| | Mark the vcpu as runnable in that case.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Add call near absolute instruction (opcode 0xff/2)  Mohammed Gamal  2008-10-15  1  -1/+10
| | Add call near absolute instruction.
| |
| | Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86: do not execute halted vcpus  Marcelo Tosatti  2008-10-15  3  -60/+61
| | Offline or uninitialized vcpus can be executed if requested to perform userspace work.
| |
| | Follow Avi's suggestion to handle halted vcpus in the main loop, simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to indicate events that promote state from halted to running.
| |
| | Also standardize vcpu wake sites.
| |
| | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Add in/out instructions (opcodes 0xe4-0xe7, 0xec-0xef)  Mohammed Gamal  2008-10-15  1  -2/+33
| | The patch adds in/out instructions to the x86 emulator.
| |
| | These instructions were encountered while running the BIOS with the invalid guest state emulation patch.
| |
| | Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
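The eight opcodes in the subject line share a regular bit pattern, which a decoder can exploit. The sketch below shows one way to classify them; the struct and function names are made up for illustration and the patch itself may decode them differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an in/out instruction. */
struct io_insn {
    bool valid;       /* opcode is one of 0xe4-0xe7, 0xec-0xef */
    bool is_out;      /* true for out, false for in */
    bool byte_op;     /* true for AL, false for AX/EAX */
    bool port_in_dx;  /* true if the port comes from DX, false for imm8 */
};

static struct io_insn decode_io(uint8_t opcode)
{
    struct io_insn d = { false, false, false, false };

    /* High nibble 0xe and bit 2 set select exactly the eight opcodes. */
    if ((opcode & 0xf4) != 0xe4)
        return d;

    d.valid = true;
    d.byte_op = !(opcode & 0x01);     /* bit 0: operand size */
    d.is_out = (opcode & 0x02) != 0;  /* bit 1: direction */
    d.port_in_dx = (opcode & 0x08) != 0;  /* bit 3: port source */
    return d;
}
```

For example, 0xe4 decodes as 'in al, imm8' and 0xef as 'out dx, eax' (or 'out dx, ax' with a 16-bit operand size).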
| * KVM: Add statistics for guest irq injections  Avi Kivity  2008-10-15  3  -0/+3
| | These can help show whether a guest is making progress or not.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Modify kvm_shadow_walk.entry to accept u64 addr  Sheng Yang  2008-10-15  2  -7/+7
| | EPT uses 4 levels by default, even on 32-bit PAE hosts (48-bit addresses), but the addr parameter of kvm_shadow_walk->entry() only accepts an unsigned long as the virtual address, which is 32 bits wide on 32-bit PAE. This results in SHADOW_PT_INDEX() overflowing when trying to fetch the level 4 index.
| |
| | Fix it by extending kvm_shadow_walk->entry() to accept a 64-bit addr parameter.
| |
| | Signed-off-by: Sheng Yang <sheng.yang@intel.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
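The effect of passing the address through a 32-bit variable can be shown with a small index calculation. This sketch assumes the usual x86 layout of 4 KiB pages and 9 index bits per level; the helper names are illustrative and are not the SHADOW_PT_INDEX() macro itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */
#define LEVEL_BITS 9   /* 512 entries per table */

/* Index into the page table at the given level (1 = lowest, 4 = top). */
static unsigned pt_index(uint64_t addr, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned)((addr >> (PAGE_SHIFT + LEVEL_BITS * (level - 1)))
                      & ((1u << LEVEL_BITS) - 1));
}

/* What survives when the address is stored in a 32-bit unsigned long. */
static uint64_t as_32bit_long(uint64_t addr)
{
    return (uint32_t)addr;
}
```

The level 4 index lives in bits 39 to 47 of the address, so once the address has been truncated to 32 bits that index is always read as zero, while lower levels whose bits fall below bit 32 are unaffected.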
| * KVM: x86 emulator: Add std and cld instructions (opcodes 0xfc-0xfd)  Mohammed Gamal  2008-10-15  1  -1/+9
| | This adds the std and cld instructions to the emulator.
| |
| | Encountered while running the BIOS with invalid guest state emulation enabled.
| |
| | Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
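Both instructions only touch the direction flag (DF, bit 10 of EFLAGS), so emulating them amounts to a single bit operation. A minimal sketch, with a hypothetical function name and a bare flags word standing in for the emulator context:

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_DF (1u << 10)  /* direction flag */

/* Returns true if the opcode was cld or std and has been applied. */
static bool emulate_direction_flag_op(uint8_t opcode, uint32_t *eflags)
{
    switch (opcode) {
    case 0xfc:  /* cld: string operations count upwards */
        *eflags &= ~EFLAGS_DF;
        return true;
    case 0xfd:  /* std: string operations count downwards */
        *eflags |= EFLAGS_DF;
        return true;
    default:
        return false;
    }
}
```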
| * KVM: add MC5_MISC msr read support  Joerg Roedel  2008-10-15  1  -0/+1
| | Currently KVM implements MC0-MC4_MISC read support. When booting Linux this results in KVM warnings in the kernel log when the guest tries to read MC5_MISC. Fix these warnings with this patch.
| |
| | Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: No need to unprotect memory during event injection when using npt  Avi Kivity  2008-10-15  1  -1/+1
| | No memory is protected anyway.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Fix setting the accessed bit on non-speculative sptes  Avi Kivity  2008-10-15  1  -1/+1
| | The accessed bit was accidentally turned on in a random flag word rather than the spte itself, which was lucky, since it used the non-EPT compatible PT_ACCESSED_MASK.
| |
| | Fix by turning the bit on in the spte and changing it to use the portable accessed mask.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Flush tlbs after clearing write permission when accessing dirty log  Avi Kivity  2008-10-15  1  -0/+1
| | Otherwise, the cpu may allow writes to the tracked pages, and we lose some display bits or fail to migrate correctly.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()  Avi Kivity  2008-10-15  1  -0/+2
| | It was generally safe due to slots_lock being held for write, but it wasn't very nice.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Account for npt/ept/realmode page faults  Avi Kivity  2008-10-15  1  -0/+1
| | Now that two-dimensional paging is becoming common, account for tdp page faults.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: Add mov r, imm instructions (opcodes 0xb0-0xbf)  Mohammed Gamal  2008-10-15  1  -4/+11
| | The emulator only supported one instance of the mov r, imm instruction (opcode 0xb8); this adds the rest of these instructions.
| |
| | Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
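The sixteen opcodes encode the destination register and the operand size in their low bits, so they can all share one decode path. The following sketch uses hypothetical names and ignores the REX.B prefix extension that selects r8-r15 in 64-bit mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of 'mov r, imm'. */
struct mov_imm {
    bool valid;    /* opcode is in 0xb0-0xbf */
    bool byte_op;  /* true for 0xb0-0xb7 (8-bit register, imm8) */
    unsigned reg;  /* register number from the low three opcode bits */
};

static struct mov_imm decode_mov_imm(uint8_t opcode)
{
    struct mov_imm d = { false, false, 0 };

    if ((opcode & 0xf0) != 0xb0)
        return d;

    d.valid = true;
    d.byte_op = !(opcode & 0x08);  /* bit 3 clear: byte-sized variant */
    d.reg = opcode & 0x07;
    return d;
}
```

Opcode 0xb8, the one case the message says was already supported, decodes here as the non-byte form with register 0 (the accumulator).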
| * KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED  Avi Kivity  2008-10-15  1  -1/+1
| | There is no reason to share internal memory slots with fork()ed instances.
| |
| | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Convert the paging mode shadow walk to use the generic walker  Avi Kivity  2008-10-15  1  -72/+86
| | Signed-off-by: Avi Kivity <avi@qumranet.com>