summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* s390: KVM preparation: host memory management changes for s390 kvmChristian Borntraeger2008-04-272-6/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the s390 memory management defintions to use the pgste field for dirty and reference bit tracking of host and guest code. Usually on s390, dirty and referenced are tracked in storage keys, which belong to the physical page. This changes with virtualization: The guest and host dirty/reference bits are defined to be the logical OR of the values for the mapping and the physical page. This patch implements the necessary changes in pgtable.h for s390. There is a common code change in mm/rmap.c, the call to page_test_and_clear_young must be moved. This is a no-op for all architecture but s390. page_referenced checks the referenced bits for the physiscal page and for all mappings: o The physical page is checked with page_test_and_clear_young. o The mappings are checked with ptep_test_and_clear_young and friends. Without pgstes (the current implementation on Linux s390) the physical page check is implemented but the mapping callbacks are no-ops because dirty and referenced are not tracked in the s390 page tables. The pgstes introduces guest and host dirty and reference bits for s390 in the host mapping. These mapping must be checked before page_test_and_clear_young resets the reference bit. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* s390: KVM preparation: provide hook to enable pgstes in user pagetableCarsten Otte2008-04-278-5/+82
| | | | | | | | | | | | | | | | | | | | | | | | The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table: s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process has now a page status extended field after every page table. Code that wants to exploit pgstes should SELECT CONFIG_PGSTE. This patch has a small common code hit, namely making dup_mm non-static. Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now call task_lock() to prevent race against ptrace modification of mm_users. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86: hardware task switching supportIzik Eidus2008-04-276-3/+507
| | | | | | | | | This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86: add functions to get the cpl of vcpuIzik Eidus2008-04-273-0/+24
| | | | | Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Add module option to disable flexpriorityAvi Kivity2008-04-271-2/+6
| | | | | | Useful for debugging. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: no longer EXPERIMENTALAvi Kivity2008-04-271-1/+1
| | | | | | Long overdue. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Introduce and use spte_to_page()Avi Kivity2008-04-271-5/+12
| | | | | | Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: fix dirty bit setting when removing write permissionsIzik Eidus2008-04-271-0/+8
| | | | | | | | | | | | | | When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move some x86 specific constants and structures to include/asm-x86Avi Kivity2008-04-272-13/+13
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Set the accessed bit on non-speculative shadow ptesAvi Kivity2008-04-272-5/+7
| | | | | | | | | If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: kvm.h: __user requires compiler.hChristian Borntraeger2008-04-271-0/+1
| | | | | | | | | | | | | | | | include/linux/kvm.h defines struct kvm_dirty_log to [...] union { void __user *dirty_bitmap; /* one bit per page */ __u64 padding; }; __user requires compiler.h to compile. Currently, this works on x86 only coincidentally due to other include files. This patch makes kvm.h compile in all cases. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: KVM guest: disable clock before rebooting.Glauber Costa2008-04-271-0/+27
| | | | | | | | | | | | | | | This patch writes 0 (actually, what really matters is that the LSB is cleared) to the system time msr before shutting down the machine for kexec. Without it, we can have a random memory location being written when the guest comes back It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c) and crash_shutdown, used in the path of crash_kexec() (kexec.c) Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: make native_machine_shutdown non-staticGlauber Costa2008-04-272-1/+2
| | | | | | | | | | it will allow external users to call it. It is mainly useful for routines that will override its machine_ops field for its own special purposes, but want to call the normal shutdown routine after they're done Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: allow machine_crash_shutdown to be replacedGlauber Costa2008-04-273-2/+13
| | | | | | | | | This patch a llows machine_crash_shutdown to be replaced, just like any of the other functions in machine_ops Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: KVM guest: hypercall batchingMarcelo Tosatti2008-04-271-2/+60
| | | | | | | | | | | Batch pte updates and tlb flushes in lazy MMU mode. [avi: - adjust to mmu_op - helper for getting para_state without debug warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: KVM guest: hypercall based pte updates and TLB flushesMarcelo Tosatti2008-04-271-0/+138
| | | | | | | | | | | | | | | | | Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - guest/host split - fix 32-bit truncation issues - adjust to mmu_op - adjust to ->release_*() renamed - add ->release_pud()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: hypercall based pte updates and TLB flushesMarcelo Tosatti2008-04-276-3/+190
| | | | | | | | | | | | | | | | | | Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Provide unlocked version of emulator_write_phys()Avi Kivity2008-04-272-7/+17
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: KVM guest: add basic paravirt supportMarcelo Tosatti2008-04-276-0/+70
| | | | | | | Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add basic paravirt supportMarcelo Tosatti2008-04-273-1/+4
| | | | | | | Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add reset support for in kernel PITSheng Yang2008-04-272-11/+20
| | | | | | | Separate the reset part and prepare for reset support. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add save/restore supporting of in kernel PITSheng Yang2008-04-275-0/+79
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: In kernel PIT modelSheng Yang2008-04-277-1/+664
| | | | | | | | | | | | The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove pointless desc_ptr #ifdefAvi Kivity2008-04-271-4/+0
| | | | | | The desc_struct changes left an unnecessary #ifdef; remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Don't adjust tsc offset forwardAvi Kivity2008-04-271-3/+6
| | | | | | | | Most Intel hosts have a stable tsc, and playing with the offset only reduces accuracy. By limiting tsc offset adjustment only to forward updates, we effectively disable tsc offset adjustment on these hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: replace remaining __FUNCTION__ occurancesHarvey Harrison2008-04-276-45/+44
| | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: detect if VCPU triple faultsJoerg Roedel2008-04-272-5/+16
| | | | | | | | | | | In the current inject_page_fault path KVM only checks if there is another PF pending and injects a DF then. But it has to check for a pending DF too to detect a shutdown condition in the VCPU. If this is not detected the VCPU goes to a PF -> DF -> PF loop when it should triple fault. This patch detects this condition and handles it with an KVM_SHUTDOWN exit to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use kzalloc to avoid allocating kvm_regs from kernel stackXiantao Zhang2008-04-271-11/+22
| | | | | | | | Since the size of kvm_regs is too big to allocate from kernel stack on ia64, use kzalloc to allocate it. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Prefix control register accessors with kvm_ to avoid namespace pollutionAvi Kivity2008-04-273-36/+36
| | | | | | Names like 'set_cr3()' look dangerously close to affecting the host. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page supportMarcelo Tosatti2008-04-276-32/+262
| | | | | | | | | | | | | | Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: ignore zapped root pagetablesMarcelo Tosatti2008-04-275-2/+48
| | | | | | | | | | | | Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Implement dummy values for MSR_PERF_STATUSAlexander Graf2008-04-271-1/+7
| | | | | | | Darwin relies on this and ceases to work without. Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: sparse fixes for kvm/x86.cHarvey Harrison2008-04-271-13/+13
| | | | | | | | | | | | | | | | | | In two case statements, use the ever popular 'i' instead of index: arch/x86/kvm/x86.c:1063:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here arch/x86/kvm/x86.c:1079:7: warning: symbol 'index' shadows an earlier one arch/x86/kvm/x86.c:1000:9: originally declared here Make it static. arch/x86/kvm/x86.c:1945:24: warning: symbol 'emulate_ops' was not declared. Should it be static? Drop the return statements. arch/x86/kvm/x86.c:2878:2: warning: returning void-valued expression arch/x86/kvm/x86.c:2944:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: make iopm_base staticHarvey Harrison2008-04-271-1/+1
| | | | | | | | Fixes sparse warning as well. arch/x86/kvm/svm.c:69:15: warning: symbol 'iopm_base' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: fix sparse warnings in x86_emulate.cHarvey Harrison2008-04-271-2/+2
| | | | | | | | | | | | | | | Nesting __emulate_2op_nobyte inside__emulate_2op produces many shadowed variable warnings on the internal variable _tmp used by both macros. Change the outer macro to use __tmp. Avoids a sparse warning like the following at every call site of __emulate_2op arch/x86/kvm/x86_emulate.c:1091:3: warning: symbol '_tmp' shadows an earlier one arch/x86/kvm/x86_emulate.c:1091:3: originally declared here [18 more warnings suppressed] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add stat counter for hypercallsAmit Shah2008-04-272-0/+3
| | | | | Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use x86's segment descriptor struct instead of private definitionAvi Kivity2008-04-273-39/+8
| | | | | | The x86 desc_struct unification allows us to remove segment_descriptor.h. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Increase the number of user memory slots per vmAvi Kivity2008-04-271-1/+1
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add API for determining the number of supported memory slotsAvi Kivity2008-04-272-0/+4
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Increase vcpu count to 16Avi Kivity2008-04-271-1/+1
| | | | | | With NPT support, scalability is much improved. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add API to retrieve the number of supported vcpus per vmAvi Kivity2008-04-272-0/+4
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: make register_address_increment and JMP_REL static inlinesHarvey Harrison2008-04-271-30/+26
| | | | | | | Change jmp_rel() to a function as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: make register_address, address_mask static inlinesHarvey Harrison2008-04-271-19/+29
| | | | | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: add ad_mask static inlineHarvey Harrison2008-04-271-3/+8
| | | | | | | Replaces open-coded mask calculation in macros. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* x86: KVM guest: paravirtualized clocksourceGlauber de Oliveira Costa2008-04-275-0/+182
| | | | | | | | | | | | | | | | | | This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. [randy dunlap: add missing include] [randy dunlap: disallow on Voyager or Visual WS] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: paravirtualized clocksource: host partGlauber de Oliveira Costa2008-04-274-1/+145
| | | | | | | | | | | | | | | | | This is the host part of kvm clocksource implementation. As it does not include clockevents, it is a fairly simple implementation. We only have to register a per-vcpu area, and start writing to it periodically. The area is binary compatible with xen, as we use the same shadow_info structure. [marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME] [avi: save full value of the msr, even if enable bit is clear] [avi: clear previous value of time_page] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: enable LBR virtualizationJoerg Roedel2008-04-271-2/+37
| | | | | | | | | | | | This patch implements the Last Branch Record Virtualization (LBRV) feature of the AMD Barcelona and Phenom processors into the kvm-amd module. It will only be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So there is no increased world switch overhead when the guest doesn't use these MSRs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: allocate the MSR permission map per VCPUJoerg Roedel2008-04-272-35/+34
| | | | | | | | | | | This patch changes the kvm-amd module to allocate the SVM MSR permission map per VCPU instead of a global map for all VCPUs. With this we have more flexibility allowing specific guests to access virtualized MSRs. This is required for LBR virtualization. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: let init_vmcb() take struct vcpu_svm as parameterJoerg Roedel2008-04-271-6/+6
| | | | | | | | | Change the parameter of the init_vmcb() function in the kvm-amd module from struct vmcb to struct vcpu_svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: fix typo in VMX header defineRyan Harper2008-04-272-5/+5
| | | | | | | | | Looking at Intel Volume 3b, page 148, table 20-11 and noticed that the field name is 'Deliver' not 'Deliever'. Attached patch changes the define name and its user in vmx.c Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>