summaryrefslogtreecommitdiffstats
path: root/arch/arm/include/asm/kvm_host.h
Commit message (Collapse)AuthorAgeFilesLines
* KVM: arm/arm64: Move cntvoff to each timer contextJintack Lim2017-02-081-3/+0
| | | | | | | | | | | | | Make cntvoff per each timer context. This is helpful to abstract kvm timer functions to work with timer context without considering timer types (e.g. physical timer or virtual timer). This also would pave the way for ever doing adjustments of the cntvoff on a per-CPU basis if that should ever make sense. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a ↵Marc Zyngier2016-11-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | single CPU Architecturally, TLBs are private to the (physical) CPU they're associated with. But when multiple vcpus from the same VM are being multiplexed on the same CPU, the TLBs are not private to the vcpus (and are actually shared across the VMID). Let's consider the following scenario: - vcpu-0 maps PA to VA - vcpu-1 maps PA' to VA If run on the same physical CPU, vcpu-1 can hit TLB entries generated by vcpu-0 accesses, and access the wrong physical page. The solution to this is to keep a per-VM map of which vcpu ran last on each given physical CPU, and invalidate local TLBs when switching to a different vcpu from the same VM. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* Merge tag 'kvm-arm-for-v4.9' of ↵Radim Krčmář2016-09-291-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs
| * ARM: KVM: Support vgic-v3Vladimir Murzin2016-09-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows to build and use vgic-v3 in 32-bit mode. Unfortunately, it can not be split in several steps without extra stubs to keep patches independent and bisectable. For instance, virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre to be already defined. It is how support has been done: * handle SGI requests from the guest * report configured SRE on access to GICv3 cpu interface from the guest * required vgic-v3 macros are provided via uapi.h * static keys are used to select GIC backend * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with the static inlines Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | KVM: Add provisioning for ulong vm stats and u64 vcpu statsSuraj Jitindar Singh2016-09-081-6/+6
|/ | | | | | | | | | | | | | | | | | | | | | vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* KVM: arm/arm64: Extend arch CAP checks to allow per-VM capabilitiesAndre Przywara2016-07-181-1/+1
| | | | | | | | | | | | | | | KVM capabilities can be a per-VM property, though ARM/ARM64 currently does not pass on the VM pointer to the architecture specific capability handlers. Add a "struct kvm*" parameter to those function to later allow proper per-VM capability reporting. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* arm: KVM: Allow hyp teardownMarc Zyngier2016-07-031-5/+3
| | | | | | | | | | | So far, KVM was getting in the way of kexec on 32bit (and the arm64 kexec hackers couldn't be bothered to fix it on 32bit...). With simpler page tables, tearing KVM down becomes very easy, so let's just do it. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* arm: KVM: Simplify HYP initMarc Zyngier2016-07-031-10/+5
| | | | | | | | | Just like for arm64, we can now make the HYP setup a lot simpler, and we can now initialise it in one go (instead of the two phases we currently have). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* arm/arm64: KVM: Drop boot_pgdMarc Zyngier2016-07-031-5/+3
| | | | | | | | | Since we now only have one set of page tables, the concept of boot_pgd is useless and can be removed. We still keep it as an element of the "extended idmap" thing. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-05-271-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second batch of KVM updates from Radim Krčmář: "General: - move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat had nothing to do with QEMU in the first place -- the tool only interprets debugfs) - expose per-vm statistics in debugfs and support them in kvm_stat (KVM always collected per-vm statistics, but they were summarised into global statistics) x86: - fix dynamic APICv (VMX was improperly configured and a guest could access host's APIC MSRs, CVE-2016-4440) - minor fixes ARM changes from Christoffer Dall: - new vgic reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove the legacy one. - fix for a non-critical issue with virtual abort injection to guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits) tools: kvm_stat: Add comments tools: kvm_stat: Introduce pid monitoring KVM: Create debugfs dir and stat files for each VM MAINTAINERS: Add kvm tools tools: kvm_stat: Powerpc related fixes tools: Add kvm_stat man page tools: Add kvm_stat vm monitor script kvm:vmx: more complete state update on APICv on/off KVM: SVM: Add more SVM_EXIT_REASONS KVM: Unify traced vector format svm: bitwise vs logical op typo KVM: arm/arm64: vgic-new: Synchronize changes to active state KVM: arm/arm64: vgic-new: enable build KVM: arm/arm64: vgic-new: implement mapped IRQ handling KVM: arm/arm64: vgic-new: Wire up irqfd injection KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable KVM: arm/arm64: vgic-new: vgic_init: implement map_resources KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init ...
| * Merge tag 'kvm-arm-for-4-7-take2' of ↵Paolo Bonzini2016-05-241-0/+6
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM Changes for v4.7 take 2 "The GIC is dead; Long live the GIC" This set of changes include the new vgic, which is a reimplementation of our horribly broken legacy vgic implementation. The two implementations will live side-by-side (with the new being the configured default) for one kernel release and then we'll remove it. Also fixes a non-critical issue with virtual abort injection to guests.
| | * KVM: arm/arm64: vgic-new: Synchronize changes to active stateChristoffer Dall2016-05-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When modifying the active state of an interrupt via the MMIO interface, we should ensure that the write has the intended effect. If a guest sets an interrupt to active, but that interrupt is already flushed into a list register on a running VCPU, then that VCPU will write the active state back into the struct vgic_irq upon returning from the guest and syncing its state. This is a non-benign race, because the guest can observe that an interrupt is not active, and it can have a reasonable expectations that other VCPUs will not ack any IRQs, and then set the state to active, and expect it to stay that way. Currently we are not honoring this case. Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the world, change the irq state, potentially queue the irq if we're setting it to active, and then continue. We take this chance to slightly optimize these functions by not stopping the world when touching private interrupts where there is inherently no possible race. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * KVM: arm/arm64: Provide functionality to pause and resume a guestChristoffer Dall2016-05-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | For some rare corner cases in our VGIC emulation later we have to stop the guest to make sure the VGIC state is consistent. Provide the necessary framework to pause and resume a guest. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-05-191-0/+2
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "Small release overall. x86: - miscellaneous fixes - AVIC support (local APIC virtualization, AMD version) s390: - polling for interrupts after a VCPU goes to halted state is now enabled for s390 - use hardware provided information about facility bits that do not need any hypervisor activity, and other fixes for cpu models and facilities - improve perf output - floating interrupt controller improvements. MIPS: - miscellaneous fixes PPC: - bugfixes only ARM: - 16K page size support - generic firmware probing layer for timer and GIC Christoffer Dall (KVM-ARM maintainer) says: "There are a few changes in this pull request touching things outside KVM, but they should all carry the necessary acks and it made the merge process much easier to do it this way." though actually the irqchip maintainers' acks didn't make it into the patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer, later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more formally and for documentation purposes')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits) KVM: MTRR: remove MSR 0x2f8 KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same svm: Manage vcpu load/unload when enable AVIC svm: Do not intercept CR8 when enable AVIC svm: Do not expose x2APIC when enable AVIC KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore svm: Add VMEXIT handlers for AVIC svm: Add interrupt injection via AVIC KVM: x86: Detect and Initialize AVIC support svm: Introduce new AVIC VMCB registers KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg KVM: x86: Misc LAPIC changes to expose helper functions KVM: shrink halt polling even more for invalid wakeups KVM: s390: set halt polling to 80 microseconds KVM: halt_polling: provide a way to qualify wakeups during poll KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts kvm: Conditionally register IRQ bypass consumer ...
| * | KVM: halt_polling: provide a way to qualify wakeups during pollChristian Borntraeger2016-05-131-0/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some wakeups should not be considered a sucessful poll. For example on s390 I/O interrupts are usually floating, which means that _ALL_ CPUs would be considered runnable - letting all vCPUs poll all the time for transactional like workload, even if one vCPU would be enough. This can result in huge CPU usage for large guests. This patch lets architectures provide a way to qualify wakeups if they should be considered a good/bad wakeups in regard to polls. For s390 the implementation will fence of halt polling for anything but known good, single vCPU events. The s390 implementation for floating interrupts does a wakeup for one vCPU, but the interrupt will be delivered by whatever CPU checks first for a pending interrupt. We prefer the woken up CPU by marking the poll of this CPU as "good" poll. This code will also mark several other wakeup reasons like IPI or expired timers as "good". This will of course also mark some events as not sucessful. As KVM on z runs always as a 2nd level hypervisor, we prefer to not poll, unless we are really sure, though. This patch successfully limits the CPU usage for cases like uperf 1byte transactional ping pong workload or wakeup heavy workload like OLTP while still providing a proper speedup. This also introduced a new vcpu stat "halt_poll_no_tuning" that marks wakeups that are considered not good for polling. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version) Cc: David Matlack <dmatlack@google.com> Cc: Wanpeng Li <kernellwp@gmail.com> [Rename config symbol. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* / arm64: kvm: allows kvm cpu hotplugAKASHI Takahiro2016-04-281-1/+9
|/ | | | | | | | | | | | | | | | | | | | | | | | The current kvm implementation on arm64 does cpu-specific initialization at system boot, and has no way to gracefully shutdown a core in terms of kvm. This prevents kexec from rebooting the system at EL2. This patch adds a cpu tear-down function and also puts an existing cpu-init code into a separate function, kvm_arch_hardware_disable() and kvm_arch_hardware_enable() respectively. We don't need the arm64 specific cpu hotplug hook any more. Since this patch modifies common code between arm and arm64, one stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid compilation errors. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> [Rebase, added separate VHE init/exit path, changed resets use of kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(), added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed guest-enter after teardown handling] Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* arm64: KVM: Add a new vcpu device control group for PMUv3Shannon Zhao2016-02-291-0/+15
| | | | | | | | | | | | | | | To configure the virtual PMUv3 overflow interrupt number, we use the vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group. After configuring the PMUv3, call the vcpu ioctl with attribute KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Remove unused hyp_pc fieldMarc Zyngier2016-02-291-1/+0
| | | | | | | | This field was never populated, and the panic code already does something similar. Delete the related code. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Turn CP15 defines to an enumMarc Zyngier2016-02-291-0/+39
| | | | | | | | | Just like on arm64, having the CP15 registers expressed as a set of #defines has been very conflict-prone. Let's turn it into an enum, which should make it more manageable. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Switch to C-based stage2 initMarc Zyngier2016-02-291-0/+1
| | | | | | | | As we now have hooks to setup VTCR from C code, let's drop the original VTCR setup and reimplement it as part of the HYP code. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Change kvm_call_hyp return type to unsigned longMarc Zyngier2016-02-291-1/+1
| | | | | | | | | | | | | | | | Having u64 as the kvm_call_hyp return type is problematic, as it forces all kind of tricks for the return values from HYP to be promoted to 64bit (LE has the LSB in r0, and BE has them in r1). Since the only user of the return value is perfectly happy with a 32bit value, let's make kvm_call_hyp return an unsigned long, which is 32bit on ARM. This solves yet another headache. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Move GP registers into the CPU context structureMarc Zyngier2016-02-291-2/+1
| | | | | | | | Continuing our rework of the CPU context, we now move the GP registers into the CPU context structure. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Move CP15 array into the CPU context structureMarc Zyngier2016-02-291-3/+3
| | | | | | | | | | | Continuing our rework of the CPU context, we now move the CP15 array into the CPU context structure. As this causes quite a bit of churn, we introduce the vcpu_cp15() macro that abstract the location of the actual array. This will probably help next time we have to revisit that code. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* ARM: KVM: Move VFP registers to a CPU context structureMarc Zyngier2016-02-291-4/+7
| | | | | | | | | In order to turn the WS code into something that looks a bit more like the arm64 version, move the VFP registers into a CPU context container for both the host and the guest. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* arm/arm64: KVM: Add hook for C-based stage2 initMarc Zyngier2016-02-291-0/+4
| | | | | | | | As we're about to move the stage2 init to C code, introduce some C hooks that will later be populated with arch-specific implementations. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: Count guest exit due to various reasonsAmit Tomar2015-12-141-0/+6
| | | | | | | | | | It would add guest exit statistics to debugfs, this can be helpful while measuring KVM performance. [ Renamed some of the field names - Christoffer ] Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* KVM: arm/arm64: implement kvm_arm_[halt,resume]_guestEric Auger2015-10-221-0/+3
| | | | | | | | | | | | | | We introduce kvm_arm_halt_guest and resume functions. They will be used for IRQ forward state change. Halt is synchronous and prevents the guest from being re-entered. We use the same mechanism put in place for PSCI former pause, now renamed power_off. A new flag is introduced in arch vcpu state, pause, only meant to be used by those functions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* KVM: arm/arm64: rename pause into power_offEric Auger2015-10-221-2/+2
| | | | | | | | | | | The kvm_vcpu_arch pause field is renamed into power_off to prepare for the introduction of a new pause field. Also vcpu_pause is renamed into vcpu_sleep since we will sleep until both power_off and pause are false. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_blockChristoffer Dall2015-10-221-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently schedule a soft timer every time we exit the guest if the timer did not expire while running the guest. This is really not necessary, because the only work we do in the timer work function is to kick the vcpu. Kicking the vcpu does two things: (1) If the vpcu thread is on a waitqueue, make it runnable and remove it from the waitqueue. (2) If the vcpu is running on a different physical CPU from the one doing the kick, it sends a reschedule IPI. The second case cannot happen, because the soft timer is only ever scheduled when the vcpu is not running. The first case is only relevant when the vcpu thread is on a waitqueue, which is only the case when the vcpu thread has called kvm_vcpu_block(). Therefore, we only need to make sure a timer is scheduled for kvm_vcpu_block(), which we do by encapsulating all calls to kvm_vcpu_block() with kvm_timer_{un}schedule calls. Additionally, we only schedule a soft timer if the timer is enabled and unmasked, since it is useless otherwise. Note that theoretically userspace can use the SET_ONE_REG interface to change registers that should cause the timer to fire, even if the vcpu is blocked without a scheduled timer, but this case was not supported before this patch and we leave it for future work for now. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* KVM: Add kvm_arch_vcpu_{un}blocking callbacksChristoffer Dall2015-10-221-0/+3
| | | | | | | | | | | | | Some times it is useful for architecture implementations of KVM to know when the VCPU thread is about to block or when it comes back from blocking (arm/arm64 needs to know this to properly implement timers, for example). Therefore provide a generic architecture callback function in line with what we do elsewhere for KVM generic-arch interactions. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* KVM: disable halt_poll_ns as default for s390xDavid Hildenbrand2015-09-251-0/+1
| | | | | | | | | | | | We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvm-arm-for-4.3-rc2-2' of ↵Paolo Bonzini2015-09-171-6/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master Second set of KVM/ARM changes for 4.3-rc2 - Workaround for a Cortex-A57 erratum - Bug fix for the debugging infrastructure - Fix for 32bit guests with more than 4GB of address space on a 32bit host - A number of fixes for the (unusual) case when we don't use the in-kernel GIC emulation - Removal of ThumbEE handling on arm64, since these have been dropped from the architecture before anyone actually ever built a CPU - Remove the KVM_ARM_MAX_VCPUS limitation which has become fairly pointless
| * arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'Ming Lei2015-09-171-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes config option of KVM_ARM_MAX_VCPUS, and like other ARCHs, just choose the maximum allowed value from hardware, and follows the reasons: 1) from distribution view, the option has to be defined as the max allowed value because it need to meet all kinds of virtulization applications and need to support most of SoCs; 2) using a bigger value doesn't introduce extra memory consumption, and the help text in Kconfig isn't accurate because kvm_vpu structure isn't allocated until request of creating VCPU is sent from QEMU; 3) the main effect is that the field of vcpus[] in 'struct kvm' becomes a bit bigger(sizeof(void *) per vcpu) and need more cache lines to hold the structure, but 'struct kvm' is one generic struct, and it has worked well on other ARCHs already in this way. Also, the world switch frequecy is often low, for example, it is ~2000 when running kernel building load in VM from APM xgene KVM host, so the effect is very small, and the difference can't be observed in my test at all. Cc: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | KVM: add halt_attempted_poll to VCPU statsPaolo Bonzini2015-09-161-0/+1
|/ | | | | | | | | | | | | | | | | This new statistic can help diagnosing VCPUs that, for any reason, trigger bad behavior of halt_poll_ns autotuning. For example, say halt_poll_ns = 480000, and wakeups are spaced exactly like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes 10+20+40+80+160+320+480 = 1110 microseconds out of every 479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then is consuming about 30% more CPU than it would use without polling. This would show as an abnormally high number of attempted polling compared to the successful polls. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com< Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: arm64: introduce vcpu->arch.debug_ptrAlex Bennée2015-07-211-0/+1
| | | | | | | | | | | | | | | | | | This introduces a level of indirection for the debug registers. Instead of using the sys_regs[] directly we store registers in a structure in the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the guest context. Because we no longer give the sys_regs offset for the sys_reg_desc->reg field, but instead the index into a debug-specific struct we need to add a number of additional trap functions for each register. Also as the generic generic user-space access code no longer works we have introduced a new pair of function pointers to the sys_reg_desc structure to override the generic code when needed. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm: introduce kvm_arm_init/setup/clear_debugAlex Bennée2015-07-211-0/+4
| | | | | | | | | | | | | | | | | | | | | This is a precursor for later patches which will need to do more to setup debug state before entering the hyp.S switch code. The existing functionality for setting mdcr_el2 has been moved out of hyp.S and now uses the value kept in vcpu->arch.mdcr_el2. As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've had to add a mechanism to save the value of mdcr_el2 as a per-cpu variable during the initialisation code. The kernel never sets this number so we are assuming the bootcode has set up the correct value here. This also moves the conditional setting of the TDA bit from the hyp code into the C code which is currently used for the lazy debug register context switch code. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* arm64: KVM: Switch vgic save/restore to alternative_insnMarc Zyngier2015-06-121-5/+0
| | | | | | | | | | | | | | | | So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* arm/arm64: KVM: Implement Stage-2 page agingMarc Zyngier2015-03-121-11/+2
| | | | | | | | | | | | | | | | | | | | Until now, KVM/arm didn't care much for page aging (who was swapping anyway?), and simply provided empty hooks to the core KVM code. With server-type systems now being available, things are quite different. This patch implements very simple support for page aging, by clearing the Access flag in the Stage-2 page tables. On access fault, the current fault handling will write the PTE or PMD again, putting the Access flag back on. It should be possible to implement a much faster handling for Access faults, but that's left for a later patch. With this in place, performance in VMs is degraded much more gracefully. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* KVM: arm/arm64: implement kvm_arch_intc_initializedEric Auger2015-03-121-0/+2
| | | | | | | | | | | | | | On arm/arm64 the VGIC is dynamically instantiated and it is useful to expose its state, especially for irqfd setup. This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and implements kvm_arch_intc_initialized. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-02-131-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
| * kvm: add halt_poll_ns module parameterPaolo Bonzini2015-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * arm/arm64: KVM: make the maximum number of vCPUs a per-VM valueAndre Przywara2015-01-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently the maximum number of vCPUs supported is a global value limited by the used GIC model. GICv3 will lift this limit, but we still need to observe it for guests using GICv2. So the maximum number of vCPUs is per-VM value, depending on the GIC model the guest uses. Store and check the value in struct kvm_arch, but keep it down to 8 for now. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: rework MPIDR assignment and add accessorsAndre Przywara2015-01-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The virtual MPIDR registers (containing topology information) for the guest are currently mapped linearily to the vcpu_id. Improve this mapping for arm64 by using three levels to not artificially limit the number of vCPUs. To help this, change and rename the kvm_vcpu_get_mpidr() function to mask off the non-affinity bits in the MPIDR register. Also add an accessor to later allow easier access to a vCPU with a given MPIDR. Use this new accessor in the PSCI emulation. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Enable Dirty Page logging for ARMv8Mario Smarduch2015-01-161-12/+0
| | | | | | | | | | | | | | | | | | This patch enables ARMv8 ditry page logging support. Plugs ARMv8 into generic layer through Kconfig symbol, and drops earlier ARM64 constraints to enable logging at architecture layer. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
| * KVM: arm: Add initial dirty page locking supportMario Smarduch2015-01-161-0/+2
| | | | | | | | | | | | | | | | | | Add support for initial write protection of VM memslots. This patch series assumes that huge PUDs will not be used in 2nd stage tables, which is always valid on ARMv7 Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
| * KVM: arm: Add ARMv7 API to flush TLBsMario Smarduch2015-01-161-0/+12
| | | | | | | | | | | | | | | | This patch adds ARMv7 architecture TLB Flush function. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
* | arm/arm64: KVM: Use set/way op trapping to track the state of the cachesMarc Zyngier2015-01-291-3/+0
|/ | | | | | | | | | | | | | | | | | Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* arm/arm64: KVM: Clarify KVM_ARM_VCPU_INIT ABIChristoffer Dall2014-12-131-2/+0
| | | | | | | | | | | | | | | | | It is not clear that this ioctl can be called multiple times for a given vcpu. Userspace already does this, so clarify the ABI. Also specify that userspace is expected to always make secondary and subsequent calls to the ioctl with the same parameters for the VCPU as the initial call (which userspace also already does). Add code to check that userspace doesn't violate that ABI in the future, and move the kvm_vcpu_set_target() function which is currently duplicated between the 32-bit and 64-bit versions in guest.c to a common static function in arm.c, shared between both architectures. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* Merge tag 'kvm-arm-for-3.18' of ↵Paolo Bonzini2014-09-271-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next Changes for KVM for arm/arm64 for 3.18 This includes a bunch of changes: - Support read-only memory slots on arm/arm64 - Various changes to fix Sparse warnings - Correctly detect write vs. read Stage-2 faults - Various VGIC cleanups and fixes - Dynamic VGIC data strcuture sizing - Fix SGI set_clear_pend offset bug - Fix VTTBR_BADDR Mask - Correctly report the FSC on Stage-2 faults Conflicts: virt/kvm/eventfd.c [duplicate, different patch where the kvm-arm version broke x86. The kvm tree instead has the right one]
| * Merge remote-tracking branch 'kvm/next' into queueChristoffer Dall2014-09-181-5/+8
| |\ | | | | | | | | | | | | | | | Conflicts: arch/arm64/include/asm/kvm_host.h virt/kvm/arm/vgic.c