summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* kvm: x86: remove efer_reload entry in kvm_vcpu_statLongpeng(Mike)2018-01-312-2/+0
| | | | | | | | | The efer_reload is never used since commit 26bb0981b3ff ("KVM: VMX: Use shared msr infrastructure"), so remove it. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: x86: AMD Processor Topology InformationStanislav Lanci2018-01-311-1/+2
| | | | | | | | | | | This patch allow to enable x86 feature TOPOEXT. This is needed to provide information about SMT on AMD Zen CPUs to the guest. Signed-off-by: Stanislav Lanci <pixo@polepetko.eu> Tested-by: Nick Sarnie <commendsarnex@gmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when ↵Vitaly Kuznetsov2018-01-312-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | running nested I was investigating an issue with seabios >= 1.10 which stopped working for nested KVM on Hyper-V. The problem appears to be in handle_ept_violation() function: when we do fast mmio we need to skip the instruction so we do kvm_skip_emulated_instruction(). This, however, depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS. However, this is not the case. Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when EPT MISCONFIG occurs. While on real hardware it was observed to be set, some hypervisors follow the spec and don't set it; we end up advancing IP with some random value. I checked with Microsoft and they confirmed they don't fill VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG. Fix the issue by doing instruction skip through emulator when running nested. Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* kvm: embed vcpu id to dentry of vcpu anon inodeMasatake YAMATO2018-01-311-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means it is impossible to know the mapping between fds for vcpu and vcpu from userland. # LC_ALL=C ls -l /proc/617/fd | grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu It is also impossible to know the mapping between vma for kvm_run structure and vcpu from userland. # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu This change adds vcpu id to d-entries for vcpu. With this change you can get the following output: # LC_ALL=C ls -l /proc/617/fd | grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu:0 lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu:1 # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:0 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:1 With the mappings known from the output, a tool like strace can report more details of qemu-kvm process activities. Here is the strace output of my local prototype: # ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K' ... [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 ... Signed-off-by: Masatake YAMATO <yamato@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* kvm: Map PFN-type memory regions as writable (if possible)KarimAllah Ahmed2018-01-311-2/+5
| | | | | | | | | | | | | | | | | For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* Merge tag 'kvm-arm-for-v4.16' of ↵Radim Krčmář2018-01-3137-362/+579
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.16 The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes.
| * KVM: arm/arm64: Fixup userspace irqchip static key optimizationChristoffer Dall2018-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I introduced a static key to avoid work in the critical path for userspace irqchips which is very rarely used, I accidentally messed up my logic and used && where I should have used ||, because the point was to short-circuit the evaluation in case userspace irqchips weren't even in use. This fixes an issue when running in-kernel irqchip VMs alongside userspace irqchip VMs. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Fixes: c44c232ee2d3 ("KVM: arm/arm64: Avoid work when userspace iqchips are not used") Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Fix userspace_irqchip_in_use countingChristoffer Dall2018-01-311-2/+3
| | | | | | | | | | | | | | | | | | | | | | We were not decrementing the static key count in the right location. kvm_arch_vcpu_destroy() is only called to clean up after a failed VCPU create attempt, whereas kvm_arch_vcpu_free() is called on teardown of the VM as well. Move the static key decrement call to kvm_arch_vcpu_free(). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Fix incorrect timer_is_pending logicChristoffer Dall2018-01-311-19/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the recently introduced support for level-triggered mapped interrupt, I accidentally left the VCPU thread busily going back and forward between the guest and the hypervisor whenever the guest was blocking, because I would always incorrectly report that a timer interrupt was pending. This is because the timer->irq.level field is not valid for mapped interrupts, where we offload the level state to the hardware, and as a result this field is always true. Luckily the problem can be relatively easily solved by not checking the cached signal state of either timer in kvm_timer_should_fire() but instead compute the timer state on the fly, which we do already if the cached signal state wasn't high. In fact, the only reason for checking the cached signal state was a tiny optimization which would only be potentially faster when the polling loop detects a pending timer interrupt, which is quite unlikely. Instead of duplicating the logic from kvm_arch_timer_handler(), we enlighten kvm_timer_should_fire() to report something valid when the timer state is loaded onto the hardware. We can then call this from kvm_arch_timer_handler() as well and avoid the call to __timer_snapshot_state() in kvm_arch_timer_get_input_level(). Reported-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Tomasz Nowicki <tn@semihalf.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Fix trailing semicolonLuis de Bethencourt2018-01-231-1/+1
| | | | | | | | | | | | | | | | The trailing semicolon is an empty statement that does no operation. Removing it since it doesn't do anything. Signed-off-by: Luis de Bethencourt <luisbg@kernel.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Handle CPU_PM_ENTER_FAILEDJames Morse2018-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if there is a failure: CPU_PM_ENTER_FAILED. When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED, KVM does nothing, leaving the CPU running with the hyp-stub, at odds with kvm_arm_hardware_enabled. Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset() to make sure the hyp-stub is loaded before reloading KVM. Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug") Cc: <stable@vger.kernel.org> # v4.7+ CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: mm: Add additional parameter to uaccess_ttbr0_disableChristoffer Dall2018-01-177-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an extra temporary register parameter to uaccess_ttbr0_disable which is about to be required for arm64 PAN support. This patch doesn't introduce any functional change but ensures that the kernel compiles once the KVM/ARM tree is merged with the arm64 tree by ensuring a trivially mergable conflict with commit 6b88a32c7af68895134872cdec3b6bfdb532d94e ("arm64: kpti: Fix the interaction between ASID switching and software PAN"). Cc: Will Deacon <will.deacon@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: mm: Add additional parameter to uaccess_ttbr0_enableChristoffer Dall2018-01-093-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an extra temporary register parameter to uaccess_ttbr0_enable which is about to be required for arm64 PAN support. This patch doesn't introduce any functional change but ensures that the kernel compiles once the KVM/ARM tree is merged with the arm64 tree by ensuring a trivially mergable conflict with commit 27a921e75711d924617269e0ba4adb8bae9fd0d1 ("arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN"). Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartionsMarc Zyngier2018-01-083-20/+12
| | | | | | | | | | | | | | | | | | The vcpu parameter isn't used for anything, and gets in the way of further cleanups. Let's get rid of it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Preserve Exec permission across R/W permission faultsMarc Zyngier2018-01-083-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, we loose the Exec property whenever we take permission faults, as we always reconstruct the PTE/PMD from scratch. This can be counter productive as we can end-up with the following fault sequence: X -> RO -> ROX -> RW -> RWX Instead, we can lookup the existing PTE/PMD and clear the XN bit in the new entry if it was already cleared in the old one, leadig to a much nicer fault sequence: X -> ROX -> RWX Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Only clean the dcache on translation faultMarc Zyngier2018-01-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | The only case where we actually need to perform a dcache maintenance is when we map the page for the first time, and subsequent permission faults do not require cache maintenance. Let's make it conditional on not being a permission fault (and thus a translation fault). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Limit icache invalidation to prefetch abortsMarc Zyngier2018-01-085-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've so far eagerly invalidated the icache, no matter how the page was faulted in (data or prefetch abort). But we can easily track execution by setting the XN bits in the S2 page tables, get the prefetch abort at HYP and perform the icache invalidation at that time only. As for most VMs, the instruction working set is pretty small compared to the data set, this is likely to save some traffic (specially as the invalidation is broadcast). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: PTE/PMD S2 XN bit definitionMarc Zyngier2018-01-081-0/+2
| | | | | | | | | | | | | | | | | | As we're about to make S2 page-tables eXecute Never by default, add the required bits for both PMDs and PTEs. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm: KVM: Add optimized PIPT icache flushingMarc Zyngier2018-01-082-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling __cpuc_coherent_user_range to invalidate the icache on a PIPT icache machine has some pointless overhead, as it starts by cleaning the dcache to the PoU, while we're guaranteed to have already cleaned it to the PoC. As KVM is the only user of such a feature, let's implement some ad-hoc cache flushing in kvm_mmu.h. Should it become useful to other subsystems, it can be moved to a more global location. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Add invalidate_icache_range helperMarc Zyngier2018-01-084-12/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently tightly couple dcache clean with icache invalidation, but KVM could do without the initial flush to PoU, as we've already flushed things to PoC. Let's introduce invalidate_icache_range which is limited to invalidating the icache from the linear mapping (and thus has none of the userspace fault handling complexity), and wire it in KVM instead of flush_icache_range. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Split dcache/icache flushingMarc Zyngier2018-01-083-26/+67
| | | | | | | | | | | | | | | | | | As we're about to introduce opportunistic invalidation of the icache, let's split dcache and icache flushing. Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.hMarc Zyngier2018-01-088-2/+6
| | | | | | | | | | | | | | | | | | | | kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the opposite inclusion impossible. Let's start with breaking that useless dependency. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * Revert "arm64: KVM: Hide PMU from guests when disabled"Christoffer Dall2018-01-081-21/+14
| | | | | | | | | | | | | | | | | | | | | | Commit 0c0543a128bd1c6a4c8610d0d9d869053fa2fbf5 breaks migration and introduces a regression with existing userspace because it introduces an ordering requirement of setting up all VCPU features before writing ID registers which we didn't have before. Revert this commit for now until we have a proper fix. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Delete outdated forwarded irq documentationChristoffer Dall2018-01-021-187/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The reason I added this documentation originally was that the concept of "never taking the interrupt", but just use the timer to generate an exit from the guest, was confusing to most, and we had to explain it several times over. But as we can clearly see, we've failed to update the documentation as the code has evolved, and people who need to understand these details are probably better off reading the code. Let's lighten our maintenance burden slightly and just get rid of this. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Avoid work when userspace iqchips are not usedChristoffer Dall2018-01-024-19/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently check if the VM has a userspace irqchip in several places along the critical path, and if so, we do some work which is only required for having an irqchip in userspace. This is unfortunate, as we could avoid doing any work entirely, if we didn't have to support irqchip in userspace. Realizing the userspace irqchip on ARM is mostly a developer or hobby feature, and is unlikely to be used in servers or other scenarios where performance is a priority, we can use a refcounted static key to only check the irqchip configuration when we have at least one VM that uses an irqchip in userspace. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Provide a get_input_level for the arch timerChristoffer Dall2018-01-022-46/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VGIC can now support the life-cycle of mapped level-triggered interrupts, and we no longer have to read back the timer state on every exit from the VM if we had an asserted timer interrupt signal, because the VGIC already knows if we hit the unlikely case where the guest disables the timer without ACKing the virtual timer interrupt. This means we rework a bit of the code to factor out the functionality to snapshot the timer state from vtimer_save_state(), and we can reuse this functionality in the sync path when we have an irqchip in userspace, and also to support our implementation of the get_input_level() function for the timer. This change also means that we can no longer rely on the timer's view of the interrupt line to set the active state, because we no longer maintain this state for mapped interrupts when exiting from the guest. Instead, we only set the active state if the virtual interrupt is active, and otherwise we simply let the timer fire again and raise the virtual interrupt from the ISR. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQsChristoffer Dall2018-01-023-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For mapped IRQs (with the HW bit set in the LR) we have to follow some rules of the architecture. One of these rules is that VM must not be allowed to deactivate a virtual interrupt with the HW bit set unless the physical interrupt is also active. This works fine when injecting mapped interrupts, because we leave it up to the injector to either set EOImode==1 or manually set the active state of the physical interrupt. However, the guest can set virtual interrupt to be pending or active by writing to the virtual distributor, which could lead to deactivating a virtual interrupt with the HW bit set without the physical interrupt being active. We could set the physical interrupt to active whenever we are about to enter the VM with a HW interrupt either pending or active, but that would be really slow, especially on GICv2. So we take the long way around and do the hard work when needed, which is expected to be extremely rare. When the VM sets the pending state for a HW interrupt on the virtual distributor we set the active state on the physical distributor, because the virtual interrupt can become active and then the guest can deactivate it. When the VM clears the pending state we also clear it on the physical side, because the injector might otherwise raise the interrupt. We also clear the physical active state when the virtual interrupt is not active, since otherwise a SPEND/CPEND sequence from the guest would prevent signaling of future interrupts. Changing the state of mapped interrupts from userspace is not supported, and it's expected that userspace unmaps devices from VFIO before attempting to set the interrupt state, because the interrupt state is driven by hardware. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Support a vgic interrupt line level sample functionChristoffer Dall2018-01-023-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | The GIC sometimes need to sample the physical line of a mapped interrupt. As we know this to be notoriously slow, provide a callback function for devices (such as the timer) which can do this much faster than talking to the distributor, for example by comparing a few in-memory values. Fall back to the good old method of poking the physical GIC if no callback is provided. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: vgic: Support level-triggered mapped interruptsChristoffer Dall2018-01-024-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Level-triggered mapped IRQs are special because we only observe rising edges as input to the VGIC, and we don't set the EOI flag and therefore are not told when the level goes down, so that we can re-queue a new interrupt when the level goes up. One way to solve this problem is to side-step the logic of the VGIC and special case the validation in the injection path, but it has the unfortunate drawback of having to peak into the physical GIC state whenever we want to know if the interrupt is pending on the virtual distributor. Instead, we can maintain the current semantics of a level triggered interrupt by sort of treating it as an edge-triggered interrupt, following from the fact that we only observe an asserting edge. This requires us to be a bit careful when populating the LRs and when folding the state back in though: * We lower the line level when populating the LR, so that when subsequently observing an asserting edge, the VGIC will do the right thing. * If the guest never acked the interrupt while running (for example if it had masked interrupts at the CPU level while running), we have to preserve the pending state of the LR and move it back to the line_level field of the struct irq when folding LR state. If the guest never acked the interrupt while running, but changed the device state and lowered the line (again with interrupts masked) then we need to observe this change in the line_level. Both of the above situations are solved by sampling the physical line and set the line level when folding the LR back. * Finally, if the guest never acked the interrupt while running and sampling the line reveals that the device state has changed and the line has been lowered, we must clear the physical active state, since we will otherwise never be told when the interrupt becomes asserted again. This has the added benefit of making the timer optimization patches (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a bit simpler, because the timer code doesn't have to clear the active state on the sync anymore. It also potentially improves the performance of the timer implementation because the GIC knows the state or the LR and only needs to clear the active state when the pending bit in the LR is still set, where the timer has to always clear it when returning from running the guest with an injected timer interrupt. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Don't cache the timer IRQ levelChristoffer Dall2018-01-021-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The timer logic was designed after a strict idea of modeling an interrupt line level in software, meaning that only transitions in the level need to be reported to the VGIC. This works well for the timer, because the arch timer code is in complete control of the device and can track the transitions of the line. However, as we are about to support using the HW bit in the VGIC not just for the timer, but also for VFIO which cannot track transitions of the interrupt line, we have to decide on an interface between the GIC and other subsystems for level triggered mapped interrupts, which both the timer and VFIO can use. VFIO only sees an asserting transition of the physical interrupt line, and tells the VGIC when that happens. That means that part of the interrupt flow is offloaded to the hardware. To use the same interface for VFIO devices and the timer, we therefore have to change the timer (we cannot change VFIO because it doesn't know the details of the device it is assigning to a VM). Luckily, changing the timer is simple, we just need to stop 'caching' the line level, but instead let the VGIC know the state of the timer every time there is a potential change in the line level, and when the line level should be asserted from the timer ISR. The VGIC can ignore extra notifications using its validate mechanism. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Factor out functionality to get vgic mmio requester_vcpuChristoffer Dall2018-01-021-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are about to distinguish between userspace accesses and mmio traps for a number of the mmio handlers. When the requester vcpu is NULL, it means we are handling a userspace access. Factor out the functionality to get the request vcpu into its own function, mostly so we have a common place to document the semantics of the return value. Also take the chance to move the functionality outside of holding a spinlock and instead explicitly disable and enable preemption. This supports PREEMPT_RT kernels as well. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Remove redundant preemptible checksChristoffer Dall2018-01-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | The __this_cpu_read() and __this_cpu_write() functions already implement checks for the required preemption levels when using CONFIG_DEBUG_PREEMPT which gives you nice error messages and such. Therefore there is no need to explicitly check this using a BUG_ON() in the code (which we don't do for other uses of per cpu variables either). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm: Use PTR_ERR_OR_ZERO()Vasyl Gomonovych2018-01-021-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | Fix ptr_ret.cocci warnings: virt/kvm/arm/vgic/vgic-its.c:971:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: KVM: Hide PMU from guests when disabledAndrew Jones2018-01-021-14/+21
| | | | | | | | | | | | | | | | | | | | Since commit 93390c0a1b20 ("arm64: KVM: Hide unsupported AArch64 CPU features from guests") we can hide cpu features from guests. Apply this to a long standing issue where guests see a PMU available, but it's not, because it was not enabled by KVM's userspace. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | Merge tag 'kvm-s390-next-4.16-3' of ↵Radim Krčmář2018-01-311-1/+4
|\ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: update maintainers
| * | MAINTAINERS: update KVM/s390 maintainersCornelia Huck2018-01-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As I have neither too much time nor access to the architecture documentation anymore, let's switch my status from maintainer to reviewer. Janosch will step in as second maintainer. Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
| * | MAINTAINERS: add Halil as additional vfio-ccw maintainerCornelia Huck2018-01-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Acked-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
| * | MAINTAINERS: add David as a reviewer for KVM/s390Cornelia Huck2018-01-311-0/+1
|/ / | | | | | | | | | | | | Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
* | Merge tag 'kvm-s390-next-4.16-2' of ↵Radim Krčmář2018-01-3013-178/+496
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes and features for 4.16 part 2 - exitless interrupts for emulated devices (Michael Mueller) - cleanup of cpuflag handling (David Hildenbrand) - kvm stat counter improvements (Christian Borntraeger) - vsie improvements (David Hildenbrand) - mm cleanup (Janosch Frank)
| * | KVM: s390: introduce the format-1 GISAMichael Mueller2018-01-263-13/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch modifies the previously defined GISA data structure to be able to store two GISA formats, format-0 and format-1. Additionally, it verifies the availability of the GISA format facility and enables the use of a format-1 GISA in the SIE control block accordingly. A format-1 can do everything that format-0 can and we will need it for real HW passthrough. As there are systems with only format-0 we keep both variants. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | s390/sclp: expose the GISA format facilityMichael Mueller2018-01-262-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GISA format facility is required by the host to be able to process a format-1 GISA. If not available, the used GISA format will be format-0. All format-1 related extension will not be available in this case. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: activate GISA for emulated interruptsMichael Mueller2018-01-261-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the AIV facility is available, a GISA will be used to manage emulated adapter interrupts. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: make kvm_s390_get_io_int() aware of GISAMichael Mueller2018-01-261-4/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function returns a pending I/O interrupt with the highest priority defined by its ISC. Together with AIV activation, pending adapter interrupts are managed by the GISA IPM. Thus kvm_s390_get_io_int() needs to inspect the IPM as well when the interrupt with the highest priority has to be identified. In case classic and adapter interrupts with the same ISC are pending, the classic interrupt will be returned first. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: add GISA interrupts to FLIC ioctl interfaceMichael Mueller2018-01-261-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pending interrupts marked in the GISA IPM are required to become part of the answer of ioctl KVM_DEV_FLIC_GET_ALL_IRQS. The ioctl KVM_DEV_FLIC_ENQUEUE is already capable to enqueue adapter interrupts when a GISA is present. With ioctl KVM_DEV_FLIC_CLEAR_IRQS the GISA IPM wil be cleared now as well. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: abstract adapter interruption word generation from ISCMichael Mueller2018-01-261-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function isc_to_int_word() allows the generation of interruption words for adapter interrupts. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: exploit GISA and AIV for emulated interruptsMichael Mueller2018-01-264-18/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The adapter interruption virtualization (AIV) facility is an optional facility that comes with functionality expected to increase the performance of adapter interrupt handling for both emulated and passed-through adapter interrupts. With AIV, adapter interrupts can be delivered to the guest without exiting SIE. This patch provides some preparations for using AIV for emulated adapter interrupts (including virtio) if it's available. When using AIV, the interrupts are delivered at the so called GISA by setting the bit corresponding to its Interruption Subclass (ISC) in the Interruption Pending Mask (IPM) instead of inserting a node into the floating interrupt list. To keep the change reasonably small, the handling of this new state is deferred in get_all_floating_irqs and handle_tpi. This patch concentrates on the code handling enqueuement of emulated adapter interrupts, and their delivery to the guest. Note that care is still required for adapter interrupts using AIV, because there is no guarantee that AIV is going to deliver the adapter interrupts pending at the GISA (consider all vcpus idle). When delivering GISA adapter interrupts by the host (usual mechanism) special attention is required to honor interrupt priorities. Empirical results show that the time window between making an interrupt pending at the GISA and doing kvm_s390_deliver_pending_interrupts is sufficient for a guest with at least moderate cpu activity to get adapter interrupts delivered within the SIE, and potentially save some SIE exits (if not other deliverable interrupts). The code will be activated with a follow-up patch. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | s390/css: indicate the availability of the AIV facilityMichael Mueller2018-01-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds an indication for the presence Adapter Interruption Virtualization facility (AIV) of the general channel subsystem characteristics. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [change wording]
| * | KVM: s390: implement GISA IPM related primitivesMichael Mueller2018-01-261-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch implements routines to access the GISA to test and modify its Interruption Pending Mask (IPM) from the host side. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | s390/bitops: add test_and_clear_bit_inv()Jens Freimann2018-01-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a MSB0 bit numbering version of test_and_clear_bit(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | KVM: s390: define GISA format-0 data structureMichael Mueller2018-01-262-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preperation to support pass-through adapter interrupts, the Guest Interruption State Area (GISA) and the Adapter Interruption Virtualization (AIV) features will be introduced here. This patch introduces format-0 GISA (that is defines the struct describing the GISA, allocates storage for it, and introduces fields for the GISA address in kvm_s390_sie_block and kvm_s390_vsie). As the GISA requires storage below 2GB, it is put in sie_page2, which is already allocated in ZONE_DMA. In addition, The GISA requires alignment to its integral boundary. This is already naturally aligned via the padding in the sie_page2. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>