summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: trivial documentation cleanupsAndrew Jones2018-03-282-4/+8
| | | | | | | | | Add missing entries to the index and ensure the entries are in alphabetical order. Also amd-memory-encryption.rst is an .rst not a .txt. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: X86: Fix setup the virt_spin_lock_key before static key get initializedWanpeng Li2018-03-281-3/+9
| | | | | | | | | | | | | | | | | | | | | | | static_key_disable_cpuslocked(): static key 'virt_spin_lock_key+0x0/0x20' used before call to jump_label_init() WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x61/0x80 RIP: 0010:static_key_disable_cpuslocked+0x61/0x80 Call Trace: static_key_disable+0x16/0x20 start_kernel+0x192/0x4b3 secondary_startup_64+0xa5/0xb0 Qspinlock will be choosed when dedicated pCPUs are available, however, the static virt_spin_lock_key is set in kvm_spinlock_init() before jump_label_init() has been called, which will result in a WARN(). This patch fixes it by delaying the virt_spin_lock_key setup to .smp_prepare_cpus(). Reported-by: Davidlohr Bueso <dbueso@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: b2798ba0b876 ("KVM: X86: Choose qspinlock when dedicated physical CPUs are available") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* tools/kvm_stat: Remove unused functionCole Robinson2018-03-281-3/+0
| | | | | | | | Unused since added in 18e8f4100 Signed-off-by: Cole Robinson <crobinso@redhat.com> Reviewed-and-tested-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* tools/kvm_stat: Don't use deprecated file()Cole Robinson2018-03-281-1/+1
| | | | | | | | | | | | | | | | | | $ python3 tools/kvm/kvm_stat/kvm_stat Traceback (most recent call last): File "tools/kvm/kvm_stat/kvm_stat", line 1668, in <module> main() File "tools/kvm/kvm_stat/kvm_stat", line 1639, in main assign_globals() File "tools/kvm/kvm_stat/kvm_stat", line 1618, in assign_globals for line in file('/proc/mounts'): NameError: name 'file' is not defined open() is the python3 way, and works on python2.6+ Signed-off-by: Cole Robinson <crobinso@redhat.com> Reviewed-and-tested-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* tools/kvm_stat: Fix python3 syntaxCole Robinson2018-03-281-2/+4
| | | | | | | | | | | | | | $ python3 tools/kvm/kvm_stat/kvm_stat File "tools/kvm/kvm_stat/kvm_stat", line 1137 def sortkey((_k, v)): ^ SyntaxError: invalid syntax Fix it in a way that's compatible with python2 and python3 Signed-off-by: Cole Robinson <crobinso@redhat.com> Tested-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: SVM: Implement pause loop exit logic in SVMBabu Moger2018-03-282-3/+105
| | | | | | | | | | | | | | | | | Bring the PLE(pause loop exit) logic to AMD svm driver. While testing, we found this helping in situations where numerous pauses are generated. Without these patches we could see continuos VMEXITS due to pause interceptions. Tested it on AMD EPYC server with boot parameter idle=poll on a VM with 32 vcpus to simulate extensive pause behaviour. Here are VMEXITS in 10 seconds interval. Pauses 810199 504 Total 882184 325415 Signed-off-by: Babu Moger <babu.moger@amd.com> [Prevented the window from dropping below the initial value. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: SVM: Add pause filter thresholdBabu Moger2018-03-282-1/+4
| | | | | | | | | | | | | | | | | | | | | | | This patch adds the support for pause filtering threshold. This feature support is indicated by CPUID Fn8000_000A_EDX. See AMD APM Vol 2 Section 15.14.4 Pause Intercept Filtering for more details. In this mode, a 16-bit pause filter threshold field is added in VMCB. The threshold value is a cycle count that is used to reset the pause counter. As with simple pause filtering, VMRUN loads the pause count value from VMCB into an internal counter. Then, on each pause instruction the hardware checks the elapsed number of cycles since the most recent pause instruction against the pause Filter Threshold. If the elapsed cycle count is greater than the pause filter threshold, then the internal pause count is reloaded from VMCB and execution continues. If the elapsed cycle count is less than the pause filter threshold, then the internal pause count is decremented. If the count value is less than zero and pause intercept is enabled, a #VMEXIT is triggered. If advanced pause filtering is supported and pause filter threshold field is set to zero, the filter will operate in the simpler, count only mode. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: VMX: Bring the common code to header fileBabu Moger2018-03-282-42/+45
| | | | | | | | | This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: VMX: Remove ple_window_actual_maxBabu Moger2018-03-281-25/+6
| | | | | | | | | | | | | Get rid of ple_window_actual_max, because its benefits are really minuscule and the logic is complicated. The overflows(and underflow) are controlled in __ple_window_grow and _ple_window_shrink respectively. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> [Fixed potential wraparound and change the max to UINT_MAX. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: VMX: Fix the module parameters for vmxBabu Moger2018-03-281-13/+14
| | | | | | | | | | | | | The vmx module parameters are supposed to be unsigned variants. Also fixed the checkpatch errors like the one below. WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. +module_param(ple_gap, uint, S_IRUGO); Signed-off-by: Babu Moger <babu.moger@amd.com> [Expanded uint to unsigned int in code. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: x86: Fix perf timer mode IP reportingAndi Kleen2018-03-284-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropiate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: Andi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* Merge tag 'kvm-arm-for-v4.17' of ↵Radim Krčmář2018-03-2867-1045/+2435
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.17 - VHE optimizations - EL2 address space randomization - Variant 3a mitigation for Cortex-A57 and A72 - The usual vgic fixes - Various minor tidying-up
| * arm64: Add temporary ERRATA_MIDR_ALL_VERSIONS compatibility macroMarc Zyngier2018-03-281-2/+6
| | | | | | | | | | | | | | | | | | | | | | MIDR_ALL_VERSIONS is changing, and won't have the same meaning in 4.17, and the right thing to use will be ERRATA_MIDR_ALL_VERSIONS. In order to cope with the merge window, let's add a compatibility macro that will allow a relatively smooth transition, and that can be removed post 4.17-rc1. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * Revert "arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening"Marc Zyngier2018-03-286-20/+69
| | | | | | | | | | | | | | | | | | Creates far too many conflicts with arm64/for-next/core, to be resent post -rc1. This reverts commit f9f5dc19509bbef6f5e675346f1a7d7b846bdb12. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic-its: Fix potential overrun in vgic_copy_lpi_listMarc Zyngier2018-03-261-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vgic_copy_lpi_list() parses the LPI list and picks LPIs targeting a given vcpu. We allocate the array containing the intids before taking the lpi_list_lock, which means we can have an array size that is not equal to the number of LPIs. This is particularly obvious when looking at the path coming from vgic_enable_lpis, which is not a command, and thus can run in parallel with commands: vcpu 0: vcpu 1: vgic_enable_lpis its_sync_lpi_pending_table vgic_copy_lpi_list intids = kmalloc_array(irq_count) MAPI(lpi targeting vcpu 0) list_for_each_entry(lpi_list_head) intids[i++] = irq->intid; At that stage, we will happily overrun the intids array. Boo. An easy fix is is to break once the array is full. The MAPI command will update the config anyway, and we won't miss a thing. We also make sure that lpi_list_count is read exactly once, so that further updates of that value will not affect the array bound check. Cc: stable@vger.kernel.org Fixes: ccb1d791ab9e ("KVM: arm64: vgic-its: Fix pending table sync") Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic: Disallow Active+Pending for level interruptsMarc Zyngier2018-03-262-48/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was recently reported that VFIO mediated devices, and anything that VFIO exposes as level interrupts, do no strictly follow the expected logic of such interrupts as it only lowers the input line when the guest has EOId the interrupt at the GIC level, rather than when it Acked the interrupt at the device level. THe GIC's Active+Pending state is fundamentally incompatible with this behaviour, as it prevents KVM from observing the EOI, and in turn results in VFIO never dropping the line. This results in an interrupt storm in the guest, which it really never expected. As we cannot really change VFIO to follow the strict rules of level signalling, let's forbid the A+P state altogether, as it is in the end only an optimization. It ensures that we will transition via an invalid state, which we can use to notify VFIO of the EOI. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Shunyong Yang <shunyong.yang@hxt-semitech.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardeningShanker Donthineni2018-03-196-69/+20
| | | | | | | | | | | | | | | | | | | | | | The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead of Silicon provider service ID 0xC2001700. Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm: Reserve bit in KVM_REG_ARM encoding for secure/nonsecurePeter Maydell2018-03-191-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a KVM_REG_ARM encoding that we use to expose KVM guest registers to userspace. Define that bit 28 in this encoding indicates secure vs nonsecure, so we can distinguish the secure and nonsecure banked versions of a banked AArch32 register. For KVM currently, all guest registers are nonsecure, but defining the bit is useful for userspace. In particular, QEMU uses this encoding as part of its on-the-wire migration format, and needs to be able to describe secure-bank registers when it is migrating (fully emulated) EL3-enabled CPUs. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * Merge tag 'kvm-arm-fixes-for-v4.16-2' into HEADMarc Zyngier2018-03-1912-72/+181
| |\ | | | | | | | | | Resolve conflicts with current mainline
| | * kvm: arm/arm64: vgic-v3: Tighten synchronization for guests using v2 on v3Marc Zyngier2018-03-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On guest exit, and when using GICv2 on GICv3, we use a dsb(st) to force synchronization between the memory-mapped guest view and the system-register view that the hypervisor uses. This is incorrect, as the spec calls out the need for "a DSB whose required access type is both loads and stores with any Shareability attribute", while we're only synchronizing stores. We also lack an isb after the dsb to ensure that the latter has actually been executed before we start reading stuff from the sysregs. The fix is pretty easy: turn dsb(st) into dsb(sy), and slap an isb() just after. Cc: stable@vger.kernel.org Fixes: f68d2b1b73cc ("arm64: KVM: Implement vgic-v3 save/restore") Acked-by: Christoffer Dall <cdall@kernel.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintidMarc Zyngier2018-03-146-16/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vgic code is trying to be clever when injecting GICv2 SGIs, and will happily populate LRs with the same interrupt number if they come from multiple vcpus (after all, they are distinct interrupt sources). Unfortunately, this is against the letter of the architecture, and the GICv2 architecture spec says "Each valid interrupt stored in the List registers must have a unique VirtualID for that virtual CPU interface.". GICv3 has similar (although slightly ambiguous) restrictions. This results in guests locking up when using GICv2-on-GICv3, for example. The obvious fix is to stop trying so hard, and inject a single vcpu per SGI per guest entry. After all, pending SGIs with multiple source vcpus are pretty rare, and are mostly seen in scenario where the physical CPUs are severely overcomitted. But as we now only inject a single instance of a multi-source SGI per vcpu entry, we may delay those interrupts for longer than strictly necessary, and run the risk of injecting lower priority interrupts in the meantime. In order to address this, we adopt a three stage strategy: - If we encounter a multi-source SGI in the AP list while computing its depth, we force the list to be sorted - When populating the LRs, we prevent the injection of any interrupt of lower priority than that of the first multi-source SGI we've injected. - Finally, the injection of a multi-source SGI triggers the request of a maintenance interrupt when there will be no pending interrupt in the LRs (HCR_NPIE). At the point where the last pending interrupt in the LRs switches from Pending to Active, the maintenance interrupt will be delivered, allowing us to add the remaining SGIs using the same process. Cc: stable@vger.kernel.org Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Acked-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Reduce verbosity of KVM init logArd Biesheuvel2018-03-143-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On my GICv3 system, the following is printed to the kernel log at boot: kvm [1]: 8-bit VMID kvm [1]: IDMAP page: d20e35000 kvm [1]: HYP VA range: 800000000000:ffffffffffff kvm [1]: vgic-v2@2c020000 kvm [1]: GIC system register CPU interface enabled kvm [1]: vgic interrupt IRQ1 kvm [1]: virtual timer IRQ4 kvm [1]: Hyp mode initialized successfully The KVM IDMAP is a mapping of a statically allocated kernel structure, and so printing its physical address leaks the physical placement of the kernel when physical KASLR in effect. So change the kvm_info() to kvm_debug() to remove it from the log output. While at it, trim the output a bit more: IRQ numbers can be found in /proc/interrupts, and the HYP VA and vgic-v2 lines are not highly informational either. Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Reset mapped IRQs on VM resetChristoffer Dall2018-03-143-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently don't allow resetting mapped IRQs from userspace, because their state is controlled by the hardware. But we do need to reset the state when the VM is reset, so we provide a function for the 'owner' of the mapped interrupt to reset the interrupt state. Currently only the timer uses mapped interrupts, so we call this function from the timer reset logic. Cc: stable@vger.kernel.org Fixes: 4c60e360d6df ("KVM: arm/arm64: Provide a get_input_level for the arch timer") Signed-off-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUNChristoffer Dall2018-03-142-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling vcpu_load() registers preempt notifiers for this vcpu and calls kvm_arch_vcpu_load(). The latter will soon be doing a lot of heavy lifting on arm/arm64 and will try to do things such as enabling the virtual timer and setting us up to handle interrupts from the timer hardware. Loading state onto hardware registers and enabling hardware to signal interrupts can be problematic when we're not actually about to run the VCPU, because it makes it difficult to establish the right context when handling interrupts from the timer, and it makes the register access code difficult to reason about. Luckily, now when we call vcpu_load in each ioctl implementation, we can simply remove the call from the non-KVM_RUN vcpu ioctls, and our kvm_arch_vcpu_load() is only used for loading vcpu content to the physical CPU when we're actually going to run the vcpu. Cc: stable@vger.kernel.org Fixes: 9b062471e52a ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl") Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: vgic: Add missing irq_lock to vgic_mmio_read_pendingAndre Przywara2018-03-142-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our irq_is_pending() helper function accesses multiple members of the vgic_irq struct, so we need to hold the lock when calling it. Add that requirement as a comment to the definition and take the lock around the call in vgic_mmio_read_pending(), where we were missing it before. Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: Enable ARM64_HARDEN_EL2_VECTORS on Cortex-A57 and A72Marc Zyngier2018-03-191-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cortex-A57 and A72 are vulnerable to the so-called "variant 3a" of Meltdown, where an attacker can speculatively obtain the value of a privileged system register. By enabling ARM64_HARDEN_EL2_VECTORS on these CPUs, obtaining VBAR_EL2 is not disclosing the hypervisor mappings anymore. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Allow mapping of vectors outside of the RAM regionMarc Zyngier2018-03-196-12/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're now ready to map our vectors in weird and wonderful locations. On enabling ARM64_HARDEN_EL2_VECTORS, a vector slot gets allocated if this hasn't been already done via ARM64_HARDEN_BRANCH_PREDICTOR and gets mapped outside of the normal RAM region, next to the idmap. That way, being able to obtain VBAR_EL2 doesn't reveal the mapping of the rest of the hypervisor code. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: Make BP hardening slot counter availableMarc Zyngier2018-03-193-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to need to allocate hardening slots from other parts of the kernel (in order to support ARM64_HARDEN_EL2_VECTORS). Turn the counter into an atomic_t and make it available to the rest of the kernel. Also add BP_HARDEN_EL2_SLOTS as the number of slots instead of the hardcoded 4... Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm/arm64: KVM: Introduce EL2-specific executable mappingsMarc Zyngier2018-03-193-21/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, all EL2 executable mappings were derived from their EL1 VA. Since we want to decouple the vectors mapping from the rest of the hypervisor, we need to be able to map some text somewhere else. The "idmap" region (for lack of a better name) is ideally suited for this, as we have a huge range that hardly has anything in it. Let's extend the IO allocator to also deal with executable mappings, thus providing the required feature. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Allow far branches from vector slots to the main vectorsMarc Zyngier2018-03-194-1/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, the branch from the vector slots to the main vectors can at most be 4GB from the main vectors (the reach of ADRP), and this distance is known at compile time. If we were to remap the slots to an unrelated VA, things would break badly. A way to achieve VA independence would be to load the absolute address of the vectors (__kvm_hyp_vector), either using a constant pool or a series of movs, followed by an indirect branch. This patches implements the latter solution, using another instance of a patching callback. Note that since we have to save a register pair on the stack, we branch to the *second* instruction in the vectors in order to compensate for it. This also results in having to adjust this balance in the invalid vector entry point. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Reserve 4 additional instructions in the BPI templateMarc Zyngier2018-03-191-24/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, we only reserve a single instruction in the BPI template in order to branch to the vectors. As we're going to stuff a few more instructions there, let's reserve a total of 5 instructions, which we're going to patch later on as required. We also introduce a small refactor of the vectors themselves, so that we stop carrying the target branch around. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Move BP hardening vectors into .hyp.text sectionMarc Zyngier2018-03-194-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason why the BP hardening vectors shouldn't be part of the HYP text at compile time, rather than being mapped at runtime. Also introduce a new config symbol that controls the compilation of bpi.S. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Move stashing of x0/x1 into the vector code itselfMarc Zyngier2018-03-191-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All our useful entry points into the hypervisor are starting by saving x0 and x1 on the stack. Let's move those into the vectors by introducing macros that annotate whether a vector is valid or not, thus indicating whether we want to stash registers or not. The only drawback is that we now also stash registers for el2_error, but this should never happen, and we pop them back right at the start of the handling sequence. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Move vector offsetting from hyp-init.S to kvm_get_hyp_vectorMarc Zyngier2018-03-192-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently provide the hyp-init code with a kernel VA, and expect it to turn it into a HYP va by itself. As we're about to provide the hypervisor with mappings that are not necessarily in the memory range, let's move the kern_hyp_va macro to kvm_get_hyp_vector. No functionnal change. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: Update the KVM memory map documentationMarc Zyngier2018-03-191-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Update the documentation to reflect the new tricks we play on the EL2 mappings... Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Introduce EL2 VA randomisationMarc Zyngier2018-03-193-8/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main idea behind randomising the EL2 VA is that we usually have a few spare bits between the most significant bit of the VA mask and the most significant bit of the linear mapping. Those bits could be a bunch of zeroes, and could be useful to move things around a bit. Of course, the more memory you have, the less randomisation you get... Alternatively, these bits could be the result of KASLR, in which case they are already random. But it would be nice to have a *different* randomization, just to make the job of a potential attacker a bit more difficult. Inserting these random bits is a bit involved. We don't have a spare register (short of rewriting all the kern_hyp_va call sites), and the immediate we want to insert is too random to be used with the ORR instruction. The best option I could come up with is the following sequence: and x0, x0, #va_mask ror x0, x0, #first_random_bit add x0, x0, #(random & 0xfff) add x0, x0, #(random >> 12), lsl #12 ror x0, x0, #(63 - first_random_bit) making it a fairly long sequence, but one that a decent CPU should be able to execute without breaking a sweat. It is of course NOPed out on VHE. The last 4 instructions can also be turned into NOPs if it appears that there is no free bits to use. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Dynamically compute the HYP VA maskMarc Zyngier2018-03-191-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're moving towards a much more dynamic way to compute our HYP VA, let's express the mask in a slightly different way. Instead of comparing the idmap position to the "low" VA mask, we directly compute the mask by taking into account the idmap's (VA_BIT-1) bit. No functionnal change. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: insn: Allow ADD/SUB (immediate) with LSL #12Marc Zyngier2018-03-191-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The encoder for ADD/SUB (immediate) can only cope with 12bit immediates, while there is an encoding for a 12bit immediate shifted by 12 bits to the left. Let's fix this small oversight by allowing the LSL_12 bit to be set. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64; insn: Add encoder for the EXTR instructionMarc Zyngier2018-03-192-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add an encoder for the EXTR instruction, which also implements the ROR variant (where Rn == Rm). Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Move HYP IO VAs to the "idmap" rangeMarc Zyngier2018-03-192-14/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We so far mapped our HYP IO (which is essentially the GICv2 control registers) using the same method as for memory. It recently appeared that is a bit unsafe: We compute the HYP VA using the kern_hyp_va helper, but that helper is only designed to deal with kernel VAs coming from the linear map, and not from the vmalloc region... This could in turn cause some bad aliasing between the two, amplified by the upcoming VA randomisation. A solution is to come up with our very own basic VA allocator for MMIO. Since half of the HYP address space only contains a single page (the idmap), we have plenty to borrow from. Let's use the idmap as a base, and allocate downwards from it. GICv2 now lives on the other side of the great VA barrier. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm64: Fix HYP idmap unmap when using 52bit PAMarc Zyngier2018-03-191-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unmapping the idmap range using 52bit PA is quite broken, as we don't take into account the right number of PGD entries, and rely on PTRS_PER_PGD. The result is that pgd_index() truncates the address, and we end-up in the weed. Let's introduce a new unmap_hyp_idmap_range() that knows about this, together with a kvm_pgd_index() helper, which hides a bit of the complexity of the issue. Fixes: 98732d1b189b ("KVM: arm/arm64: fix HYP ID map extension to 52 bits") Reported-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Fix idmap size and alignmentMarc Zyngier2018-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although the idmap section of KVM can only be at most 4kB and must be aligned on a 4kB boundary, the rest of the code expects it to be page aligned. Things get messy when tearing down the HYP page tables when PAGE_SIZE is 64K, and the idmap section isn't 64K aligned. Let's fix this by computing aligned boundaries that the HYP code will use. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: James Morse <james.morse@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Keep GICv2 HYP VAs in kvm_vgic_global_stateMarc Zyngier2018-03-197-34/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to change the way we map devices at HYP, we need to move away from kern_hyp_va on an IO address. One way of achieving this is to store the VAs in kvm_vgic_global_state, and use that directly from the HYP code. This requires a small change to create_hyp_io_mappings so that it can also return a HYP VA. We take this opportunity to nuke the vctrl_base field in the emulated distributor, as it is not used anymore. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Move ioremap calls to create_hyp_io_mappingsMarc Zyngier2018-03-194-35/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both HYP io mappings call ioremap, followed by create_hyp_io_mappings. Let's move the ioremap call into create_hyp_io_mappings itself, which simplifies the code a bit and allows for further refactoring. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Demote HYP VA range display to being a debug featureMarc Zyngier2018-03-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Displaying the HYP VA information is slightly counterproductive when using VA randomization. Turn it into a debug feature only, and adjust the last displayed value to reflect the top of RAM instead of ~0. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_stateMarc Zyngier2018-03-193-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_vgic_global_state is part of the read-only section, and is usually accessed using a PC-relative address generation (adrp + add). It is thus useless to use kern_hyp_va() on it, and actively problematic if kern_hyp_va() becomes non-idempotent. On the other hand, there is no way that the compiler is going to guarantee that such access is always PC relative. So let's bite the bullet and provide our own accessor. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: cpufeatures: Drop the ARM64_HYP_OFFSET_LOW feature flagMarc Zyngier2018-03-192-20/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we can dynamically compute the kernek/hyp VA mask, there is no need for a feature flag to trigger the alternative patching. Let's drop the flag and everything that depends on it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: KVM: Dynamically patch the kernel/hyp VA maskMarc Zyngier2018-03-193-34/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, we're using a complicated sequence of alternatives to patch the kernel/hyp VA mask on non-VHE, and NOP out the masking altogether when on VHE. The newly introduced dynamic patching gives us the opportunity to simplify that code by patching a single instruction with the correct mask (instead of the mind bending cumulative masking we have at the moment) or even a single NOP on VHE. This also adds some initial code that will allow the patching callback to switch to a more complex patching. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: insn: Add encoder for bitwise operations using literalsMarc Zyngier2018-03-192-0/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We lack a way to encode operations such as AND, ORR, EOR that take an immediate value. Doing so is quite involved, and is all about reverse engineering the decoding algorithm described in the pseudocode function DecodeBitMasks(). This has been tested by feeding it all the possible literal values and comparing the output with that of GAS. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm64: insn: Add N immediate encodingMarc Zyngier2018-03-192-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're missing the a way to generate the encoding of the N immediate, which is only a single bit used in a number of instruction that take an immediate. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>