* Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs (Linus Torvalds, 2020-06-03; 8 files changed, -170/+77)

    Pull splice updates from Al Viro:
     "Christoph's assorted splice cleanups"

    * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      fs: rename pipe_buf ->steal to ->try_steal
      fs: make the pipe_buf_operations ->confirm operation optional
      fs: make the pipe_buf_operations ->steal operation optional
      trace: remove tracing_pipe_buf_ops
      pipe: merge anon_pipe_buf*_ops
      fs: simplify do_splice_from
      fs: simplify do_splice_to

| * fs: rename pipe_buf ->steal to ->try_steal (Christoph Hellwig, 2020-05-20; 6 files changed, -60/+58)

    And replace the arcane return value convention with a simple bool,
    where true means success and false means failure.

    [AV: braino fix folded in]

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

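    A minimal sketch of the renamed operation as described above; the
    surrounding pipe_buf_operations fields are elided and the comment
    wording is ours, not the kernel's:

        struct pipe_buf_operations {
                /* ... ->release, ->confirm, ->get elided ... */

                /*
                 * Attempt to take ownership of the buffer's underlying
                 * page. Returns true on success, false on failure
                 * (formerly an int, 0 on success / nonzero on error).
                 */
                bool (*try_steal)(struct pipe_inode_info *pipe,
                                  struct pipe_buffer *buf);
        };
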
| * fs: make the pipe_buf_operations ->confirm operation optional (Christoph Hellwig, 2020-05-20; 6 files changed, -25/+3)

    Just return 0 for success if it is not present.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

| * fs: make the pipe_buf_operations ->steal operation optional (Christoph Hellwig, 2020-05-20; 4 files changed, -16/+2)

    Just return 1 for failure if it is not present.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

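    Together with the ->confirm change above, the pattern is a NULL check
    in the inline wrappers; a sketch of the end state of the series
    (wrapper names follow include/linux/pipe_fs_i.h, bodies abridged):

        static inline int pipe_buf_confirm(struct pipe_inode_info *pipe,
                                           struct pipe_buffer *buf)
        {
                if (!buf->ops->confirm)
                        return 0;      /* absent op: buffer is always valid */
                return buf->ops->confirm(pipe, buf);
        }

        static inline bool pipe_buf_try_steal(struct pipe_inode_info *pipe,
                                              struct pipe_buffer *buf)
        {
                if (!buf->ops->try_steal)
                        return false;  /* absent op: stealing always fails */
                return buf->ops->try_steal(pipe, buf);
        }
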
| * trace: remove tracing_pipe_buf_ops (Christoph Hellwig, 2020-05-20; 1 file changed, -8/+1)

    tracing_pipe_buf_ops has identical ops to default_pipe_buf_ops, so use
    that instead.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

| * pipe: merge anon_pipe_buf*_ops (Christoph Hellwig, 2020-05-20; 3 files changed, -48/+11)

    All the op vectors are exactly the same; they are just used to encode
    packet or nomerge behavior. There already is a flag for the packet
    behavior, so just add a new one to allow for merging. Inverting it
    versus the previous nomerge special-casing actually allows for much
    nicer code.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

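    A sketch of the resulting shape, assuming the new flag is the
    PIPE_BUF_FLAG_CAN_MERGE bit (flag values illustrative):

        /* one shared op vector; per-buffer behavior lives in buf->flags */
        #define PIPE_BUF_FLAG_PACKET    0x08  /* read() returns whole packets */
        #define PIPE_BUF_FLAG_CAN_MERGE 0x10  /* writes may append to this buf */

        static bool anon_pipe_buf_can_merge(const struct pipe_buffer *buf)
        {
                /* inverted sense vs. the old "nomerge" ops: merging is opt-in */
                return buf->flags & PIPE_BUF_FLAG_CAN_MERGE;
        }
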
| * fs: simplify do_splice_from (Christoph Hellwig, 2020-05-20; 1 file changed, -8/+2)

    No need for a local function pointer when we can trivially branch on
    the ->splice_write presence.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

| * fs: simplify do_splice_to (Christoph Hellwig, 2020-05-20; 1 file changed, -7/+2)

    No need for a local function pointer when we can trivially branch on
    the ->splice_read presence.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

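    A sketch of the simplified shape shared by this commit and
    do_splice_from above (checks elided):

        static long do_splice_to(struct file *in, loff_t *ppos,
                                 struct pipe_inode_info *pipe, size_t len,
                                 unsigned int flags)
        {
                /* ... permission and length checks elided ... */

                /* branch directly instead of stashing a function pointer */
                if (in->f_op->splice_read)
                        return in->f_op->splice_read(in, ppos, pipe, len, flags);
                return default_file_splice_read(in, ppos, pipe, len, flags);
        }
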
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2020-06-03; 155 files changed, -3048/+5309)

    Pull kvm updates from Paolo Bonzini:

     "ARM:
       - Move the arch-specific code into arch/arm64/kvm
       - Start the post-32bit cleanup
       - Cherry-pick a few non-invasive pre-NV patches

      x86:
       - Rework of TLB flushing
       - Rework of event injection, especially with respect to nested
         virtualization
       - Nested AMD event injection facelift, building on the rework of
         generic code and fixing a lot of corner cases
       - Nested AMD live migration support
       - Optimization for TSC deadline MSR writes and IPIs
       - Various cleanups
       - Asynchronous page fault cleanups (from tglx, common topic branch
         with tip tree)
       - Interrupt-based delivery of asynchronous "page ready" events
         (host side)
       - Hyper-V MSRs and hypercalls for guest debugging
       - VMX preemption timer fixes

      s390:
       - Cleanups

      Generic:
       - switch vCPU thread wakeup from swait to rcuwait

      The other architectures, and the guest side of the asynchronous page
      fault work, will come next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
      KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
      KVM: check userspace_addr for all memslots
      KVM: selftests: update hyperv_cpuid with SynDBG tests
      x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
      x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
      x86/kvm/hyper-v: Add support for synthetic debugger interface
      x86/hyper-v: Add synthetic debugger definitions
      KVM: selftests: VMX preemption timer migration test
      KVM: nVMX: Fix VMX preemption timer migration
      x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
      KVM: x86/pmu: Support full width counting
      KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
      KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
      KVM: x86: acknowledgment mechanism for async pf page ready notifications
      KVM: x86: interrupt based APF 'page ready' event delivery
      KVM: introduce kvm_read_guest_offset_cached()
      KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
      KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
      Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
      KVM: VMX: Replace zero-length array with flexible-array
      ...

| * | KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test (Vitaly Kuznetsov, 2020-06-01; 1 file changed, -1/+1)

    vmx_tsc_adjust_test fails with:

      IA32_TSC_ADJUST is -4294969448 (-1 * TSC_ADJUST_VALUE + -2152).
      IA32_TSC_ADJUST is -4294969448 (-1 * TSC_ADJUST_VALUE + -2152).
      IA32_TSC_ADJUST is 281470681738540 (65534 * TSC_ADJUST_VALUE + 4294962476).
      ==== Test Assertion Failure ====
        x86_64/vmx_tsc_adjust_test.c:153: false
        pid=19738 tid=19738 - Interrupted system call
           1  0x0000000000401192: main at vmx_tsc_adjust_test.c:153
           2  0x00007fe1ef8583d4: ?? ??:0
           3  0x0000000000401201: _start at ??:?
        Failed guest assert: (adjust <= max)

    The problem is that 'tsc_val' should be u64, not u32, or the reading
    gets truncated.

    Fixes: 8d7fbf01f9afc ("KVM: selftests: VMX preemption timer migration test")
    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20200601154726.261868-1-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

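    The bug class, illustrated: RDTSC returns the 64-bit counter split
    across EDX:EAX, so assembling it into anything narrower than a u64
    truncates it (variable names illustrative, not the selftest's code):

        #include <stdint.h>

        static inline uint64_t rdtsc(void)
        {
                uint32_t lo, hi;
                uint64_t tsc_val;  /* must be 64-bit; a u32 here truncates */

                __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
                tsc_val = ((uint64_t)hi << 32) | lo;
                return tsc_val;
        }
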
| * | Merge branch 'kvm-master' into HEAD (Paolo Bonzini, 2020-06-01; 0 files changed, -0/+0)

    This merge brings in a few fixes that I would have sent this week, had
    there been a 5.7-rc8 release.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| | * | KVM: check userspace_addr for all memslots (Paolo Bonzini, 2020-06-01; 1 file changed, -3/+2)

    The userspace_addr alignment and range checks are not performed for
    private memory slots that are prepared by KVM itself. This is
    unnecessary and makes it questionable to use __*_user functions to
    access memory later on. We also rely on the userspace address being
    aligned, since we have an entire family of functions to map gfn to
    pfn.

    Fortunately, skipping the check is completely unnecessary. Only x86
    uses private memslots and their userspace_addr is obtained from
    vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt
    to pass an address above PAGE_OFFSET would have failed because such an
    address would return true for kvm_is_error_hva.

    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

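    A sketch of the kind of check now applied to every memslot, private
    or not; the exact code in __kvm_set_memory_region differs in detail:

        /* validate userspace_addr for all slots, not just user-created ones */
        if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
            !access_ok((void __user *)(unsigned long)mem->userspace_addr,
                       mem->memory_size))
                return -EINVAL;
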
| * | | Merge tag 'kvmarm-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD (Paolo Bonzini, 2020-06-01; 56 files changed, -639/+629)

    KVM/arm64 updates for Linux 5.8:

    - Move the arch-specific code into arch/arm64/kvm
    - Start the post-32bit cleanup
    - Cherry-pick a few non-invasive pre-NV patches

| | * | | KVM: arm64: Drop obsolete comment about sys_reg ordering (Marc Zyngier, 2020-05-28; 1 file changed, -5/+1)

    The general comment about keeping the enum order in sync with the
    save/restore code has been obsolete for many years now. Just drop it.

    Note that there are other ordering requirements in the enum, such as
    the PtrAuth and PMU registers, which are still valid.

    Reported-by: James Morse <james.morse@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Parametrize exception entry with a target EL (Marc Zyngier, 2020-05-28; 2 files changed, -37/+39)

    We currently assume that an exception is delivered to EL1, always.
    Once we emulate EL2, this no longer will be the case. To prepare for
    this, add a target_mode parameter.

    While we're at it, merge the computing of the target PC and PSTATE in
    a single function that updates both PC and CPSR after saving their
    previous values in the corresponding ELR/SPSR. This ensures that they
    are updated in the correct order (a pretty common source of bugs...).

    Reviewed-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Don't use empty structures as CPU reset state (Marc Zyngier, 2020-05-28; 1 file changed, -12/+9)

    Keeping an empty structure as the vcpu state initializer is slightly
    wasteful: we only want to set pstate, and zero everything else. Just
    do that.

    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Move sysreg reset check to boot time (Marc Zyngier, 2020-05-28; 1 file changed, -37/+35)

    Our sysreg reset check has become a bit silly, as it only checks
    whether a reset callback actually exists for a given sysreg entry, and
    applies the method if available. Doing the check at each vcpu reset is
    pretty dumb, as the tables never change. It is thus perfectly possible
    to do the same checks at boot time.

    This also allows us to introduce a sparse sys_regs[] array, something
    that will be required with ARMv8.4-NV.

    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Add missing reset handlers for PMU emulation (Marc Zyngier, 2020-05-28; 1 file changed, -3/+3)

    As we're about to become a bit more harsh when it comes to the lack of
    reset callbacks, let's add the missing PMU reset handlers. Note that
    these only cover *CLR registers that were always covered by their *SET
    counterpart, so there is no semantic change here.

    Reviewed-by: James Morse <james.morse@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Refactor vcpu_{read,write}_sys_reg (Marc Zyngier, 2020-05-28; 1 file changed, -57/+71)

    Extract the direct HW accessors for later reuse.

    Reviewed-by: James Morse <james.morse@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu (Christoffer Dall, 2020-05-28; 7 files changed, -53/+54)

    If we move the used_lrs field to the version-specific cpu interface
    structure, the following functions only operate on the struct
    vgic_v3_cpu_if and not the full vcpu:

      __vgic_v3_save_state
      __vgic_v3_restore_state
      __vgic_v3_activate_traps
      __vgic_v3_deactivate_traps
      __vgic_v3_save_aprs
      __vgic_v3_restore_aprs

    This is going to be very useful for nested virt, so move the used_lrs
    field and change the prototypes and implementations of these functions
    to take the cpu_if parameter directly.

    No functional change.

    Reviewed-by: James Morse <james.morse@arm.com>
    Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Remove obsolete kvm_virt_to_phys abstraction (Andrew Scull, 2020-05-25; 2 files changed, -5/+3)

    This abstraction was introduced to hide the difference between arm and
    arm64 but, with the former no longer supported, this abstraction can
    be removed and the canonical kernel API used directly instead.

    Signed-off-by: Andrew Scull <ascull@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    CC: Marc Zyngier <maz@kernel.org>
    CC: James Morse <james.morse@arm.com>
    CC: Suzuki K Poulose <suzuki.poulose@arm.com>
    Link: https://lore.kernel.org/r/20200519104036.259917-1-ascull@google.com

| | * | | KVM: arm64: Fix incorrect comment on kvm_get_hyp_vector() (David Brazdil, 2020-05-25; 1 file changed, -1/+1)

    The comment used to say that kvm_get_hyp_vector is only called on VHE
    systems. In fact, it is also called from the nVHE init function
    cpu_init_hyp_mode(). Fix the comment to stop confusing devs.

    Signed-off-by: David Brazdil <dbrazdil@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200515152550.83810-1-dbrazdil@google.com

| | * | | KVM: arm64: Clean up cpu_init_hyp_mode() (David Brazdil, 2020-05-25; 3 files changed, -40/+29)

    Pull bits of code to the only place where it is used. Remove the empty
    function __cpu_init_stage2(). Remove the redundant has_vhe() check,
    since this function is nVHE-only. No functional changes intended.

    Signed-off-by: David Brazdil <dbrazdil@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200515152056.83158-1-dbrazdil@google.com

| | * | | KVM: arm64: Make KVM_CAP_MAX_VCPUS compatible with the selected GIC version (Marc Zyngier, 2020-05-16; 1 file changed, -5/+10)

    KVM_CAP_MAX_VCPUS always returns the maximum possible number of VCPUs,
    irrespective of the selected interrupt controller. This is pretty
    misleading for userspace that selects a GICv2 on a GICv3 system that
    supports v2 compat: it always gets a maximum of 512 VCPUs, even if the
    effective limit is 8. The 9th VCPU will fail to be created, which is
    unexpected as far as userspace is concerned.

    Fortunately, we already have the right information stashed in the kvm
    structure, and we can return it as requested.

    Reported-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
    Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
    Link: https://lore.kernel.org/r/20200427141507.284985-1-maz@kernel.org

| | * | | KVM: arm64: Support enabling dirty log gradually in small chunks (Keqian Zhu, 2020-05-16; 3 files changed, -3/+14)

    There is already support for enabling dirty log gradually in small
    chunks for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log
    gradually in small chunks"). This adds support for arm64.

    x86 still write-protects all huge pages when
    DIRTY_LOG_INITIALLY_ALL_SET is enabled. However, for arm64, both huge
    pages and normal pages can be write-protected gradually by userspace.

    Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G
    Linux VMs with different page sizes. The memory pressure is 127G in
    each case. The time taken by memory_global_dirty_log_start in QEMU is
    listed below:

      Page Size    Before    After Optimization
      4K           650ms     1.8ms
      2M           4ms       1.8ms
      1G           2ms       1.8ms

    Besides the time reduction, the biggest improvement is that we will
    minimize the performance side effect (because of dissolving huge pages
    and marking memslots dirty) on the guest after enabling dirty log.

    Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com

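    From userspace, the gradual mode is opted into via the manual
    dirty-log-protect capability; a minimal sketch (vm_fd setup assumed,
    constant names from the mainline uapi):

        #include <linux/kvm.h>
        #include <sys/ioctl.h>

        /* let dirty bitmaps start all-set, so write protection is applied
         * gradually as userspace clears bitmap chunks */
        static int enable_gradual_dirty_log(int vm_fd)
        {
                struct kvm_enable_cap cap = {
                        .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                        .args = { KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                                  KVM_DIRTY_LOG_INITIALLY_SET },
                };

                return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
        }
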
| | * | | KVM: arm64: Unify handling THP backed host memory (Suzuki K Poulose, 2020-05-16; 1 file changed, -55/+60)

    We support mapping host memory backed by PMD transparent hugepages at
    stage2 as huge pages. However the checks are now spread across two
    different places. Let us unify the handling of the THPs to keep the
    code cleaner (and future proof for PUD THP support). This patch moves
    transparent_hugepage_adjust() closer to the caller to avoid a forward
    declaration for fault_supports_stage2_huge_mappings().

    Also, since we already handle the case where the host VA and the guest
    PA may not be aligned, the explicit VM_BUG_ON() is not required.

    Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200507123546.1875-3-yuzenghui@huawei.com

| | * | | KVM: arm64: Clean up the checking for huge mapping (Suzuki K Poulose, 2020-05-16; 1 file changed, -1/+5)

    If we are checking whether the stage2 can map PAGE_SIZE, we don't have
    to do the boundary checks, as both the host VMA and the guest memslots
    are page aligned. Bail out of that case early. While we're at it, fix
    up a typo in the comment below.

    Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200507123546.1875-2-yuzenghui@huawei.com

| | * | | KVM: arm/arm64: Release kvm->mmu_lock in loop to prevent starvation (Jiang Yi, 2020-05-16; 1 file changed, -0/+3)

    Do cond_resched_lock() in stage2_flush_memslot() like what is done in
    unmap_stage2_range() and other places holding mmu_lock while
    processing a possibly large range of memory.

    Signed-off-by: Jiang Yi <giangyi@amazon.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
    Link: https://lore.kernel.org/r/20200415084229.29992-1-giangyi@amazon.com

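    The pattern, sketched; the iteration details are illustrative rather
    than the exact stage2_flush_memslot() code:

        /* drop and retake mmu_lock periodically on large ranges */
        do {
                next = stage2_pgd_addr_end(kvm, addr, end);
                if (!stage2_pgd_none(kvm, *pgd))
                        stage2_flush_puds(kvm, pgd, addr, next);

                /* let other mmu_lock waiters in before the next chunk */
                if (next != end)
                        cond_resched_lock(&kvm->mmu_lock);
        } while (pgd++, addr = next, addr != end);
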
| | * | | KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported (Zenghui Yu, 2020-05-16; 1 file changed, -1/+4)

    stage2_unmap_vm() was introduced to unmap the user RAM region in the
    stage2 page table to make the caches coherent. E.g., a guest reboot
    with stage1 MMU disabled will access memory using non-cacheable
    attributes. If the RAM and caches are not coherent at this stage, some
    evicted dirty cache line may go and corrupt guest data in RAM.

    Since ARMv8.4, the S2FWB feature is mandatory and KVM will take
    advantage of it to configure the stage2 page table and the attributes
    of memory access. So we ensure that guests always access memory using
    cacheable attributes, and thus the caches are always coherent. So on
    CPUs that support S2FWB, we can safely reset the vcpu without a heavy
    stage2 unmapping.

    Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200415072835.1164-1-yuzenghui@huawei.com

| | * | | KVM: Fix spelling in code comments (Fuad Tabba, 2020-05-16; 12 files changed, -23/+23)

    Fix spelling and typos (e.g., repeated words) in comments.

    Signed-off-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com

| | * | | KVM: arm64: Use cpus_have_final_cap for has_vhe() (Marc Zyngier, 2020-05-16; 1 file changed, -1/+1)

    By the time we start using the has_vhe() helper, we have long
    discovered whether we are running VHE or not. It thus makes sense to
    use cpus_have_final_cap() instead of cpus_have_const_cap(), which
    leads to a small text size reduction.

    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Acked-by: David Brazdil <dbrazdil@google.com>
    Link: https://lore.kernel.org/r/20200513103828.74580-1-maz@kernel.org

| | * | | KVM: arm64: Simplify __kvm_timer_set_cntvoff implementation (Marc Zyngier, 2020-05-16; 3 files changed, -14/+3)

    Now that this function isn't constrained by the 32bit PCS, let's
    simplify it by taking a single 64bit offset instead of two 32bit
    parameters.

    Signed-off-by: Marc Zyngier <maz@kernel.org>

| | * | | KVM: arm64: Clean up kvm makefiles (Fuad Tabba, 2020-05-16; 2 files changed, -36/+17)

    Consolidate references to the CONFIG_KVM configuration item so that
    they encompass entire folders rather than individual lines.

    Signed-off-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/20200505154520.194120-5-tabba@google.com

| | * | | KVM: arm64: Change CONFIG_KVM to a menuconfig entry (Will Deacon, 2020-05-16; 1 file changed, -5/+11)

    Changing CONFIG_KVM to be a 'menuconfig' entry in Kconfig means that
    we can straightforwardly enumerate optional features, such as the
    virtual PMU device, as dependent options.

    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200505154520.194120-4-tabba@google.com

| | * | | KVM: arm64: Update help text (Will Deacon, 2020-05-16; 1 file changed, -2/+0)

    arm64 KVM supports 16k pages since 02e0b7600f83 ("arm64: kvm: Add
    support for 16K pages"), so update the Kconfig help text accordingly.

    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200505154520.194120-3-tabba@google.com

| | * | | KVM: arm64: Kill off CONFIG_KVM_ARM_HOST (Will Deacon, 2020-05-16; 6 files changed, -46/+40)

    CONFIG_KVM_ARM_HOST is just a proxy for CONFIG_KVM, so remove it in
    favour of the latter.

    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Link: https://lore.kernel.org/r/20200505154520.194120-2-tabba@google.com

| | * | | KVM: arm64: Move virt/kvm/arm to arch/arm64 (Marc Zyngier, 2020-05-16; 36 files changed, -257/+253)

    Now that the 32bit KVM/arm host is a distant memory, let's move the
    whole of the KVM/arm64 code into the arm64 tree.

    As they said in the song: Welcome Home (Sanitarium).

    Signed-off-by: Marc Zyngier <maz@kernel.org>
    Acked-by: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org

| * | | | KVM: check userspace_addr for all memslots (Paolo Bonzini, 2020-06-01; 1 file changed, -3/+2)

    The userspace_addr alignment and range checks are not performed for
    private memory slots that are prepared by KVM itself. This is
    unnecessary and makes it questionable to use __*_user functions to
    access memory later on. We also rely on the userspace address being
    aligned, since we have an entire family of functions to map gfn to
    pfn.

    Fortunately, skipping the check is completely unnecessary. Only x86
    uses private memslots and their userspace_addr is obtained from
    vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt
    to pass an address above PAGE_OFFSET would have failed because such an
    address would return true for kvm_is_error_hva.

    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | KVM: selftests: update hyperv_cpuid with SynDBG tests (Vitaly Kuznetsov, 2020-06-01; 1 file changed, -47/+56)

    Update tests to reflect new CPUID capabilities with SYNDBG. Check that
    we get the right number of entries and that 0x40000000.EAX always
    returns the correct max leaf.

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200529134543.1127440-7-arilou@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls (Jon Doron, 2020-06-01; 1 file changed, -0/+28)

    There is another mode for the synthetic debugger which uses hypercalls
    to send/recv network data instead of the MSR interface.

    This interface is much slower and less recommended, since you might
    get a lot of VMExits while KDVM is polling for new packets to recv,
    rather than simply checking the pending page to see if there is data
    available and then requesting it.

    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200529134543.1127440-6-arilou@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | x86/kvm/hyper-v: enable hypercalls regardless of hypercall page (Jon Doron, 2020-06-01; 1 file changed, -1/+1)

    Microsoft's kdvm.dll dbgtransport module does not respect the
    hypercall page: it simply identifies the CPU being used (AMD/Intel)
    and accordingly makes hypercalls with the relevant instruction
    (vmmcall/vmcall respectively).

    The relevant function in kdvm is KdHvConnectHypervisor, which first
    checks if the hypercall page has been enabled via
    HV_X64_MSR_HYPERCALL_ENABLE, and in case it was not, it simply sets
    HV_X64_MSR_GUEST_OS_ID to 0x1000101010001, which means:

      build_number    = 0x0001
      service_version = 0x01
      minor_version   = 0x01
      major_version   = 0x01
      os_id           = 0x00 (Undefined)
      vendor_id       = 1 (Microsoft)
      os_type         = 0 (a value of 0 indicates a proprietary,
                           closed source OS)

    and starts issuing the hypercall without setting the hypercall page.

    To resolve this issue, simply enable hypercalls also if the
    guest_os_id is not 0.

    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200529134543.1127440-5-arilou@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | x86/kvm/hyper-v: Add support for synthetic debugger interface (Jon Doron, 2020-06-01; 7 files changed, -3/+258)

    Add support for the Hyper-V synthetic debugger (syndbg) interface.
    The syndbg interface uses MSRs to emulate a way to send/recv packets
    data.

    The debug transport dll (kdvm/kdnet) will identify if Hyper-V is
    enabled, and if it supports the synthetic debugger interface it will
    attempt to use it, instead of trying to initialize a network adapter.

    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | x86/hyper-v: Add synthetic debugger definitions (Jon Doron, 2020-06-01; 2 files changed, -0/+33)

    The Hyper-V synthetic debugger has two modes: one that uses MSRs and
    one that uses hypercalls. Add all the required definitions for both
    types of synthetic debugger interface.

    Some of the required new CPUIDs and MSRs are not documented in the
    TLFS, so they are in hyperv.h instead. The reason they are not
    documented is that they are subject to removal in future versions of
    Windows.

    Reviewed-by: Michael Kelley <mikelley@microsoft.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200529134543.1127440-3-arilou@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | KVM: selftests: VMX preemption timer migration test (Makarand Sonare, 2020-06-01; 8 files changed, -12/+298)

    When a nested VM with a VMX-preemption timer is migrated, verify that
    the nested VM and its parent VM observe the VMX-preemption timer exit
    close to the original expiration deadline.

    Signed-off-by: Makarand Sonare <makarandsonare@google.com>
    Reviewed-by: Jim Mattson <jmattson@google.com>
    Message-Id: <20200526215107.205814-3-makarandsonare@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | KVM: nVMX: Fix VMX preemption timer migration (Peter Shier, 2020-06-01; 4 files changed, -8/+56)

    Add a new field to hold the preemption timer expiration deadline,
    appended to struct kvm_vmx_nested_state_hdr. This prevents the first
    VM-Enter after migration from incorrectly restarting the timer with
    the full timer value instead of the partially decayed timer value.
    KVM_SET_NESTED_STATE restarts the timer using the migrated state
    regardless of whether L1 sets VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.

    Fixes: cf8b84f48a593 ("kvm: nVMX: Prepare for checkpointing L2 state")

    Signed-off-by: Peter Shier <pshier@google.com>
    Signed-off-by: Makarand Sonare <makarandsonare@google.com>
    Message-Id: <20200526215107.205814-2-makarandsonare@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

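    A sketch of the extended uapi header; the field order and padding
    here are an assumption based on the commit description, not a copy of
    the final include/uapi layout:

        struct kvm_vmx_nested_state_hdr {
                __u64 vmxon_pa;
                __u64 vmcs12_pa;

                struct {
                        __u16 flags;
                } smm;

                __u32 flags;  /* e.g. a "deadline valid" bit */
                __u64 preemption_timer_deadline;  /* new: absolute expiry,
                                                     saved at migration */
        };
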
| * | | | x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit (Jon Doron, 2020-06-01; 2 files changed, -0/+4)

    The problem this patch addresses is the fact that 'struct
    kvm_hyperv_exit' has a different layout when compiled in 32- and
    64-bit modes.

    In 64-bit mode the default alignment boundary is 64 bits, thus forcing
    extra gaps after 'type' and 'msr', but in 32-bit mode the boundary is
    at 32 bits, thus no extra gaps.

    This is an issue because even when the kernel is 64 bit, the userspace
    using the interface can be either 32 or 64 bit, yet the same 32-bit
    userspace also has to work with a 32-bit kernel.

    The issue is fixed by forcing the 64-bit layout. This leads to an ABI
    change for 32-bit builds, and while we are obviously breaking the
    '32 bit userspace with 32 bit kernel' case, we are fixing the '32 bit
    userspace with 64 bit kernel' one.

    As the interface has no (known) users and 32-bit KVM is rather baroque
    nowadays, this seems like a reasonable decision.

    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Jon Doron <arilou@gmail.com>
    Message-Id: <20200424113746.3473563-2-arilou@gmail.com>
    Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

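    A sketch of the fix: making the compiler-inserted gaps explicit so
    32- and 64-bit builds agree (member list abridged, pad names
    illustrative):

        /* explicit padding pins the struct layout on both ABIs */
        struct kvm_hyperv_exit {
                __u32 type;
                __u32 pad1;  /* was an implicit gap on 64-bit only */
                union {
                        struct {
                                __u32 msr;
                                __u32 pad2;  /* same trick before the u64s */
                                __u64 control;
                                __u64 evt_page;
                                __u64 msg_page;
                        } synic;
                        /* ... hcall and syndbg members elided ... */
                } u;
        };
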
| * | | | KVM: x86/pmu: Support full width counting (Like Xu, 2020-06-01; 6 files changed, -5/+66)

    Intel CPUs have a new alternative MSR range (starting from
    MSR_IA32_PMC0) for GP counters that allows writing the full counter
    width. Enable this range from a new capability bit
    (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).

    The guest would query CPUID to get the counter width, and sign-extend
    the counter values as needed. The traditional MSRs always limit to
    32 bits, even though the counter internally is larger (48 or 57 bits).

    When the new capability is set, use the alternative range, which does
    not have these restrictions. This lowers the overhead of perf stat
    slightly, because it has to do fewer interrupts to accumulate the
    counter value.

    Signed-off-by: Like Xu <like.xu@linux.intel.com>
    Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

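    A guest-side sketch of what the capability enables; the MSR indices
    are the architectural ones, but treat the snippet as illustrative
    rather than the patch's code:

        #define MSR_IA32_PERF_CAPABILITIES  0x00000345
        #define MSR_IA32_PERFCTR0           0x000000c1  /* legacy, 32-bit */
        #define MSR_IA32_PMC0               0x000004c1  /* full-width alias */
        #define PERF_CAP_FW_WRITES          (1ULL << 13) /* FW_WRITE bit */

        /* pick the full-width write path when the host exposes it */
        static void write_gp_counter(unsigned int idx, u64 value)
        {
                u64 caps;

                rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);
                if (caps & PERF_CAP_FW_WRITES)
                        wrmsrl(MSR_IA32_PMC0 + idx, value);
                else
                        wrmsrl(MSR_IA32_PERFCTR0 + idx, value & 0xffffffffULL);
        }
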
| * | | | KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in (Wei Wang, 2020-06-01; 5 files changed, -17/+21)

    Change kvm_pmu_get_msr() to get the msr_data struct, as the
    host_initiated field from the struct could be used by get_msr. This
    also makes this API consistent with kvm_pmu_set_msr. No functional
    changes.

    Signed-off-by: Wei Wang <wei.w.wang@intel.com>
    Message-Id: <20200529074347.124619-2-like.xu@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

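    The signature change, sketched (the 'before' form is inferred from
    the description):

        /* before: value pointer only, no host_initiated context */
        int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);

        /* after: the full msr_data, matching kvm_pmu_set_msr */
        int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
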
| * | | | KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT (Vitaly Kuznetsov, 2020-06-01; 6 files changed, -5/+19)

    Introduce a new capability to indicate that KVM supports interrupt
    based delivery of 'page ready' APF events. This includes support for
    both MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20200525144125.143875-8-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

| * | | | KVM: x86: acknowledgment mechanism for async pf page ready notifications (Vitaly Kuznetsov, 2020-06-01; 6 files changed, -5/+52)

    If two 'page ready' notifications happen back to back, the second one
    is not delivered and the only mechanism we currently have is the
    kvm_check_async_pf_completion() check in the vcpu_run() loop. The
    check will only be performed with the next vmexit, whenever it
    happens, and in some cases it may take a while. With interrupt based
    'page ready' notification delivery the situation is even worse:
    unlike exceptions, interrupts are not handled immediately, so we must
    check if the slot is empty. This is slow and unnecessary. Introduce a
    dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate the fact that the
    slot is free and the host should check its notification queue.
    Mandate using it for interrupt based 'page ready' APF event delivery.

    As kvm_check_async_pf_completion() is going away from the vcpu_run()
    loop, we need a way to communicate the fact that the
    vcpu->async_pf.done queue has transitioned from empty to non-empty
    state. Introduce kvm_arch_async_page_present_queued() and
    KVM_REQ_APF_READY to do the job.

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

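    A guest-side sketch of the acknowledgment handshake described above;
    the helper names (apf_shared, wake_up_token) are hypothetical:

        /* guest IRQ handler for interrupt-based 'page ready' events */
        static void kvm_async_pf_ready(void)
        {
                u32 token = READ_ONCE(apf_shared->token);  /* shared page */

                wake_up_token(token);  /* resume the task that faulted */
                WRITE_ONCE(apf_shared->token, 0);

                /* slot free again: ask the host for the next notification */
                wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
        }
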