path: root/arch/x86/kvm/svm.c
Commit message / Author / Age / Files / Lines
* KVM: SVM: Issue WBINVD after deactivating an SEV guestTom Lendacky2020-03-231-8/+14
| Currently, CLFLUSH is used to flush SEV guest memory before the guest is terminated (or a memory hotplug region is removed). However, CLFLUSH is not enough to ensure that SEV guest tagged data is flushed from the cache. With 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations"), the original WBINVD was removed. This then exposed crashes at random times because of a cache flush race with a page that had both a hypervisor and a guest tag in the cache. Restore the WBINVD when destroying an SEV guest and add a WBINVD to the svm_unregister_enc_region() function to ensure hotplug memory is flushed when removed. The DF_FLUSH can still be avoided at this point. Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c8bf9087ca3711c5770bdeaafa3e45b717dc5ef4.1584720426.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is availablePaolo Bonzini2020-03-201-0/+3
| | | | | | | | | | | | Userspace has no way to query if SEV has been disabled with the sev module parameter of kvm-amd.ko. Actually it has one, but it is a hack: do ioctl(KVM_MEM_ENCRYPT_OP, NULL) and check if it returns EFAULT. Make it a little nicer by returning zero for SEV enabled and NULL argument, and while at it document the ioctl arguments. Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
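The userspace check described in this message can be sketched as a small classifier for the result of `ioctl(vm_fd, KVM_MEM_ENCRYPT_OP, NULL)`. This is a minimal sketch, not code from the patch: the enum and the helper name `sev_probe_result` are hypothetical, and treating every other errno as "not available" is an assumption. The ioctl call itself is left in a comment so the logic can be exercised without /dev/kvm.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result type for the probe; not part of the KVM API. */
enum sev_probe {
    SEV_AVAILABLE,         /* kernel with this patch: NULL argument returns 0 */
    SEV_AVAILABLE_LEGACY,  /* older kernel: the EFAULT "hack" from the message */
    SEV_NOT_AVAILABLE      /* assumption: any other failure means no SEV */
};

/*
 * Classify the outcome of:
 *     int ret = ioctl(vm_fd, KVM_MEM_ENCRYPT_OP, NULL);
 *     int err = errno;
 * where vm_fd is a VM file descriptor (assumed to come from KVM_CREATE_VM).
 */
static enum sev_probe sev_probe_result(int ret, int err)
{
    if (ret == 0)
        return SEV_AVAILABLE;
    if (err == EFAULT)
        return SEV_AVAILABLE_LEGACY;
    return SEV_NOT_AVAILABLE;
}
```

Keeping the classification separate from the system call makes it easy to cover both the new return value and the older EFAULT behaviour in one place.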
* KVM: SVM: Fix the svm vmexit code for WRMSRHaiwei Li2020-03-021-1/+2
| In SVM, the exit_code for MSR writes is not EXIT_REASON_MSR_WRITE, which belongs to VMX. According to the AMD manual, SVM_EXIT_MSR (7ch) is the exit_code of VMEXIT_MSR, caused by a RDMSR or WRMSR access to a protected MSR. Additionally, the processor indicates in the VMCB's EXITINFO1 whether a RDMSR (EXITINFO1=0) or WRMSR (EXITINFO1=1) was intercepted. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Fixes: 1e9e2622a149 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
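The decoding rule in this message (exit code 7ch, then EXITINFO1 to tell reads from writes) can be illustrated with a standalone sketch. The enum and the function `classify_msr_exit` are hypothetical names for illustration; only the 0x7c value and the EXITINFO1 meaning come from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define SVM_EXIT_MSR 0x7c  /* exit code quoted in the commit message */

/* Hypothetical classification, for illustration only. */
enum msr_access { MSR_ACCESS_NONE, MSR_ACCESS_READ, MSR_ACCESS_WRITE };

/*
 * EXITINFO1 is 0 for an intercepted RDMSR and 1 for an intercepted WRMSR.
 * Any exit code other than SVM_EXIT_MSR is not an MSR intercept at all.
 */
static enum msr_access classify_msr_exit(uint32_t exit_code, uint64_t exit_info_1)
{
    if (exit_code != SVM_EXIT_MSR)
        return MSR_ACCESS_NONE;
    return exit_info_1 ? MSR_ACCESS_WRITE : MSR_ACCESS_READ;
}
```

A WRMSR fast path would then only be taken when the result is `MSR_ACCESS_WRITE`.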
* KVM: x86: allow compiling as non-module with W=1Valdis Klētnieks2020-02-281-0/+2
| | | | | | | | | | | | | | | | | | | | Compile error with CONFIG_KVM_INTEL=y and W=1: CC arch/x86/kvm/vmx/vmx.o arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=] 68 | static const struct x86_cpu_id vmx_cpu_id[] = { | ^~~~~~~~~~ cc1: all warnings being treated as errors When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a reference to the structure (or any code at all). This makes W=1 compiles unhappy. Wrap both in a #ifdef to avoid the issue. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> [Do the same for CONFIG_KVM_AMD. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: allocate AVIC data structures based on kvm_amd module parameterPaolo Bonzini2020-02-281-1/+2
| | | | | | | | | | | | | Even if APICv is disabled at startup, the backing page and ir_list need to be initialized in case they are needed later. The only case in which this can be skipped is for userspace irqchip, and that must be done because avic_init_backing_page dereferences vcpu->arch.apic (which is NULL for userspace irqchip). Tested-by: rmuncrief@humanavance.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton2020-02-231-0/+1
| | | | | | | | | | | | | | | | | Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: fix error handling in svm_hardware_setupLi RongQing2020-02-231-21/+20
| Rename svm_hardware_unsetup() to svm_hardware_teardown(), move it before svm_hardware_setup(), and call it to free all memory if svm_hardware_setup() fails; otherwise the memory is leaked. Remove the __exit attribute from it, since it is now called from an __init function. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: Fix potential memory leak in svm_cpu_init()Miaohe Lin2020-02-211-7/+6
| When the kmalloc of sd->sev_vmcbs fails, we forget to free the page held by sd->save_area. Also get rid of the variable r, as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
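The leak pattern fixed here (a second allocation fails and the first is never released) can be shown with a userspace analogue. This is a sketch under stated assumptions, not the kernel code: `struct cpu_data`, `cpu_data_init` and the `fail_second` switch are hypothetical, and malloc/calloc stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU data in the message. */
struct cpu_data {
    void *save_area;   /* first allocation */
    void **sev_vmcbs;  /* second allocation */
};

/* fail_second simulates the second allocation failing. */
static int cpu_data_init(struct cpu_data *sd, size_t nr_entries, int fail_second)
{
    sd->save_area = malloc(4096);
    if (!sd->save_area)
        return -ENOMEM;

    sd->sev_vmcbs = fail_second ? NULL : calloc(nr_entries, sizeof(void *));
    if (!sd->sev_vmcbs)
        goto free_save_area;

    return 0;

free_save_area:
    /* Release what was already allocated before reporting the failure. */
    free(sd->save_area);
    sd->save_area = NULL;
    return -ENOMEM;
}
```

Since -ENOMEM is the only error either path can produce, no separate result variable is needed, which mirrors the second half of the message.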
* KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1Vitaly Kuznetsov2020-02-211-1/+6
| Even when APICv is disabled for L1, it can still be (and actually is) available for L2. This means we need to always call vmx_deliver_nested_posted_interrupt() when attempting an interrupt delivery. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabledSuravee Suthikulpanit2020-02-211-0/+3
| Launching a VM with AVIC disabled together with a pass-through device results in a NULL pointer dereference with the following call trace: RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd] Call Trace: kvm_vcpu_update_apicv+0x44/0x60 [kvm] kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm] kvm_vcpu_ioctl+0x3d8/0x650 [kvm] do_vfs_ioctl+0xaa/0x660 ? tomoyo_file_ioctl+0x19/0x20 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x57/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Investigation shows that this is due to the use of the uninitialized struct vcpu_svm.ir_list in svm_set_pi_irte_mode(), which is called from svm_refresh_apicv_exec_ctrl(). The ir_list is initialized only if AVIC is enabled. Fix this by checking whether AVIC is enabled in svm_refresh_apicv_exec_ctrl(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Fixes: 8937d762396d ("kvm: x86: svm: Add support to (de)activate posted interrupts.") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: do not reset microcode version on INIT or RESETPaolo Bonzini2020-02-121-1/+1
| | | | | | | | | | | | | | | | | Do not initialize the microcode version at RESET or INIT, only on vCPU creation. Microcode updates are not lost during INIT, and exact behavior across a warm RESET is not specified by the architecture. Since we do not support a microcode update directly from the hypervisor, but only as a result of userspace setting the microcode version MSR, it's simpler for userspace if we do nothing in KVM and let userspace emulate behavior for RESET as it sees fit. Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in the list of emulated MSRs. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accessesPaolo Bonzini2020-02-051-0/+4
| | | | | | | | | | | | | | | Userspace that does not know about the AMD_IBRS bit might still allow the guest to protect itself with MSR_IA32_SPEC_CTRL using the Intel SPEC_CTRL bit. However, svm.c disallows this and will cause a #GP in the guest when writing to the MSR. Fix this by loosening the test and allowing the Intel CPUID bit, and in fact allow the AMD_STIBP bit as well since it allows writing to MSR_IA32_SPEC_CTRL too. Reported-by: Zhiyi Guo <zhguo@redhat.com> Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Analyzed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: allow AVIC without split irqchipPaolo Bonzini2020-02-051-1/+1
| | | | | | | | SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT are in use. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.Suravee Suthikulpanit2020-02-051-2/+9
| | | | | | | | | | | | | | | | AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* svm: Temporarily deactivate AVIC during ExtINT handlingSuravee Suthikulpanit2020-02-051-4/+29
| AMD AVIC does not support ExtINT. Therefore, AVIC must be temporarily deactivated, falling back to legacy interrupt injection via vINTR and the interrupt window. Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* svm: Deactivate AVIC when launching guest with nested SVM supportSuravee Suthikulpanit2020-02-051-1/+10
| | | | | | | | | | | | Since AVIC does not currently work w/ nested virtualization, deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM] (i.e. indicate support for SVM, which is needed for nested virtualization). Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for this reason. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: hyperv: Use APICv update request interfaceSuravee Suthikulpanit2020-02-051-1/+2
| Since disabling APICv has to be done for all vcpus on AMD-based systems, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove kvm_vcpu_deactivate_apicv() since it is no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* svm: Add support for dynamic APICvSuravee Suthikulpanit2020-02-051-10/+28
| Add the logic necessary to (de)activate AVIC at runtime. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: Introduce x86 ops hook for pre-update APICvSuravee Suthikulpanit2020-02-051-0/+6
| AMD SVM AVIC needs to update the APIC backing page mapping before changing APICv mode. Introduce the struct kvm_x86_ops.pre_update_apicv_exec_ctrl function hook, to be called prior to the KVM APICv update request to each vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasonsSuravee Suthikulpanit2020-02-051-0/+8
| Inhibit reason bits are used to determine if APICv deactivation is applicable for a particular hardware virtualization architecture. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: svm: avic: Add support for dynamic setup/teardown of virtual APIC backing pageSuravee Suthikulpanit2020-02-051-6/+5
| Refactor avic_init_access_page() into avic_update_access_page(), since activating/deactivating AVIC requires setting/unsetting the memory region used for the virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: svm: Add support to (de)activate posted interruptsSuravee Suthikulpanit2020-02-051-1/+36
| Introduce an interface to (de)activate posted interrupts, and implement SVM hooks to toggle AMD IOMMU guest virtual APIC mode. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: remove get_enable_apicv from kvm_x86_opsPaolo Bonzini2020-02-051-6/+0
| | | | | | It is unused now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: x86: Introduce APICv inhibit reason bitsSuravee Suthikulpanit2020-02-051-1/+13
| There are several reasons for which a VM needs to deactivate APICv, e.g. disabling APICv via a parameter during module loading, or enabling Hyper-V SynIC support. Additional inhibit reasons will be introduced later on when dynamic APICv is supported. Introduce KVM APICv inhibit reason bits along with a new variable, apicv_inhibit_reasons, to help keep track of APICv state for each VM. Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate the case where APICv is disabled during KVM module load (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
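The bookkeeping described in this message can be sketched as a per-VM bitmask in which each inhibit reason owns one bit and APICv counts as active only while no bit is set. This is a simplified userspace illustration: the bit numbers, `struct vm_state`, `set_inhibit` and `apicv_active` are hypothetical, and the real code would also need atomic updates and a per-vCPU refresh, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Reason names follow the log; the bit numbers are illustrative only. */
enum {
    INHIBIT_REASON_DISABLE = 0,  /* disabled via module parameter */
    INHIBIT_REASON_HYPERV  = 1,  /* Hyper-V SynIC in use */
    INHIBIT_REASON_NESTED  = 2   /* nested SVM exposed to the guest */
};

/* Hypothetical per-VM state holding the inhibit mask. */
struct vm_state {
    unsigned long apicv_inhibit_reasons;
};

/* Set or clear one inhibit reason without touching the others. */
static void set_inhibit(struct vm_state *vm, int bit, bool inhibit)
{
    if (inhibit)
        vm->apicv_inhibit_reasons |= 1UL << bit;
    else
        vm->apicv_inhibit_reasons &= ~(1UL << bit);
}

/* APICv is considered active only when no reason inhibits it. */
static bool apicv_active(const struct vm_state *vm)
{
    return vm->apicv_inhibit_reasons == 0;
}
```

Using one bit per reason means that clearing one reason cannot accidentally re-enable APICv while another reason still applies.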
* kvm/svm: PKU not currently supportedJohn Allen2020-01-271-0/+6
| | | | | | | | | | | | | Current SVM implementation does not have support for handling PKU. Guests running on a host with future AMD cpus that support the feature will read garbage from the PKRU register and will hit segmentation faults on boot as memory is getting marked as protected that should not be. Ensure that cpuid from SVM does not advertise the feature. Signed-off-by: John Allen <john.allen@amd.com> Cc: stable@vger.kernel.org Fixes: 0556cbdc2fbc ("x86/pkeys: Don't check if PKRU is zero before writing it") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move kvm_vcpu_init() invocation to common codeSean Christopherson2020-01-241-10/+3
| Move the kvm_vcpu_{un}init() calls to common x86 code as an intermediate step to removing kvm_vcpu_{un}init() altogether. Note, VMX's alloc_apic_access_page() and init_rmode_identity_map() are per-VM allocations and are intentionally kept if vCPU creation fails. They are freed by kvm_arch_destroy_vm(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move FPU allocation to common x86 codeSean Christopherson2020-01-241-24/+1
| The allocation of FPU structs is identical across VMX and SVM; move it to common x86 code. Somewhat arbitrarily place the allocation so that it resides directly above the associated initialization via fx_init(), e.g. instead of retaining its position with respect to the overall vcpu creation flow. Although the names kvm_arch_vcpu_create() and kvm_arch_vcpu_init() might suggest otherwise, x86 does not have a clean split between 'create' and 'init'. Allocating the struct immediately prior to the first use arguably improves readability *now*, and will yield even bigger improvements when kvm_arch_vcpu_init() is removed in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Allocate vcpu struct in common x86 codeSean Christopherson2020-01-241-19/+9
| | | | | | | | | | | | | | | Move allocation of VMX and SVM vcpus to common x86. Although the struct being allocated is technically a VMX/SVM struct, it can be interpreted directly as a 'struct kvm_vcpu' because of the pre-existing requirement that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu struct. Remove the message from the build-time assertions regarding placement of the struct, as compatibility with the arch usercopy region is no longer the sole dependent on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
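The offset-zero requirement mentioned in this message is what lets common code allocate a vendor-sized object and still treat it as the generic vcpu. A minimal userspace sketch follows; `struct vcpu`, `struct vendor_vcpu`, `alloc_vcpu` and `to_vendor` are hypothetical stand-ins, not the kernel's types or helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the generic 'struct kvm_vcpu'. */
struct vcpu {
    int vcpu_id;
};

/* Stand-in for a vendor struct; the generic part must come first. */
struct vendor_vcpu {
    struct vcpu vcpu;
    unsigned long vendor_state;
};

/* Build-time check that the generic struct sits at offset zero. */
_Static_assert(offsetof(struct vendor_vcpu, vcpu) == 0,
               "generic vcpu must be the first member");

/* Common code: allocates vendor_size bytes but only knows 'struct vcpu'. */
static struct vcpu *alloc_vcpu(size_t vendor_size, int id)
{
    struct vcpu *v = calloc(1, vendor_size);

    if (v)
        v->vcpu_id = id;
    return v;
}

/* Vendor code: valid only because of the offset-zero guarantee above. */
static struct vendor_vcpu *to_vendor(struct vcpu *v)
{
    return (struct vendor_vcpu *)v;
}
```

If the generic member were moved away from offset zero, the static assertion would fail at build time instead of the cast silently producing a wrong pointer.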
* KVM: SVM: Use direct vcpu pointer during vCPU create/freeSean Christopherson2020-01-241-14/+16
| Capture the vcpu pointer in a local variable and replace '&svm->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into svm_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRLPaolo Bonzini2020-01-241-6/+3
| | | | | | | | | | | | | | | | | | If the guest is configured to have SPEC_CTRL but the host does not (which is a nonsensical configuration but these are not explicitly forbidden) then a host-initiated MSR write can write vmx->spec_ctrl (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to restore the host value of the MSR. Add a more comprehensive check for valid bits of SPEC_CTRL, covering host CPUID flags and, since we are at it and it is more correct that way, guest CPUID flags too. For AMD, remove the unnecessary is_guest_mode check around setting the MSR interception bitmap, so that the code looks the same as for Intel. Cc: Jim Mattson <jmattson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: SVM: Override default MMIO mask if memory encryption is enabledTom Lendacky2020-01-211-0/+43
| | | | | | | | | | | | | | | | | | The KVM MMIO support uses bit 51 as the reserved bit to cause nested page faults when a guest performs MMIO. The AMD memory encryption support uses a CPUID function to define the encryption bit position. Given this, it is possible that these bits can conflict. Use svm_hardware_setup() to override the MMIO mask if memory encryption support is enabled. Various checks are performed to ensure that the mask is properly defined and rsvd_bits() is used to generate the new mask (as was done prior to the change that necessitated this patch). Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
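The mask construction described in this message can be sketched as follows. Only the use of rsvd_bits(), the default of bit 51, and the need to avoid the encryption bit come from the message; the specific policy below (start at the physical address width, step past the encryption bit if they coincide, give up if nothing below bit 52 is left) is an assumption for illustration, and `mmio_mask` and `PRESENT_BIT` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bits s..e (inclusive) set; assumes a range narrower than 64 bits. */
static uint64_t rsvd_bits(int s, int e)
{
    if (e < s)
        return 0;
    return ((UINT64_C(1) << (e - s + 1)) - 1) << s;
}

#define PRESENT_BIT UINT64_C(1)  /* assumed: bit 0 marks the entry present */

/*
 * enc_bit:   position of the memory encryption bit (reported via CPUID)
 * phys_bits: number of usable physical address bits
 * Returns a mask of reserved bits that does not include the encryption
 * bit, or 0 if no reserved bit below bit 52 is available.
 */
static uint64_t mmio_mask(unsigned int enc_bit, unsigned int phys_bits)
{
    unsigned int mask_bit = phys_bits;

    /* Do not let the first reserved bit collide with the encryption bit. */
    if (enc_bit == mask_bit)
        mask_bit++;

    return mask_bit < 52 ? (rsvd_bits((int)mask_bit, 51) | PRESENT_BIT) : 0;
}
```

For example, with an encryption bit at position 47 and 48 physical address bits, bits 48 through 51 remain reserved and can be used to trigger the fault on MMIO.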
* KVM: x86: Refactor and rename bit() to feature_bit() macroSean Christopherson2020-01-211-2/+2
| Rename bit() to __feature_bit() to give it a more descriptive name, and add a macro, feature_bit(), to stuff the X86_FEATURE_ prefix to keep line lengths manageable for code that hardcodes the bit to be retrieved. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Drop special XSAVE handling from guest_cpuid_has()Sean Christopherson2020-01-211-0/+1
| Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated XSAVE check in guest_cpuid_has() in favor of open coding similar checks in the SVM/VMX XSAVES enabling flows. Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is technically redundant with respect to the CR4 reserved bit checks, e.g. XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and only if CR4.OSXSAVE=1 in guest. Keep (add?) the explicit boot_cpu_has() checks to help document KVM's usage of arch.xsaves_enabled. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpathWanpeng Li2020-01-211-4/+11
| In our product observations, writes to the ICR and TSCDEADLINE MSRs cause most of the MSR-write vmexits, and multicast IPIs are not as common as unicast IPIs such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR. This patch introduces a mechanism to handle certain performance-critical WRMSRs at a very early stage of the KVM VMExit handler. The mechanism is specifically used to accelerate writes to the x2APIC ICR that attempt to send a virtual IPI with physical destination mode, fixed delivery mode and a single target, which was found to be one of the main causes of VMExits for Linux workloads. The mechanism significantly reduces the latency of such virtual IPIs by sending the physical IPI to the target vCPU at a very early stage of the KVM VMExit handler, before host interrupts are enabled and before expensive operations such as reacquiring KVM's SRCU lock. Latency is reduced even more when KVM is able to use the APICv posted-interrupt mechanism (which allows delivering the virtual IPI directly to the target vCPU without the need to kick it to the host). Testing on a Xeon Skylake server: the virtual IPI latency from sender send to receiver receive is reduced by more than 200 CPU cycles. Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: X86: Drop KVM_APIC_SHORT_MASK and KVM_APIC_DEST_MASKPeter Xu2020-01-081-2/+2
| We have both APIC_SHORT_MASK and KVM_APIC_SHORT_MASK defined for the shorthand mask. Similarly, we have both APIC_DEST_MASK and KVM_APIC_DEST_MASK defined for the destination mode mask. Drop the KVM_APIC_* macros and convert their only user to the APIC_DEST_* macros instead. In the meantime, move APIC_SHORT_MASK and APIC_DEST_MASK from lapic.c to lapic.h. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM x86: Move kvm cpuid support out of svmPeter Gonda2019-11-271-7/+0
| Memory encryption support does not have module parameter dependencies and can be moved into the general x86 cpuid __do_cpuid_ent function. This change maintains the current behavior of passing through all of CPUID.8000001F. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Peter Gonda <pgonda@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'kvm-tsx-ctrl' into HEADPaolo Bonzini2019-11-211-5/+11
|\ | | | | | | | | Conflicts: arch/x86/kvm/vmx/vmx.c
| * KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is activePaolo Bonzini2019-10-311-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_updateMiaohe Lin2019-10-221-3/+3
| | The guest physical APIC ID may not equal vcpu->vcpu_id in some cases. We may set the wrong physical ID in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get the physical APIC ID from the vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: SVM: Remove check if APICv enabled in SVM update_cr8_intercept() handlerLiran Alon2019-11-151-2/+1
| | | | | | | | | | | | | | | | | | | | This check is unnecessary as x86 update_cr8_intercept() which calls this VMX/SVM specific callback already performs this check. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: retpolines: x86: eliminate retpoline from svm.c exit handlersAndrea Arcangeli2019-11-151-0/+12
| | It's enough to check the exit value and issue a direct call to avoid the retpoline for all the common vmexit reasons. After this commit is applied, here are the most common retpolines executed under a high resolution timer workload in the guest on a SVM host:
| | [..]
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get_update_offsets_now+70 hrtimer_interrupt+131 smp_apic_timer_interrupt+106 apic_timer_interrupt+15 start_sw_timer+359 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 1940
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_r12+33 force_qs_rnp+217 rcu_gp_kthread+1270 kthread+268 ret_from_fork+34 ]: 4644
| | @[]: 25095
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41474
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_start_range_ns+528 start_sw_timer+356 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 41887
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 lapic_next_event+28 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42723
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 clockevents_program_event+148 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42766
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 clockevents_program_event+84 hrtimer_try_to_cancel+168 hrtimer_cancel+21 kvm_set_lapic_tscdeadline_msr+43 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 42848
| | @[ trace_retpoline+1 __trace_retpoline+30 __x86_indirect_thunk_rax+33 ktime_get+58 start_sw_timer+279 restart_apic_timer+85 kvm_set_msr_common+1497 msr_interception+142 vcpu_enter_guest+684 kvm_arch_vcpu_ioctl_run+261 kvm_vcpu_ioctl+559 do_vfs_ioctl+164 ksys_ioctl+96 __x64_sys_ioctl+22 do_syscall_64+89 entry_SYSCALL_64_after_hwframe+68 ]: 499845
| | @total: 1780243
| | SVM has no TSC based programmable preemption timer so it is invoking ktime_get() frequently. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
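The idea in this message, checking the exit value and calling the common handlers directly before falling back to the handler table, can be sketched in plain C. The exit codes, handler names and return values below are hypothetical; the real code selects among the actual SVM exit handlers and is only built that way when retpolines are enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit codes, for illustration only. */
enum { EXIT_MSR, EXIT_HLT, EXIT_IO, EXIT_MAX };

typedef int (*exit_handler_t)(void);

static int handle_msr(void) { return 1; }
static int handle_hlt(void) { return 2; }
static int handle_io(void)  { return 3; }

/* Generic table: calling through it is an indirect branch. */
static const exit_handler_t exit_handlers[EXIT_MAX] = {
    [EXIT_MSR] = handle_msr,
    [EXIT_HLT] = handle_hlt,
    [EXIT_IO]  = handle_io,
};

static int dispatch_exit(unsigned int code)
{
    if (code >= EXIT_MAX)
        return -1;

    /*
     * Hot exits are compared explicitly so the compiler can emit direct
     * calls, which do not go through an indirect-branch thunk.
     */
    if (code == EXIT_MSR)
        return handle_msr();
    if (code == EXIT_HLT)
        return handle_hlt();

    /* Everything else still uses the table (an indirect call). */
    return exit_handlers[code]();
}
```

The behaviour is identical either way; only the frequently taken exits skip the indirect call, which is where the retpoline cost is paid.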
* | kvm: svm: Update svm_xsaves_supportedAaron Lewis2019-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD CPUs now support XSAVES in a limited fashion (they require IA32_XSS to be zero). AMD has no equivalent of Intel's "Enable XSAVES/XRSTORS" VM-execution control. Instead, XSAVES is always available to the guest when supported on the host. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: I40dc2c682eb0d38c2208d95d5eb7bbb6c47f6317 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Move IA32_XSS-swapping on VM-entry/VM-exit to common x86 codeAaron Lewis2019-10-221-25/+2
| | | | | | | | | | | | | | | | | | | | Hoist the vendor-specific code related to loading the hardware IA32_XSS MSR with guest/host values on VM-entry/VM-exit to common x86 code. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Ic6e3430833955b98eb9b79ae6715cf2a3fdd6d82 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: SVM: Use wrmsr for switching between guest and host IA32_XSS on AMDAaron Lewis2019-10-221-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the guest can execute the XSAVES/XRSTORS instructions, set the hardware IA32_XSS MSR to guest/host values on VM-entry/VM-exit. Note that vcpu->arch.ia32_xss is currently guaranteed to be 0 on AMD, since there is no way to change it. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: Id51a782462086e6d7a3ab621838e200f1c005afd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Introduce vcpu->arch.xsaves_enabledAaron Lewis2019-10-221-0/+3
| | | | | | | | | | | | | | | | | | | | Cache whether XSAVES is enabled in the guest by adding xsaves_enabled to vcpu->arch. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Aaron Lewis <aaronlewis@google.com> Change-Id: If4638e0901c28a4494dad2e103e2c075e8ab5d68 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameterSuthikulpanit, Suravee2019-10-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Generally, APICv for all vcpus in the VM is enabled/disabled in the same manner. So, get_enable_apicv() should represent the APICv status of the VM instead of each VCPU. Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as parameter instead of struct kvm_vcpu. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Fold decache_cr3() into cache_reg()Sean Christopherson2019-10-221-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle caching CR3 (from VMX's VMCS) into struct kvm_vcpu via the common cache_reg() callback and drop the dedicated decache_cr3(). The name decache_cr3() is somewhat confusing as the caching behavior of CR3 follows that of GPRs, RFLAGS and PDPTRs (handled via cache_reg()), and has nothing in common with the caching behavior of CR0/CR4 (whose decache_cr{0,4}_guest_bits() likely provided the 'decache' verbiage). This would effectively add a BUG() if KVM attempts to cache CR3 on SVM. Change it to a WARN_ON_ONCE() -- if the cache never requires filling, the value is already in the right place -- and opportunistically add one in VMX to provide an equivalent check. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: SVM: Reduce WBINVD/DF_FLUSH invocationsTom Lendacky2019-10-221-15/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WBINVD and DF_FLUSH are expensive operations. Currently, a WBINVD/DF_FLUSH is performed every time an SEV guest terminates. However, the WBINVD/DF_FLUSH is only required when an ASID is being re-allocated to a new SEV guest. Also, a single WBINVD/DF_FLUSH can enable all ASIDs that have been disassociated from guests through DEACTIVATE. To reduce the number of WBINVD/DF_FLUSH invocations, introduce a new ASID bitmap to track ASIDs that need to be reclaimed. When an SEV guest is terminated, add its ASID to the reclaim bitmap instead of clearing its bit in the existing SEV ASID bitmap. This delays the need to perform a WBINVD/DF_FLUSH invocation when an SEV guest terminates until all of the available SEV ASIDs have been used. At that point, the WBINVD/DF_FLUSH invocation can be performed and all ASIDs in the reclaim bitmap moved to the available ASIDs bitmap. The semaphore around DEACTIVATE can be changed to a read semaphore with the semaphore taken in write mode before performing the WBINVD/DF_FLUSH. Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
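The deferred-reclaim scheme described in this commit message can be illustrated with a small userspace sketch. This is not the kernel code: the pool size `MAX_ASID`, the two `uint32_t` bitmaps, the `flush_all()` stand-in for WBINVD/DF_FLUSH, and all function names are assumptions made for illustration, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ASID 8u  /* hypothetical pool size; ASIDs 1..MAX_ASID are handed out */

static uint32_t asid_in_use;   /* bit n set: ASID n is allocated or awaiting reclaim */
static uint32_t asid_reclaim;  /* bit n set: ASID n was released, flush still pending */
static unsigned flush_count;   /* how many expensive flushes were performed */

/* Stand-in for the expensive WBINVD + DF_FLUSH pair (assumption: just counted). */
static void flush_all(void)
{
    flush_count++;
}

/* Flush once, then return every released ASID to the available pool. */
static bool reclaim_asids(void)
{
    if (asid_reclaim == 0)
        return false;               /* nothing to reclaim, skip the flush */

    flush_all();
    /* Only ASIDs that were actually allocated may sit in the reclaim bitmap. */
    assert((asid_reclaim & ~asid_in_use) == 0);
    asid_in_use &= ~asid_reclaim;
    asid_reclaim = 0;
    return true;
}

/* Return a free ASID, flushing only when the pool is exhausted; -1 if none. */
static int asid_alloc(void)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        for (unsigned n = 1; n <= MAX_ASID; n++) {
            if (!(asid_in_use & (1u << n))) {
                asid_in_use |= 1u << n;
                return (int)n;
            }
        }
        if (!reclaim_asids())
            break;                  /* pool empty and nothing pending */
    }
    return -1;
}

/* Guest teardown: defer the flush by marking the ASID for reclaim. */
static void asid_release(unsigned n)
{
    asid_reclaim |= 1u << n;
}
```

In this sketch, releasing an ASID costs no flush at all; the single `flush_all()` call happens only when `asid_alloc()` finds the pool exhausted, and it makes every pending ASID available at once.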
* | KVM: SVM: Remove unneeded WBINVD and DF_FLUSH when starting SEV guestsTom Lendacky2019-10-221-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WBINVD and DF_FLUSH are expensive operations. The SEV support currently performs this WBINVD/DF_FLUSH combination when an SEV guest is terminated, so there is no need for it to be done before LAUNCH. However, when the SEV firmware transitions the platform from UNINIT state to INIT state, all ASIDs will be marked invalid across all threads. Therefore, as part of transitioning the platform to INIT state, perform a WBINVD/DF_FLUSH after a successful INIT in the PSP/SEV device driver. Since the PSP/SEV device driver is x86 only, it can reference and use the WBINVD related functions directly. Cc: Gary Hook <gary.hook@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: SVM: Guard against DEACTIVATE when performing WBINVD/DF_FLUSHTom Lendacky2019-10-221-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SEV firmware DEACTIVATE command disassociates an SEV guest from an ASID, clears the WBINVD indicator on all threads and indicates that the SEV firmware DF_FLUSH command must be issued before the ASID can be re-used. The SEV firmware DF_FLUSH command will return an error if a WBINVD has not been performed on every thread before it has been invoked. A window exists between the WBINVD and the invocation of the DF_FLUSH command where an SEV firmware DEACTIVATE command could be invoked on another thread, clearing the WBINVD indicator. This will cause the subsequent SEV firmware DF_FLUSH command to fail which, in turn, results in the SEV firmware ACTIVATE command failing for the reclaimed ASID. This results in the SEV guest failing to start. Use a mutex to close the WBINVD/DF_FLUSH window by obtaining the mutex before the DEACTIVATE and releasing it after the DF_FLUSH. This ensures that any DEACTIVATE cannot run before a DF_FLUSH has completed. Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command") Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
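The race and its fix can be modelled in userspace as a rough sketch. Everything here is an assumption for illustration: the firmware commands are simulated with two flags, a pthread mutex stands in for the kernel mutex, and the names `fw_deactivate()`, `cpu_wbinvd_all()`, `fw_df_flush()` and `unbind_asid()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Serializes DEACTIVATE against the WBINVD/DF_FLUSH sequence. */
static pthread_mutex_t deactivate_lock = PTHREAD_MUTEX_INITIALIZER;

static bool wbinvd_indicator;  /* simulated "WBINVD done on every thread" state */
static bool flush_needed;      /* simulated "DF_FLUSH required before ASID reuse" */

/* Simulated DEACTIVATE: clears the WBINVD indicator and demands a DF_FLUSH. */
static void fw_deactivate(void)
{
    wbinvd_indicator = false;
    flush_needed = true;
}

/* Simulated WBINVD on all CPUs: sets the indicator. */
static void cpu_wbinvd_all(void)
{
    wbinvd_indicator = true;
}

/* Simulated DF_FLUSH: fails unless a WBINVD happened since the last DEACTIVATE. */
static int fw_df_flush(void)
{
    if (!wbinvd_indicator)
        return -1;
    flush_needed = false;
    return 0;
}

/* Hold the lock from before DEACTIVATE until after DF_FLUSH, closing the window. */
static int unbind_asid(void)
{
    int rc = pthread_mutex_lock(&deactivate_lock);
    assert(rc == 0);            /* a default mutex is not expected to fail here */

    fw_deactivate();
    cpu_wbinvd_all();
    int ret = fw_df_flush();    /* no other DEACTIVATE can slip in before this */

    pthread_mutex_unlock(&deactivate_lock);
    return ret;
}
```

Because every caller runs the whole DEACTIVATE, WBINVD, DF_FLUSH sequence under the same lock, a second thread's DEACTIVATE cannot clear the indicator between another thread's WBINVD and DF_FLUSH, which is the failure the commit message describes.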