summaryrefslogtreecommitdiffstats
path: root/Documentation/virt
Commit message (Collapse)AuthorAgeFilesLines
* KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercallAshish Kalra2021-06-174-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2Maxim Levitsky2021-06-171-0/+48
| | | | | | | | | | | | | | | | This is a new version of KVM_GET_SREGS / KVM_SET_SREGS. It has the following changes: * Has flags for future extensions * Has vcpu's PDPTRs, allowing to save/restore them on migration. * Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS) New capability, KVM_CAP_SREGS2 is added to signal the userspace of this ioctl. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUIDVitaly Kuznetsov2021-06-171-0/+11
| | | | | | | | | | | Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows for limiting Hyper-V features to those exposed to the guest in Hyper-V CPUIDs (0x40000003, 0x40000004, ...). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: mmu: Add slots_arch_lock for memslot arch fieldsBen Gardon2021-06-171-0/+5
| | | | | | | | | | | Add a new lock to protect the arch-specific fields of memslots if they need to be modified in a kvm->srcu read critical section. A future commit will use this lock to lazily allocate memslot rmaps for x86. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-5-bgardon@google.com> [Add Documentation/ hunk. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: X86: MMU: Use the correct inherited permissions to get shadow pageLai Jiangshan2021-06-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When computing the access permissions of a shadow page, use the effective permissions of the walk up to that point, i.e. the logic AND of its parents' permissions. Two guest PxE entries that point at the same table gfn need to be shadowed with different shadow pages if their parents' permissions are different. KVM currently uses the effective permissions of the last non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have full ("uwx") permissions, and the effective permissions are recorded only in role.access and merged into the leaves, this can lead to incorrect reuse of a shadow page and eventually to a missing guest protection page fault. For example, here is a shared pagetable: pgd[] pud[] pmd[] virtual address pointers /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--) /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-) pgd-| (shared pmd[] as above) \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--) \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--) pud1 and pud2 point to the same pmd table, so: - ptr1 and ptr3 points to the same page. - ptr2 and ptr4 points to the same page. (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries) - First, the guest reads from ptr1 first and KVM prepares a shadow page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1. "u--" comes from the effective permissions of pgd, pud1 and pmd1, which are stored in pt->access. "u--" is used also to get the pagetable for pud1, instead of "uw-". - Then the guest writes to ptr2 and KVM reuses pud1 which is present. The hypervisor set up a shadow page for ptr2 with pt->access is "uw-" even though the pud1 pmd (because of the incorrect argument to kvm_mmu_get_page in the previous step) has role.access="u--". - Then the guest reads from ptr3. The hypervisor reuses pud1's shadow pmd for pud2, because both use "u--" for their permissions. Thus, the shadow pmd already includes entries for both pmd1 and pmd2. - At last, the guest writes to ptr4. This causes no vmexit or pagefault, because pud1's shadow page structures included an "uw-" page even though its role.access was "u--". Any kind of shared pagetable might have the similar problem when in virtual machine without TDP enabled if the permissions are different from different ancestors. In order to fix the problem, we change pt->access to be an array, and any access in it will not include permissions ANDed from child ptes. The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/ Remember to test it with TDP disabled. The problem had existed long before the commit 41074d07c78b ("KVM: MMU: Fix inherited permissions for emulated guest pte updates"), and it is hard to find which is the culprit. So there is no fixes tag here. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCKMarcelo Tosatti2021-05-271-3/+5
| | | | | | | | | | | | | KVM_REQ_UNBLOCK will be used to exit a vcpu from its inner vcpu halt emulation loop. Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch PowerPC to arch specific request bit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20210525134321.303768132@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* doc/kvm: Fix wrong entry for KVM_CAP_X86_MSR_FILTERSiddharth Chandrasekaran2021-05-071-2/+2
| | | | | | | | | | | | The capability that exposes new ioctl KVM_X86_SET_MSR_FILTER to userspace is specified incorrectly as the ioctl itself (instead of KVM_CAP_X86_MSR_FILTER). This patch fixes it. Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering") Reviewed-by: Alexander Graf <graf@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <20210503120059.9283-1-sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-05-018-52/+417
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...
| * KVM: documentation: fix sphinx warningsPaolo Bonzini2021-04-262-4/+5
| | | | | | | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Merge tag 'kvmarm-5.13' of ↵Paolo Bonzini2021-04-235-3/+51
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.13 New features: - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler - Alexandru is now a reviewer (not really a new feature...) Fixes: - Proper emulation of the GICR_TYPER register - Handle the complete set of relocation in the nVHE EL2 object - Get rid of the oprofile dependency in the PMU code (and of the oprofile body parts at the same time) - Debug and SPE fixes - Fix vcpu reset
| | * Merge branch 'kvm-arm64/ptp' into kvmarm-master/nextMarc Zyngier2021-04-201-2/+2
| | |\ | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * KVM: arm64: Fix Function ID typo for PTP_KVM serviceZenghui Yu2021-04-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per include/linux/arm-smccc.h, the Function ID of PTP_KVM service is defined as ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID. Fix the typo in documentation to keep the git grep consistent. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210417113804.1992-1-yuzenghui@huawei.com
| | * | Merge branch 'kvm-arm64/vlpi-save-restore' into kvmarm-master/nextMarc Zyngier2021-04-131-1/+1
| | |\ \ | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: GICv4.1: Give a chance to save VLPI stateShenming Lu2021-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before GICv4.1, we don't have direct access to the VLPI state. So we simply let it fail early when encountering any VLPI in saving. But now we don't have to return -EACCES directly if on GICv4.1. Let’s change the hard code and give a chance to save the VLPI state (and preserve the UAPI). Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210322060158.1584-7-lushenming@huawei.com
| | * | | Merge branch 'kvm-arm64/vgic-5.13' into kvmarm-master/nextMarc Zyngier2021-04-131-1/+1
| | |\ \ \ | | | | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | | docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT docEric Auger2021-04-061-1/+1
| | | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_arch_vcpu_precreate() returns -EBUSY if the vgic is already initialized. So let's document that KVM_DEV_ARM_VGIC_CTRL_INIT must be called after all vcpu creations. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-6-eric.auger@redhat.com
| | * | | Merge branch 'kvm-arm64/ptp' into kvmarm-master/nextMarc Zyngier2021-04-133-0/+36
| | |\ \ \ | | | | |/ | | | |/| | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Fix table format for PTP documentationMarc Zyngier2021-04-081-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation build legitimately screams about the PTP documentation table being misformated. Fix it by adjusting the table width guides. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Add support for the KVM PTP serviceJianyong Wu2021-04-073-0/+36
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the hypervisor side of the KVM PTP interface. The service offers wall time and cycle count from host to guest. The caller must specify whether they want the host's view of either the virtual or physical counter. Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
| | * | Merge branch 'kvm-arm64/misc-5.13' into kvmarm-master/nextMarc Zyngier2021-04-131-0/+12
| | |\ \ | | | | | | | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Clarify vcpu reset behaviourMarc Zyngier2021-04-091-0/+12
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although the KVM_ARM_VCPU_INIT documentation mention that the registers are reset to their "initial values", it doesn't describe what these values are. Describe this state explicitly. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * / Documentation: KVM: Document KVM_GUESTDBG_USE_HW control flag for arm64Alexandru Elisei2021-04-071-1/+2
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 21b6f32f9471 ("KVM: arm64: guest debug, define API headers") added the arm64 KVM_GUESTDBG_USE_HW flag for the KVM_SET_GUEST_DEBUG ioctl and commit 834bf88726f0 ("KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG") documented and implemented the flag functionality. Since its introduction, at no point was the flag known by any name other than KVM_GUESTDBG_USE_HW for the arm64 architecture, so refer to it as such in the documentation. CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210407144857.199746-2-alexandru.elisei@arm.com
| * | Merge branch 'kvm-sev-cgroup' into HEADPaolo Bonzini2021-04-221-2/+105
| |\ \
| * | | KVM: SVM: Add KVM_SEV_RECEIVE_FINISH commandBrijesh Singh2021-04-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command finalize the guest receiving process and make the SEV guest ready for the execution. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <d08914dc259644de94e29b51c3b68a13286fc5a3.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA commandBrijesh Singh2021-04-211-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command is used for copying the incoming buffer into the SEV guest memory space. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <c5d0e3e719db7bb37ea85d79ed4db52e9da06257.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add support for KVM_SEV_RECEIVE_START commandBrijesh Singh2021-04-211-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command is used to create the encryption context for an incoming SEV guest. The encryption context can be later used by the hypervisor to import the incoming data into the SEV guest memory space. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <c7400111ed7458eee01007c4d8d57cdf2cbb0fc2.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add support for KVM_SEV_SEND_CANCEL commandSteve Rutherford2021-04-211-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After completion of SEND_START, but before SEND_FINISH, the source VMM can issue the SEND_CANCEL command to stop a migration. This is necessary so that a cancelled migration can restart with a new target later. Reviewed-by: Nathan Tempelman <natet@google.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Steve Rutherford <srutherford@google.com> Message-Id: <20210412194408.2458827-1-srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add KVM_SEV_SEND_FINISH commandBrijesh Singh2021-04-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command is used to finailize the encryption context created with KVM_SEV_SEND_START command. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <5082bd6a8539d24bc55a1dd63a1b341245bb168f.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add KVM_SEND_UPDATE_DATA commandBrijesh Singh2021-04-211-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command is used for encrypting the guest memory region using the encryption context created with KVM_SEV_SEND_START. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by : Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <d6a6ea740b0c668b30905ae31eac5ad7da048bb3.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: SVM: Add KVM_SEV SEND_START commandBrijesh Singh2021-04-211-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command is used to create an outgoing SEV guest encryption context. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: document behavior of measurement ioctls with len==0Paolo Bonzini2021-04-211-0/+6
| | | | | | | | | | | | | | | | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: Support KVM VMs sharing SEV contextNathan Tempelman2021-04-211-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a capability for userspace to mirror SEV encryption context from one vm to another. On our side, this is intended to support a Migration Helper vCPU, but it can also be used generically to support other in-guest workloads scheduled by the host. The intention is for the primary guest and the mirror to have nearly identical memslots. The primary benefits of this are that: 1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they can't accidentally clobber each other. 2) The VMs can have different memory-views, which is necessary for post-copy migration (the migration vCPUs on the target need to read and write to pages, when the primary guest would VMEXIT). This does not change the threat model for AMD SEV. Any memory involved is still owned by the primary guest and its initial state is still attested to through the normal SEV_LAUNCH_* flows. If userspace wanted to circumvent SEV, they could achieve the same effect by simply attaching a vCPU to the primary VM. This patch deliberately leaves userspace in charge of the memslots for the mirror, as it already has the power to mess with them in the primary guest. This patch does not support SEV-ES (much less SNP), as it does not handle handing off attested VMSAs to the mirror. For additional context, we need a Migration Helper because SEV PSP migration is far too slow for our live migration on its own. Using an in-guest migrator lets us speed this up significantly. Signed-off-by: Nathan Tempelman <natet@google.com> Message-Id: <20210408223214.2582277-1-natet@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: Add capability to grant VM access to privileged SGX attributeSean Christopherson2021-04-201-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace to grant a VM access to a priveleged attribute, with args[0] holding a file handle to a valid SGX attribute file. The SGX subsystem restricts access to a subset of enclave attributes to provide additional security for an uncompromised kernel, e.g. to prevent malware from using the PROVISIONKEY to ensure its nodes are running inside a geniune SGX enclave and/or to obtain a stable fingerprint. To prevent userspace from circumventing such restrictions by running an enclave in a VM, KVM restricts guest access to privileged attributes by default. Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | doc/virt/kvm: move KVM_CAP_PPC_MULTITCE in section 8Emanuele Giuseppe Esposito2021-04-201-125/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM_CAP_PPC_MULTITCE is a capability, not an ioctl. Therefore move it from section 4.97 to the new 8.31 (other capabilities). To fill the gap, move KVM_X86_SET_MSR_FILTER (was 4.126) to 4.97, and shifted Xen-related ioctl (were 4.127 - 4.130) by one place (4.126 - 4.129). Also fixed minor typo in KVM_GET_MSR_INDEX_LIST ioctl description (section 4.3). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210316170814.64286-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: introduce KVM_CAP_SET_GUEST_DEBUG2Paolo Bonzini2021-04-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This capability will allow the user to know which KVM_GUESTDBG_* bits are supported. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | Merge tag 'kvm-s390-next-5.13-1' of ↵Paolo Bonzini2021-04-151-0/+33
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Updates for 5.13 - properly handle MVPG in nesting KVM (vsie) - allow to forward the yield_to hypercall (diagnose 9c) - fixes
| | * | | KVM: s390: diag9c (directed yield) forwardingPierre Morel2021-03-091-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we intercept a DIAG_9C from the guest we verify that the target real CPU associated with the virtual CPU designated by the guest is running and if not we forward the DIAG_9C to the target real CPU. To avoid a diag9c storm we allow a maximal rate of diag9c forwarding. The rate is calculated as a count per second defined as a new parameter of the s390 kvm module: diag9c_forwarding_hz . The default value of 0 is to not forward diag9c. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/1613997661-22525-2-git-send-email-pmorel@linux.ibm.com Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | | KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamicSean Christopherson2021-03-151-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the location of the HOST_WRITABLE and MMU_WRITABLE configurable for a given KVM instance. This will allow EPT to use high available bits, which in turn will free up bit 11 for a constant MMU_PRESENT bit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEsSean Christopherson2021-03-151-19/+18
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the various A/D status defines to explicitly associated them with TDP. There is a subtle dependency on the bits in question never being set when using PAE paging, as those bits are reserved, not available. I.e. using these bits outside of TDP (technically EPT) would cause explosions. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | Merge tag 'docs-5.13' of git://git.lwn.net/linuxLinus Torvalds2021-04-261-1/+1
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull documentation updates from Jonathan Corbet: "It's been a relatively busy cycle in docsland, though more than usually well contained to Documentation/ itself. Highlights include: - The Chinese translators have been busy and show no signs of stopping anytime soon. Italian has also caught up. - Aditya Srivastava has been working on improvements to the kernel-doc script. - Thorsten continues his work on reporting-issues.rst and related documentation around regression reporting. - Lots of documentation updates, typo fixes, etc. as usual" * tag 'docs-5.13' of git://git.lwn.net/linux: (139 commits) docs/zh_CN: add openrisc translation to zh_CN index docs/zh_CN: add openrisc index.rst translation docs/zh_CN: add openrisc todo.rst translation docs/zh_CN: add openrisc openrisc_port.rst translation docs/zh_CN: add core api translation to zh_CN index docs/zh_CN: add core-api index.rst translation docs/zh_CN: add core-api irq index.rst translation docs/zh_CN: add core-api irq irqflags-tracing.rst translation docs/zh_CN: add core-api irq irq-domain.rst translation docs/zh_CN: add core-api irq irq-affinity.rst translation docs/zh_CN: add core-api irq concepts.rst translation docs: sphinx-pre-install: don't barf on beta Sphinx releases scripts: kernel-doc: improve parsing for kernel-doc comments syntax docs/zh_CN: two minor fixes in zh_CN/doc-guide/ Documentation: dev-tools: Add Testing Overview docs/zh_CN: add translations in zh_CN/dev-tools/gcov docs: reporting-issues: make people CC the regressions list MAINTAINERS: add regressions mailing list doc:it_IT: align Italian documentation docs/zh_CN: sync reporting-issues.rst ...
| * | | docs: kvm: Fix a typo ("althought")Jonathan Neuschäfer2021-03-081-1/+1
| | |/ | |/| | | | | | | | | | | | | Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20210301214722.2310911-1-j.neuschaefer@gmx.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
* | | documentation/kvm: additional explanations on KVM_SET_BOOT_CPU_IDEmanuele Giuseppe Esposito2021-03-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ioctl KVM_SET_BOOT_CPU_ID fails when called after vcpu creation. Add this explanation in the documentation. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210319091650.11967-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ishSean Christopherson2021-03-181-2/+4
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a plethora of issues with MSR filtering by installing the resulting filter as an atomic bundle instead of updating the live filter one range at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as the hardware MSR bitmaps won't be updated until the next VM-Enter, but the relevant software struct is atomically updated, which is what KVM really needs. Similar to the approach used for modifying memslots, make arch.msr_filter a SRCU-protected pointer, do all the work configuring the new filter outside of kvm->lock, and then acquire kvm->lock only when the new filter has been vetted and created. That way vCPU readers either see the old filter or the new filter in their entirety, not some half-baked state. Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a TOCTOU bug, but that's just the tip of the iceberg... - Nothing is __rcu annotated, making it nigh impossible to audit the code for correctness. - kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel coding style aside, the lack of a smb_rmb() anywhere casts all code into doubt. - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs count before taking the lock. - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug. The entire approach of updating the live filter is also flawed. While installing a new filter is inherently racy if vCPUs are running, fixing the above issues also makes it trivial to ensure certain behavior is deterministic, e.g. KVM can provide deterministic behavior for MSRs with identical settings in the old and new filters. An atomic update of the filter also prevents KVM from getting into a half-baked state, e.g. if installing a filter fails, the existing approach would leave the filter in a half-baked state, having already committed whatever bits of the filter were already processed. [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Reported-by: Yuan Yao <yaoyuan0329os@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210316184436.2544875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-03-141-0/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "More fixes for ARM and x86" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: LAPIC: Advancing the timer expiration on guest initiated write KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged kvm: x86: annotate RCU pointers KVM: arm64: Fix exclusive limit for IPA size KVM: arm64: Reject VM creation when the default IPA size is unsupported KVM: arm64: Ensure I-cache isolation between vcpus of a same VM KVM: arm64: Don't use cbz/adr with external symbols KVM: arm64: Fix range alignment when walking page tables KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key KVM: arm64: Fix nVHE hyp panic host context restore KVM: arm64: Avoid corrupting vCPU context register in guest exit KVM: arm64: nvhe: Save the SPE context early kvm: x86: use NULL instead of using plain integer as pointer KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled' KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
| * KVM: arm64: Reject VM creation when the default IPA size is unsupportedMarc Zyngier2021-03-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM/arm64 has forever used a 40bit default IPA space, partially due to its 32bit heritage (where the only choice is 40bit). However, there are implementations in the wild that have a *cough* much smaller *cough* IPA space, which leads to a misprogramming of VTCR_EL2, and a guest that is stuck on its first memory access if userspace dares to ask for the default IPA setting (which most VMMs do). Instead, blundly reject the creation of such VM, as we can't satisfy the requirements from userspace (with a one-off warning). Also clarify the boot warning, and document that the VM creation will fail when an unsupported IPA size is provided. Although this is an ABI change, it doesn't really change much for userspace: - the guest couldn't run before this change, but no error was returned. At least userspace knows what is happening. - a memory slot that was accepted because it did fit the default IPA space now doesn't even get a chance to be registered. The other thing that is left doing is to convince userspace to actually use the IPA space setting instead of relying on the antiquated default. Fixes: 233a7cb23531 ("kvm: arm64: Allow tuning the physical address size for VM") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2021-03-041-55/+60
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: - Doc fixes - selftests fixes - Add runstate information to the new Xen support - Allow compiling out the Xen interface - 32-bit PAE without EPT bugfix - NULL pointer dereference bugfix * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Clear the CR4 register on reset KVM: x86/xen: Add support for vCPU runstate information KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info selftests: kvm: Mmap the entire vcpu mmap area KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1 KVM: x86: allow compiling out the Xen hypercall interface KVM: xen: flush deferred static key before checking it KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref KVM: x86: remove misplaced comment on active_mmu_pages KVM: Documentation: rectify rst markup in kvm_run->flags Documentation: kvm: fix messy conversion from .txt to .rst
| * KVM: x86/xen: Add support for vCPU runstate informationDavid Woodhouse2021-03-021-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1Kai Huang2021-03-021-1/+1
| | | | | | | | | | | | | | | | | | It should be 7.23 instead of 7.22, which has already been taken by KVM_CAP_X86_BUS_LOCK_EXIT. Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <20210226094832.380394-1-kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Documentation: rectify rst markup in kvm_run->flagsChenyi Qiang2021-02-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c32b1b896d2a ("KVM: X86: Add the Document for KVM_CAP_X86_BUS_LOCK_EXIT") added a new flag in kvm_run->flags documentation, and caused warning in make htmldocs: Documentation/virt/kvm/api.rst:5004: WARNING: Unexpected indentation Documentation/virt/kvm/api.rst:5004: WARNING: Inline emphasis start-string without end-string Fix this rst markup issue. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210226075541.27179-1-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Documentation: kvm: fix messy conversion from .txt to .rstPaolo Bonzini2021-02-261-53/+16
| | | | | | | | | | | | | | | | | | Building the documentation gives a warning that the KVM_PPC_RESIZE_HPT_PREPARE label is defined twice. The root cause is that the KVM_PPC_RESIZE_HPT_PREPARE API is present twice, the second being a mix of the prepare and commit APIs. Fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>