summaryrefslogtreecommitdiffstats
path: root/arch/arm64/kvm/vgic
Commit message (Collapse)AuthorAgeFilesLines
* KVM: arm64: Allow no running vcpu on saving vgic3 pending tableGavin Shan2023-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have a running VCPU context to save vgic3 pending table due to KVM_DEV_ARM_VGIC_{GRP_CTRL, SAVE_PENDING_TABLES} command on KVM device "kvm-arm-vgic-v3". The unknown case is caught by kvm-unit-tests. # ./kvm-unit-tests/tests/its-pending-migration WARNING: CPU: 120 PID: 7973 at arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3325 \ mark_page_dirty_in_slot+0x60/0xe0 : mark_page_dirty_in_slot+0x60/0xe0 __kvm_write_guest_page+0xcc/0x100 kvm_write_guest+0x7c/0xb0 vgic_v3_save_pending_tables+0x148/0x2a0 vgic_set_common_attr+0x158/0x240 vgic_v3_set_attr+0x4c/0x5c kvm_device_ioctl+0x100/0x160 __arm64_sys_ioctl+0xa8/0xf0 invoke_syscall.constprop.0+0x7c/0xd0 el0_svc_common.constprop.0+0x144/0x160 do_el0_svc+0x34/0x60 el0_svc+0x3c/0x1a0 el0t_64_sync_handler+0xb4/0x130 el0t_64_sync+0x178/0x17c Use vgic_write_guest_lock() to save vgic3 pending table. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-5-gshan@redhat.com
* KVM: arm64: Allow no running vcpu on restoring vgic3 LPI pending statusGavin Shan2023-01-291-1/+1
| | | | | | | | | | | | | We don't have a running VCPU context to restore vgic3 LPI pending status due to command KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} on KVM device "kvm-arm-vgic-its". Use vgic_write_guest_lock() to restore vgic3 LPI pending status. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-4-gshan@redhat.com
* KVM: arm64: Add helper vgic_write_guest_lock()Gavin Shan2023-01-292-8/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the unknown no-running-vcpu sites are reported when a dirty page is tracked by mark_page_dirty_in_slot(). Until now, the only known no-running-vcpu site is saving vgic/its tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on KVM device "kvm-arm-vgic-its". Unfortunately, there are more unknown sites to be handled and no-running-vcpu context will be allowed in these sites: (1) KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} command on KVM device "kvm-arm-vgic-its" to restore vgic/its tables. The vgic3 LPI pending status could be restored. (2) Save vgic3 pending table through KVM_DEV_ARM_{VGIC_GRP_CTRL, VGIC_SAVE_PENDING_TABLES} command on KVM device "kvm-arm-vgic-v3". In order to handle those unknown cases, we need a unified helper vgic_write_guest_lock(). struct vgic_dist::save_its_tables_in_progress is also renamed to struct vgic_dist::save_tables_in_progress. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230126235451.469087-3-gshan@redhat.com
* KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivationMarc Zyngier2023-01-213-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To save the vgic LPI pending state with GICv4.1, the VPEs must all be unmapped from the ITSs so that the sGIC caches can be flushed. The opposite is done once the state is saved. This is all done by using the activate/deactivate irqdomain callbacks directly from the vgic code. Crutially, this is done without holding the irqdesc lock for the interrupts that represent the VPE. And these callbacks are changing the state of the irqdesc. What could possibly go wrong? If a doorbell fires while we are messing with the irqdesc state, it will acquire the lock and change the interrupt state concurrently. Since we don't hole the lock, curruption occurs in on the interrupt state. Oh well. While acquiring the lock would fix this (and this was Shanker's initial approach), this is still a layering violation we could do without. A better approach is actually to free the VPE interrupt, do what we have to do, and re-request it. It is more work, but this usually happens only once in the lifetime of the VM and we don't really care about this sort of overhead. Fixes: f66b7b151e00 ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables") Reported-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com
* KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementationsMarc Zyngier2023-01-051-0/+2
| | | | | | | | | | | | I really hoped that Apple had fixed their not-quite-a-vgic implementation when moving from M1 to M2. Alas, it seems they didn't, and running a buggy EFI version results in the vgic generating SErrors outside of the guest and taking the host down. Apply the same workaround as for M1. Yes, this is all a bit crap. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230103095022.3230946-2-maz@kernel.org
* KVM: arm64: Enable ring-based dirty memory trackingGavin Shan2022-11-101-0/+20
| | | | | | | | | | | | | | | | | | | Enable ring-based dirty memory tracking on ARM64: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL. - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
* KVM: arm64: vgic: Fix exit condition in scan_its_table()Eric Ren2022-10-151-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With some PCIe topologies, restoring a guest fails while parsing the ITS device tables. Reproducer hints: 1. Create ARM virt VM with pxb-pcie bus which adds extra host bridges, with qemu command like: ``` -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.x \ ... -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.y \ ... ``` 2. Ensure the guest uses 2-level device table 3. Perform VM migration which calls save/restore device tables In that setup, we get a big "offset" between 2 device_ids, which makes unsigned "len" round up a big positive number, causing the scan loop to continue with a bad GPA. For example: 1. L1 table has 2 entries; 2. and we are now scanning at L2 table entry index 2075 (pointed to by L1 first entry) 3. if next device id is 9472, we will get a big offset: 7397; 4. with unsigned 'len', 'len -= offset * esz', len will underflow to a positive number, mistakenly into next iteration with a bad GPA; (It should break out of the current L2 table scanning, and jump into the next L1 table entry) 5. that bad GPA fails the guest read. Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry. Thanks to Eric Auger for the fix suggestion. Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for tableookup") Suggested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Eric Ren <renzhengeek@gmail.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.1665802985.git.renzhengeek@gmail.com
* KVM: arm64: vgic: Remove duplicate check in update_affinity_collection()Gavin Shan2022-09-261-1/+1
| | | | | | | | | | | | | The 'coll' parameter to update_affinity_collection() is never NULL, so comparing it with 'ite->collection' is enough to cover both the NULL case and the "another collection" case. Remove the duplicate check in update_affinity_collection(). Signed-off-by: Gavin Shan <gshan@redhat.com> [maz: repainted commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220923065447.323445-1-gshan@redhat.com
* KVM: arm64: vgic: Tidy-up calls to vgic_{get,set}_common_attr()Marc Zyngier2022-07-171-52/+26
| | | | | | | | | | | | | | | | | The userspace accessors have an early call to vgic_{get,set}_common_attr() that makes the code hard to follow. Move it to the default: clause of the decoding switch statement, which results in a nice cleanup. This requires us to move the handling of the pending table into the common handling, even if it is strictly a GICv3 feature (it has the benefit of keeping the whole control group handling in the same function). Also cleanup vgic_v3_{get,set}_attr() while we're at it, deduplicating the calls to vgic_v3_attr_regs_access(). Suggested-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic: Consolidate userspace access for base address settingMarc Zyngier2022-07-171-44/+31
| | | | | | | Align kvm_vgic_addr() with the rest of the code by moving the userspace accesses into it. kvm_vgic_addr() is also made static. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address settingMarc Zyngier2022-07-171-0/+32
| | | | | | | | | | | We carry a legacy interface to set the base addresses for GICv2. As this is currently plumbed into the same handling code as the modern interface, it limits the evolution we can make there. Add a helper dedicated to this handling, with a view of maybe removing this in the future. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic: Use {get,put}_user() instead of copy_{from.to}_userMarc Zyngier2022-07-171-3/+3
| | | | | | | | | Tidy-up vgic_get_common_attr() and vgic_set_common_attr() to use {get,put}_user() instead of the more complex (and less type-safe) copy_{from,to}_user(). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v2: Consolidate userspace access for MMIO registersMarc Zyngier2022-07-171-22/+17
| | | | | | | | Align the GICv2 MMIO accesses from userspace with the way the GICv3 code is now structured. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v3: Consolidate userspace access for MMIO registersMarc Zyngier2022-07-171-66/+37
| | | | | | | | | | | | For userspace accesses to GICv3 MMIO registers (and related data), vgic_v3_{get,set}_attr are littered with {get,put}_user() calls, making it hard to audit and reason about. Consolidate all userspace accesses in vgic_v3_attr_regs_access(), making the code far simpler to audit. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v3: Use u32 to manage the line level from userspaceMarc Zyngier2022-07-175-8/+12
| | | | | | | | | | | Despite the userspace ABI clearly defining the bits dealt with by KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO as a __u32, the kernel uses a u64. Use a u32 to match the userspace ABI, which will subsequently lead to some simplifications. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v3: Push user access into vgic_v3_cpu_sysregs_uaccess()Marc Zyngier2022-07-172-28/+9
| | | | | | | | | | | | In order to start making the vgic sysreg access from userspace similar to all the other sysregs, push the userspace memory access one level down into vgic_v3_cpu_sysregs_uaccess(). The next step will be to rely on the sysreg infrastructure to perform this task. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v3: Simplify vgic_v3_has_cpu_sysregs_attr()Marc Zyngier2022-07-172-8/+3
| | | | | | | | | | | | | | | Finding out whether a sysreg exists has little to do with that register being accessed, so drop the is_write parameter. Also, the reg pointer is completely unused, and we're better off just passing the attr pointer to the function. This result in a small cleanup of the calling site, with a new helper converting the vGIC view of a sysreg into the canonical one (this is purely cosmetic, as the encoding is the same). Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: Replace vgic_v3_uaccess_read_pending with vgic_uaccess_read_pendingMarc Zyngier2022-06-082-39/+22
| | | | | | | | | | | | | | | Now that GICv2 has a proper userspace accessor for the pending state, switch GICv3 over to it, dropping the local version, moving over the specific behaviours that CGIv3 requires (such as the distinction between pending latch and line level which were never enforced with GICv2). We also gain extra locking that isn't really necessary for userspace, but that's a small price to pay for getting rid of superfluous code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220607131427.1164881-3-maz@kernel.org
* KVM: arm64: Don't read a HW interrupt pending state in user contextMarc Zyngier2022-06-073-5/+21
| | | | | | | | | | | | | | | | | | | | | | | Since 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW"), we're able to source the pending bit for an interrupt that is stored either on the physical distributor or on a device. However, this state is only available when the vcpu is loaded, and is not intended to be accessed from userspace. Unfortunately, the GICv2 emulation doesn't provide specific userspace accessors, and we fallback with the ones that are intended for the guest, with fatal consequences. Add a new vgic_uaccess_read_pending() accessor for userspace to use, build on top of the existing vgic_mmio_read_pending(). Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: 5bfa685e62e9 ("KVM: arm64: vgic: Read HW interrupt pending state from the HW") Link: https://lore.kernel.org/r/20220607131427.1164881-2-maz@kernel.org Cc: stable@vger.kernel.org
* Merge tag 'kvmarm-5.19' of ↵Paolo Bonzini2022-05-256-55/+269
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.19 - Add support for the ARMv8.6 WFxT extension - Guard pages for the EL2 stacks - Trap and emulate AArch32 ID registers to hide unsupported features - Ability to select and save/restore the set of hypercalls exposed to the guest - Support for PSCI-initiated suspend in collaboration with userspace - GICv3 register-based LPI invalidation support - Move host PMU event merging into the vcpu data structure - GICv3 ITS save/restore fixes - The usual set of small-scale cleanups and fixes [Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated from 4 to 6. - Paolo]
| * Merge branch kvm-arm64/its-save-restore-fixes-5.19 into kvmarm-master/nextMarc Zyngier2022-05-161-18/+78
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/its-save-restore-fixes-5.19: : . : Tighten the ITS save/restore infrastructure to fail early rather : than late. Patches courtesy of Rocardo Koller. : . KVM: arm64: vgic: Undo work in failed ITS restores KVM: arm64: vgic: Do not ignore vgic_its_restore_cte failures KVM: arm64: vgic: Add more checks when restoring ITS tables KVM: arm64: vgic: Check that new ITEs could be saved in guest memory Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * KVM: arm64: vgic: Undo work in failed ITS restoresRicardo Koller2022-05-161-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Failed ITS restores should clean up all state restored until the failure. There is some cleanup already present when failing to restore some tables, but it's not complete. Add the missing cleanup. Note that this changes the behavior in case of a failed restore of the device tables. restore ioctl: 1. restore collection tables 2. restore device tables With this commit, failures in 2. clean up everything created so far, including state created by 1. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-5-ricarkol@google.com
| | * KVM: arm64: vgic: Do not ignore vgic_its_restore_cte failuresRicardo Koller2022-05-161-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Restoring a corrupted collection entry (like an out of range ID) is being ignored and treated as success. More specifically, a vgic_its_restore_cte failure is treated as success by vgic_its_restore_collection_table. vgic_its_restore_cte uses positive and negative numbers to return error, and +1 to return success. The caller then uses "ret > 0" to check for success. Fix this by having vgic_its_restore_cte only return negative numbers on error. Do this by changing alloc_collection return codes to only return negative numbers on error. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-4-ricarkol@google.com
| | * KVM: arm64: vgic: Add more checks when restoring ITS tablesRicardo Koller2022-05-161-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Try to improve the predictability of ITS save/restores (and debuggability of failed ITS saves) by failing early on restore when trying to read corrupted tables. Restoring the ITS tables does some checks for corrupted tables, but not as many as in a save: an overflowing device ID will be detected on save but not on restore. The consequence is that restoring a corrupted table won't be detected until the next save; including the ITS not working as expected after the restore. As an example, if the guest sets tables overlapping each other, which would most likely result in some corrupted table, this is what we would see from the host point of view: guest sets base addresses that overlap each other save ioctl restore ioctl save ioctl (fails) Ideally, we would like the first save to fail, but overlapping tables could actually be intended by the guest. So, let's at least fail on the restore with some checks: like checking that device and event IDs don't overflow their tables. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-3-ricarkol@google.com
| | * KVM: arm64: vgic: Check that new ITEs could be saved in guest memoryRicardo Koller2022-05-161-12/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Try to improve the predictability of ITS save/restores by failing commands that would lead to failed saves. More specifically, fail any command that adds an entry into an ITS table that is not in guest memory, which would otherwise lead to a failed ITS save ioctl. There are already checks for collection and device entries, but not for ITEs. Add the corresponding check for the ITT when adding ITEs. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510001633.552496-2-ricarkol@google.com
| * | Merge branch kvm-arm64/misc-5.19 into kvmarm-master/nextMarc Zyngier2022-05-161-0/+4
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/misc-5.19: : . : Misc fixes and general improvements for KVMM/arm64: : : - Better handle out of sequence sysregs in the global tables : : - Remove a couple of unnecessary loads from constant pool : : - Drop unnecessary pKVM checks : : - Add all known M1 implementations to the SEIS workaround : : - Cleanup kerneldoc warnings : . KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaround KVM: arm64: pkvm: Don't mask already zeroed FEAT_SVE KVM: arm64: pkvm: Drop unnecessary FP/SIMD trap handler KVM: arm64: nvhe: Eliminate kernel-doc warnings KVM: arm64: Avoid unnecessary absolute addressing via literals KVM: arm64: Print emulated register table name when it is unsorted KVM: arm64: Don't BUG_ON() if emulated register table is unsorted Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | KVM: arm64: vgic-v3: List M1 Pro/Max as requiring the SEIS workaroundMarc Zyngier2022-05-151-0/+4
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unsusprisingly, Apple M1 Pro/Max have the exact same defect as the original M1 and generate random SErrors in the host when a guest tickles the GICv3 CPU interface the wrong way. Add the part numbers for both the CPU types found in these two new implementations, and add them to the hall of shame. This also applies to the Ultra version, as it is composed of 2 Max SoCs. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20220514102524.3188730-1-maz@kernel.org
| * | KVM: arm64: vgic-v3: Advertise GICR_CTLR.{IR, CES} as a new GICD_IIDR revisionMarc Zyngier2022-05-044-6/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since adversising GICR_CTLR.{IC,CES} is directly observable from a guest, we need to make it selectable from userspace. For that, bump the default GICD_IIDR revision and let userspace downgrade it to the previous default. For GICv2, the two distributor revisions are strictly equivalent. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220405182327.205520-5-maz@kernel.org
| * | KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidationMarc Zyngier2022-05-043-21/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since GICv4.1, it has become legal for an implementation to advertise GICR_{INVLPIR,INVALLR,SYNCR} while having an ITS, allowing for a more efficient invalidation scheme (no guest command queue contention when multiple CPUs are generating invalidations). Provide the invalidation registers as a primitive to their ITS counterpart. Note that we don't advertise them to the guest yet (the architecture allows an implementation to do this). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oupton@google.com> Link: https://lore.kernel.org/r/20220405182327.205520-4-maz@kernel.org
| * | KVM: arm64: vgic-v3: Expose GICR_CTLR.RWP when disabling LPIsMarc Zyngier2022-05-043-10/+29
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When disabling LPIs, a guest needs to poll GICR_CTLR.RWP in order to be sure that the write has taken effect. We so far reported it as 0, as we didn't advertise that LPIs could be turned off the first place. Start tracking this state during which LPIs are being disabled, and expose the 'in progress' state via the RWP bit. We also take this opportunity to disallow enabling LPIs and programming GICR_{PEND,PROP}BASER while LPI disabling is in progress, as allowed by the architecture (UNPRED behaviour). We don't advertise the feature to the guest yet (which is allowed by the architecture). Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220405182327.205520-3-maz@kernel.org
* / KVM: Add max_vcpus field in common 'struct kvm'Sean Christopherson2022-05-021-3/+3
|/ | | | | | | | | | | | | | | | | For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to TDX guest) before creating vcpu. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return error if the number of vcpus exceeds it Because there is already max_vcpu member in arm64 struct kvm_arch, move it to common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: arm64: vgic: Remove unnecessary type castingsYu Zhe2022-04-062-6/+6
| | | | | | | | Remove unnecessary casts. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220329102059.268983-1-yuzhe@nfschina.com
* Merge tag 'kvmarm-5.18' of ↵Paolo Bonzini2022-03-181-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes
| * KVM: arm64: fix typos in commentsJulia Lawall2022-03-181-1/+1
| | | | | | | | | | | | | | | | | | Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
* | KVM: arm64: vgic: Read HW interrupt pending state from the HWMarc Zyngier2022-02-111-0/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always result in the pending interrupts being accurately reported if they are mapped to a HW interrupt. This is particularily visible when acking the timer interrupt and reading the GICR_ISPENDR1 register immediately after, for example (the interrupt appears as not-pending while it really is...). This is because a HW interrupt has its 'active and pending state' kept in the *physical* distributor, and not in the virtual one, as mandated by the spec (this is what allows the direct deactivation). The virtual distributor only caries the pending and active *states* (note the plural, as these are two independent and non-overlapping states). Fix it by reading the HW state back, either from the timer itself or from the distributor if necessary. Reported-by: Ricardo Koller <ricarkol@google.com> Tested-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org
* Merge tag 'kvmarm-fixes-5.17-1' of ↵Paolo Bonzini2022-01-281-2/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #1 - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations
| * KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systemsMarc Zyngier2022-01-221-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Contrary to what df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") was asserting, there is at least one other system out there (Cavium ThunderX2) implementing SEIS, and not in an obviously broken way. So instead of imposing the M1 workaround on an innocent bystander, let's limit it to the two known broken Apple implementations. Fixes: df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors") Reported-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org
* | Merge tag 'kvmarm-5.17' of ↵Paolo Bonzini2022-01-076-11/+18
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
| * Merge branch kvm-arm64/vgic-fixes-5.17 into kvmarm-master/nextMarc Zyngier2021-12-163-7/+9
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/vgic-fixes-5.17: : . : A few vgic fixes: : - Harden vgic-v3 error handling paths against signed vs unsigned : comparison that will happen once the xarray-based vcpus are in : - Demote userspace-triggered console output to kvm_debug() : . KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug() KVM: arm64: vgic-v3: Fix vcpu index comparison Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug()Marc Zyngier2021-12-162-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Running the KVM selftests results in these messages being dumped in the kernel console: [ 188.051073] kvm [469]: VGIC redist and dist frames overlap [ 188.056820] kvm [469]: VGIC redist and dist frames overlap [ 188.076199] kvm [469]: VGIC redist and dist frames overlap Being amle to trigger this from userspace is definitely not on, so demote these warnings to kvm_debug(). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org
| | * KVM: arm64: vgic-v3: Fix vcpu index comparisonMarc Zyngier2021-12-161-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling an error at the point where we try and register all the redistributors, we unregister all the previously registered frames by counting down from the failing index. However, the way the code is written relies on that index being a signed value. Which won't be true once we switch to an xarray-based vcpu set. Since this code is pretty awkward the first place, and that the failure mode is hard to spot, rewrite this loop to iterate over the vcpus upwards rather than downwards. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org
| * | Merge branch kvm-arm64/pkvm-cleanups-5.17 into kvmarm-master/nextMarc Zyngier2021-12-152-1/+6
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/pkvm-cleanups-5.17: : . : pKVM cleanups from Quentin Perret: : : This series is a collection of various fixes and cleanups for KVM/arm64 : when running in nVHE protected mode. The first two patches are real : fixes/improvements, the following two are minor cleanups, and the last : two help satisfy my paranoia so they're certainly optional. : . KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE KVM: arm64: pkvm: Stub io map functions KVM: arm64: Make __io_map_base static KVM: arm64: Make the hyp memory pool static KVM: arm64: pkvm: Disable GICv2 support KVM: arm64: pkvm: Fix hyp_pool max order Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | KVM: arm64: pkvm: Disable GICv2 supportQuentin Perret2021-12-152-1/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GICv2 requires having device mappings in guests and the hypervisor, which is incompatible with the current pKVM EL2 page ownership model which only covers memory. While it would be desirable to support pKVM with GICv2, this will require a lot more work, so let's make the current assumption clear until then. Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
| * | Merge branch kvm-arm64/misc-5.17 into kvmarm-master/nextMarc Zyngier2021-12-072-2/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/misc-5.17: : . : - Add minimal support for ARMv8.7's PMU extension : - Constify kvm_io_gic_ops : - Drop kvm_is_transparent_hugepage() prototype : . KVM: Drop stale kvm_is_transparent_hugepage() declaration KVM: arm64: Constify kvm_io_gic_ops KVM: arm64: Add minimal handling for the ARMv8.7 PMU Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | KVM: arm64: Constify kvm_io_gic_opsRikard Falkeborn2021-12-062-2/+2
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only usage of kvm_io_gic_ops is to make a comparison with its address and to pass its address to kvm_iodevice_init() which takes a pointer to const kvm_io_device_ops as input. Make it const to allow the compiler to put it in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
| * / KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pidMarc Zyngier2021-12-011-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* / KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s indexMarc Zyngier2021-12-087-15/+18
|/ | | | | | | | | | | | Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch kvm-arm64/memory-accounting into kvmarm-master/nextMarc Zyngier2021-10-175-11/+11
|\ | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/memory-accounting: : . : Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base : to better track memory allocation made on behalf of a VM. : . KVM: arm64: Add memcg accounting to KVM allocations KVM: arm64: vgic: Add memcg accounting to vgic allocations Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: vgic: Add memcg accounting to vgic allocationsJia He2021-10-175-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 vgic consistent with common kvm codes. The memory allocations of VM scope should be charged into VM process cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. There remain a few cases since these allocations are global, not in VM scope. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
* | Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/nextMarc Zyngier2021-10-171-3/+18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/vgic-fixes-5.16: : . : Multiple updates to the GICv3 emulation in order to better support : the dreadful Apple M1 that only implements half of it, and in a : broken way... : . KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 Signed-off-by: Marc Zyngier <maz@kernel.org>