summaryrefslogtreecommitdiffstats
path: root/arch/arm64/kvm/mmu.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'kvmarm-fixes-5.10-2' of ↵Paolo Bonzini2020-11-081-0/+2
|\ | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for v5.10, take #2 - Fix compilation error when PMD and PUD are folded - Fix regresssion of the RAZ behaviour of ID_AA64ZFR0_EL1
| * KVM: arm64: Fix build error in user_mem_abort()Gavin Shan2020-11-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PUD and PMD are folded into PGD when the following options are enabled. In that case, PUD_SHIFT is equal to PMD_SHIFT and we fail to build with the indicated errors: CONFIG_ARM64_VA_BITS_42=y CONFIG_ARM64_PAGE_SHIFT=16 CONFIG_PGTABLE_LEVELS=3 arch/arm64/kvm/mmu.c: In function ‘user_mem_abort’: arch/arm64/kvm/mmu.c:798:2: error: duplicate case value case PMD_SHIFT: ^~~~ arch/arm64/kvm/mmu.c:791:2: note: previously used here case PUD_SHIFT: ^~~~ This fixes the issue by skipping the check on PUD huge page when PUD and PMD are folded into PGD. Fixes: 2f40c46021bbb ("KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes") Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201103003009.32955-1-gshan@redhat.com
* | Merge tag 'kvmarm-fixes-5.10-1' of ↵Paolo Bonzini2020-10-301-7/+20
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.10, take #1 - Force PTE mapping on device pages provided via VFIO - Fix detection of cacheable mapping at S2 - Fallback to PMD/PTE mappings for composite huge pages - Fix accounting of Stage-2 PGD allocation - Fix AArch32 handling of some of the debug registers - Simplify host HYP entry - Fix stray pointer conversion on nVHE TLB invalidation - Fix initialization of the nVHE code - Simplify handling of capabilities exposed to HYP - Nuke VCPUs caught using a forbidden AArch32 EL0
| * KVM: arm64: Force PTE mapping on fault resulting in a device mappingSantosh Shukla2020-10-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFIO allows a device driver to resolve a fault by mapping a MMIO range. This can be subsequently result in user_mem_abort() to try and compute a huge mapping based on the MMIO pfn, which is a sure recipe for things to go wrong. Instead, force a PTE mapping when the pfn faulted in has a device mapping. Fixes: 6d674e28f642 ("KVM: arm/arm64: Properly handle faulting of device mappings") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Santosh Shukla <sashukla@nvidia.com> [maz: rewritten commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1603711447-11998-2-git-send-email-sashukla@nvidia.com
| * KVM: arm64: Use fallback mapping sizes for contiguous huge page sizesGavin Shan2020-10-291-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Although huge pages can be created out of multiple contiguous PMDs or PTEs, the corresponding sizes are not supported at Stage-2 yet. Instead of failing the mapping, fall back to the nearer supported mapping size (CONT_PMD to PMD and CONT_PTE to PTE respectively). Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Gavin Shan <gshan@redhat.com> [maz: rewritten commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201025230626.18501-1-gshan@redhat.com
* | Merge tag 'kvmarm-5.10' of ↵Paolo Bonzini2020-10-201-1401/+210
|\| | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.10 - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation
| * KVM: arm64: Ensure user_mem_abort() return value is initialisedWill Deacon2020-10-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a change in the MMU notifier sequence number forces user_mem_abort() to return early when attempting to handle a stage-2 fault, we return uninitialised stack to kvm_handle_guest_abort(), which could potentially result in the injection of an external abort into the guest or a spurious return to userspace. Neither or these are what we want to do. Initialise 'ret' to 0 in user_mem_abort() so that bailing due to a change in the MMU notrifier sequence number is treated as though the fault was handled. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20200930102442.16142-1-will@kernel.org
| * KVM: arm64: Fix doc warnings in mmu codeXiaofei Tan2020-09-181-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix following warnings caused by mismatch bewteen function parameters and comments. arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'mmu' not described in '__unmap_stage2_range' arch/arm64/kvm/mmu.c:128: warning: Function parameter or member 'may_block' not described in '__unmap_stage2_range' arch/arm64/kvm/mmu.c:128: warning: Excess function parameter 'kvm' description in '__unmap_stage2_range' arch/arm64/kvm/mmu.c:499: warning: Function parameter or member 'writable' not described in 'kvm_phys_addr_ioremap' arch/arm64/kvm/mmu.c:538: warning: Function parameter or member 'mmu' not described in 'stage2_wp_range' arch/arm64/kvm/mmu.c:538: warning: Excess function parameter 'kvm' description in 'stage2_wp_range' Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/1600307269-50957-1-git-send-email-tanxiaofei@huawei.com
| * KVM: arm64: Do not flush memslot if FWB is supportedAlexandru Elisei2020-09-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a result of a KVM_SET_USER_MEMORY_REGION ioctl, KVM flushes the dcache for the memslot being changed to ensure a consistent view of memory between the host and the guest: the host runs with caches enabled, and it is possible for the data written by the hypervisor to still be in the caches, but the guest is running with stage 1 disabled, meaning data accesses are to Device-nGnRnE memory, bypassing the caches entirely. Flushing the dcache is not necessary when KVM enables FWB, because it forces the guest to uses cacheable memory accesses. The current behaviour does not change, as the dcache flush helpers execute the cache operation only if FWB is not enabled, but walking the stage 2 table is avoided. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915170442.131635-1-alexandru.elisei@arm.com
| * KVM: arm64: Try PMD block mappings if PUD mappings are not supportedAlexandru Elisei2020-09-181-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to use the same block size to map the faulting IPA in stage 2. If stage 2 cannot the same block mapping because the block size doesn't fit in the memslot or the memslot is not properly aligned, user_mem_abort() will fall back to a page mapping, regardless of the block size. We can do better for PUD backed hugetlbfs by checking if a PMD block mapping is supported before deciding to use a page. vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning its value. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200910133351.118191-1-alexandru.elisei@arm.com
| * KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'Will Deacon2020-09-111-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | The stage-2 page-tables are entirely encapsulated by the 'pgt' field of 'struct kvm_s2_mmu', so remove the unused 'pgd' field. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-21-will@kernel.org
| * KVM: arm64: Remove unused page-table codeWill Deacon2020-09-111-755/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that KVM is using the generic page-table code to manage the guest stage-2 page-tables, we can remove a bunch of unused macros, #defines and static inline functions from the old implementation. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-20-will@kernel.org
| * KVM: arm64: Check the pgt instead of the pgd when modifying page-tableWill Deacon2020-09-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | In preparation for removing the 'pgd' field of 'struct kvm_s2_mmu', update the few remaining users to check the 'pgt' field instead. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-19-will@kernel.org
| * KVM: arm64: Convert user_mem_abort() to generic page-table APIWill Deacon2020-09-111-80/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert user_mem_abort() to call kvm_pgtable_stage2_relax_perms() when handling a stage-2 permission fault and kvm_pgtable_stage2_map() when handling a stage-2 translation fault, rather than walking the page-table manually. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-18-will@kernel.org
| * KVM: arm64: Convert memslot cache-flushing code to generic page-table APIQuentin Perret2020-09-111-12/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Convert stage2_flush_memslot() to call the kvm_pgtable_stage2_flush() function of the generic page-table code instead of walking the page-table directly. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200911132529.19844-16-will@kernel.org
| * KVM: arm64: Convert write-protect operation to generic page-table APIQuentin Perret2020-09-111-21/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Convert stage2_wp_range() to call the kvm_pgtable_stage2_wrprotect() function of the generic page-table code instead of walking the page-table directly. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200911132529.19844-14-will@kernel.org
| * KVM: arm64: Convert page-aging and access faults to generic page-table APIWill Deacon2020-09-111-58/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | Convert the page-aging functions and access fault handler to use the generic page-table code instead of walking the page-table directly. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-12-will@kernel.org
| * KVM: arm64: Convert unmap_stage2_range() to generic page-table APIWill Deacon2020-09-111-25/+37
| | | | | | | | | | | | | | | | | | | | | | | | Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead of walking the page-table directly. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-10-will@kernel.org
| * KVM: arm64: Convert kvm_set_spte_hva() to generic page-table APIWill Deacon2020-09-111-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead of stage2_set_pte(). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-9-will@kernel.org
| * KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table APIWill Deacon2020-09-111-16/+14
| | | | | | | | | | | | | | | | | | | | | | | | Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead of stage2_set_pte(). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-8-will@kernel.org
| * KVM: arm64: Add support for creating kernel-agnostic stage-2 page tablesWill Deacon2020-09-111-26/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce alloc() and free() functions to the generic page-table code for guest stage-2 page-tables and plumb these into the existing KVM page-table allocator. Subsequent patches will convert other operations within the KVM allocator over to the generic code. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-6-will@kernel.org
| * KVM: arm64: Use generic allocator for hyp stage-1 page-tablesWill Deacon2020-09-111-376/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a shiny new page-table allocator, replace the hyp page-table code with calls into the new API. This also allows us to remove the extended idmap code, as we can now simply ensure that the VA size is large enough to map everything we need. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-5-will@kernel.org
| * KVM: arm64: Remove kvm_mmu_free_memory_caches()Will Deacon2020-09-111-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(), so inline the implementation and get rid of the extra function. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-2-will@kernel.org
* | Merge tag 'kvmarm-fixes-5.9-2' of ↵Paolo Bonzini2020-09-201-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 5.9, take #2 - Fix handling of S1 Page Table Walk permission fault at S2 on instruction fetch - Cleanup kvm_vcpu_dabt_iswrite()
| * | KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetchMarc Zyngier2020-09-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM currently assumes that an instruction abort can never be a write. This is in general true, except when the abort is triggered by a S1PTW on instruction fetch that tries to update the S1 page tables (to set AF, for example). This can happen if the page tables have been paged out and brought back in without seeing a direct write to them (they are thus marked read only), and the fault handling code will make the PT executable(!) instead of writable. The guest gets stuck forever. In these conditions, the permission fault must be considered as a write so that the Stage-1 update can take place. This is essentially the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM: Take S1 walks into account when determining S2 write faults"). Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce kvm_vcpu_trap_is_exec_fault() that only return true when no faulting on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't specific to data abort. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org
* | | Merge tag 'kvmarm-fixes-5.9-1' of ↵Paolo Bonzini2020-09-111-1/+7
|\| | | |/ |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for Linux 5.9, take #1 - Multiple stolen time fixes, with a new capability to match x86 - Fix for hugetlbfs mappings when PUD and PMD are the same level - Fix for hugetlbfs mappings when PTE mappings are enforced (dirty logging, for example) - Fix tracing output of 64bit values
| * KVM: arm64: Update page shift if stage 2 block mapping not supportedAlexandru Elisei2020-09-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 196f878a7ac2e (" KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory") modifies user_mem_abort() to send a SIGBUS signal when the fault IPA maps to a hwpoisoned page. Commit 1559b7583ff6 ("KVM: arm/arm64: Re-check VMA on detecting a poisoned page") changed kvm_send_hwpoison_signal() to use the page shift instead of the VMA because at that point the code had already released the mmap lock, which means userspace could have modified the VMA. If userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to map the guest fault IPA using block mappings in stage 2. That is not always possible, if, for example, userspace uses dirty page logging for the VM. Update the page shift appropriately in those cases when we downgrade the stage 2 entry from a block mapping to a page. Fixes: 1559b7583ff6 ("KVM: arm/arm64: Re-check VMA on detecting a poisoned page") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20200901133357.52640-2-alexandru.elisei@arm.com
| * KVM: arm64: Do not try to map PUDs when they are folded into PMDMarc Zyngier2020-09-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the obscure cases where PMD and PUD are the same size (64kB pages with 42bit VA, for example, which results in only two levels of page tables), we can't map anything as a PUD, because there is... erm... no PUD to speak of. Everything is either a PMD or a PTE. So let's only try and map a PUD when its size is different from that of a PMD. Cc: stable@vger.kernel.org Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2") Reported-by: Gavin Shan <gshan@redhat.com> Reported-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not setWill Deacon2020-08-211-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG(): | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper | INFO: lockdep is turned off. | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0x0/0x284 | show_stack+0x1c/0x28 | dump_stack+0xf0/0x1a4 | ___might_sleep+0x2bc/0x2cc | unmap_stage2_range+0x160/0x1ac | kvm_unmap_hva_range+0x1a0/0x1c8 | kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8 | __mmu_notifier_invalidate_range_start+0x218/0x31c | mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0 | __oom_reap_task_mm+0x128/0x268 | oom_reap_task+0xac/0x298 | oom_reaper+0x178/0x17c | kthread+0x1e4/0x1fc | ret_from_fork+0x10/0x30 Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags. Cc: <stable@vger.kernel.org> Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd") Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-3-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()Will Deacon2020-08-211-1/+1
|/ | | | | | | | | | | | | | | | | | | | The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvmarm-5.9' of ↵Paolo Bonzini2020-08-091-133/+178
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next-5.6 KVM/arm64 updates for Linux 5.9: - Split the VHE and nVHE hypervisor code bases, build the EL2 code separately, allowing for the VHE code to now be built with instrumentation - Level-based TLB invalidation support - Restructure of the vcpu register storage to accomodate the NV code - Pointer Authentication available for guests on nVHE hosts - Simplification of the system register table parsing - MMU cleanups and fixes - A number of post-32bit cleanups and other fixes
| * Merge branch 'kvm-arm64/misc-5.9' into kvmarm-master/nextMarc Zyngier2020-07-301-9/+17
| |\ | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * KVM: arm64: Move S1PTW S2 fault logic out of io_mem_abort()Will Deacon2020-07-301-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To allow for re-injection of stage-2 faults on stage-1 page-table walks due to either a missing or read-only memslot, move the triage logic out of io_mem_abort() and into kvm_handle_guest_abort(), where these aborts can be handled before anything else. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-5-will@kernel.org
| | * KVM: arm64: Don't skip cache maintenance for read-only memslotsWill Deacon2020-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a guest performs cache maintenance on a read-only memslot, we should inform userspace rather than skip the instruction altogether. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-4-will@kernel.org
| | * KVM: arm64: Handle data and instruction external aborts the same wayWill Deacon2020-07-301-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the guest generates a synchronous external abort which is not handled by the host, we inject it back into the guest as a virtual SError, but only if the original fault was reported on the data side. Instruction faults are reported as "Unsupported FSC", causing the vCPU run loop to bail with -EFAULT. Although synchronous external aborts from a guest are pretty unusual, treat them the same regardless of whether they are taken as data or instruction aborts by EL2. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-3-will@kernel.org
| | * KVM: arm64: Rename kvm_vcpu_dabt_isextabt()Will Deacon2020-07-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_vcpu_dabt_isextabt() is not specific to data aborts and, unlike kvm_vcpu_dabt_issext(), has nothing to do with sign extension. Rename it to 'kvm_vcpu_abt_issea()'. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-2-will@kernel.org
| * | Merge branch 'kvm-arm64/misc-5.9' into kvmarm-master/next-WIPMarc Zyngier2020-07-281-3/+3
| |\| | | | | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * KVM: arm64: Rename HSR to ESRGavin Shan2020-07-051-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm/arm32 isn't supported since commit 541ad0150ca4 ("arm: Remove 32bit KVM host support"). So HSR isn't meaningful since then. This renames HSR to ESR accordingly. This shouldn't cause any functional changes: * Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() to make the function names self-explanatory. * Rename variables from @hsr to @esr to make them self-explanatory. Note that the renaming on uapi and tracepoint will cause ABI changes, which we should avoid. Specificly, there are 4 related source files in this regard: * arch/arm64/include/uapi/asm/kvm.h (struct kvm_debug_exit_arch::hsr) * arch/arm64/kvm/handle_exit.c (struct kvm_debug_exit_arch::hsr) * arch/arm64/kvm/trace_arm.h (tracepoints) * arch/arm64/kvm/trace_handle_exit.h (tracepoints) Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Andrew Scull <ascull@google.com> Link: https://lore.kernel.org/r/20200630015705.103366-1-gshan@redhat.com
| * | KVM: arm64: Use TTL hint in when invalidating stage-2 translationsMarc Zyngier2020-07-071-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we often have a precise idea of the level we're dealing with when invalidating TLBs, we can provide it to as a hint to our invalidation helper. Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Factor out stage 2 page table data from struct kvmChristoffer Dall2020-07-071-122/+158
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we are about to reuse our stage 2 page table manipulation code for shadow stage 2 page tables in the context of nested virtualization, we are going to manage multiple stage 2 page tables for a single VM. This requires some pretty invasive changes to our data structures, which moves the vmid and pgd pointers into a separate structure and change pretty much all of our mmu code to operate on this structure instead. The new structure is called struct kvm_s2_mmu. There is no intended functional change by this patch alone. Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> [Designed data structure layout in collaboration] Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Co-developed-by: Marc Zyngier <maz@kernel.org> [maz: Moved the last_vcpu_ran down to the S2 MMU structure as well] Signed-off-by: Marc Zyngier <maz@kernel.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2020-08-061-48/+13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "s390: - implement diag318 x86: - Report last CPU for debugging - Emulate smaller MAXPHYADDR in the guest than in the host - .noinstr and tracing fixes from Thomas - nested SVM page table switching optimization and fixes Generic: - Unify shadow MMU cache data structures across architectures" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) KVM: SVM: Fix sev_pin_memory() error handling KVM: LAPIC: Set the TDCR settable bits KVM: x86: Specify max TDP level via kvm_configure_mmu() KVM: x86/mmu: Rename max_page_level to max_huge_page_level KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch KVM: x86: Pull the PGD's level from the MMU instead of recalculating it KVM: VMX: Make vmx_load_mmu_pgd() static KVM: x86/mmu: Add separate helper for shadow NPT root page role calc KVM: VMX: Drop a duplicate declaration of construct_eptp() KVM: nSVM: Correctly set the shadow NPT root level in its MMU role KVM: Using macros instead of magic values MIPS: KVM: Fix build error caused by 'kvm_run' cleanup KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support KVM: VMX: optimize #PF injection when MAXPHYADDR does not match KVM: VMX: Add guest physical address check in EPT violation and misconfig KVM: VMX: introduce vmx_need_pf_intercept KVM: x86: update exception bitmap on CPUID changes KVM: x86: rename update_bp_intercept to update_exception_bitmap ...
| * | KVM: arm64: clean up redundant 'kvm_run' parametersTianjia Zhang2020-07-101-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: arm64: Use common KVM implementation of MMU memory cachesSean Christopherson2020-07-091-42/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move to the common MMU memory cache implementation now that the common code and arm64's existing code are semantically compatible. No functional change intended. Cc: Marc Zyngier <maz@kernel.org> Suggested-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-19-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: arm64: Use common code's approach for __GFP_ZERO with memory cachesSean Christopherson2020-07-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make the struct and its usage compatible with the common 'struct kvm_mmu_memory_cache' in linux/kvm_host.h. This will minimize code churn when arm64 moves to the common implementation in a future patch, at the cost of temporarily having somewhat silly code. Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO: #define GFP_PGTABLE_USER (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT) | -> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO) == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO versus #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) with __GFP_ZERO explicitly OR'd in == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO No functional change intended. Tested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-18-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: arm64: Drop @max param from mmu_topup_memory_cache()Sean Christopherson2020-07-091-8/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the @max param in mmu_topup_memory_cache() and instead use ARRAY_SIZE() to terminate the loop to fill the cache. This removes a BUG_ON() and sets the stage for moving arm64 to the common memory cache implementation. No functional change intended. Tested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-17-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* / KVM: arm64: Don't inherit exec permission across page-table levelsWill Deacon2020-07-281-5/+6
|/ | | | | | | | | | | | | | | | | | | | | | If a stage-2 page-table contains an executable, read-only mapping at the pte level (e.g. due to dirty logging being enabled), a subsequent write fault to the same page which tries to install a larger block mapping (e.g. due to dirty logging having been disabled) will erroneously inherit the exec permission and consequently skip I-cache invalidation for the rest of the block. Ensure that exec permission is only inherited by write faults when the new mapping is of the same size as the existing one. A subsequent instruction abort will result in I-cache invalidation for the entire block mapping. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200723101714.15873-1-will@kernel.org
* mmap locking API: convert mmap_sem call sites missed by coccinelleMichel Lespinasse2020-06-091-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* arm64: add support for folded p4d page tablesMike Rapoport2020-06-041-32/+177
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and remove __ARCH_USE_5LEVEL_HACK. [arnd@arndb.de: fix gcc-10 shift warning] Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: arm64: Remove obsolete kvm_virt_to_phys abstractionAndrew Scull2020-05-251-3/+3
| | | | | | | | | | | | | This abstraction was introduced to hide the difference between arm and arm64 but, with the former no longer supported, this abstraction can be removed and the canonical kernel API used directly instead. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> CC: Marc Zyngier <maz@kernel.org> CC: James Morse <james.morse@arm.com> CC: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20200519104036.259917-1-ascull@google.com
* KVM: arm64: Support enabling dirty log gradually in small chunksKeqian Zhu2020-05-161-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There is already support of enabling dirty log gradually in small chunks for x86 in commit 3c9bd4006bfc ("KVM: x86: enable dirty log gradually in small chunks"). This adds support for arm64. x86 still writes protect all huge pages when DIRTY_LOG_INITIALLY_ALL_SET is enabled. However, for arm64, both huge pages and normal pages can be write protected gradually by userspace. Under the Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G Linux VMs with different page size. The memory pressure is 127G in each case. The time taken of memory_global_dirty_log_start in QEMU is listed below: Page Size Before After Optimization 4K 650ms 1.8ms 2M 4ms 1.8ms 1G 2ms 1.8ms Besides the time reduction, the biggest improvement is that we will minimize the performance side effect (because of dissolving huge pages and marking memslots dirty) on guest after enabling dirty log. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com