kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()

KVM is *the* case we know that really wants to honor NUMA hinting faults.
As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
to map them into a secondary MMU, and add a comment explaining why.

Do that unconditionally in hva_to_pfn_slow() when calling
get_user_pages_unlocked().

kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
gfn_to_page_many_atomic() are similarly used to map pages into a
secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
implicitly honor NUMA hinting faults -- as documented for
FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
for now.

Don't set it in check_user_page_hwpoison(), where we really only want to
check if the mapped page is HW-poisoned.

We won't set it for other KVM users of get_user_pages()/pin_user_pages():
* arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
  secondary MMU.
* arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
* arch/s390/kvm/*: s390x only supports a single NUMA node either way
* arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.

This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
implicitly be set by get_user_pages() and friends.

Link: https://lkml.kernel.org/r/20230803143208.383663-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: liubo <liubo254@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
author: David Hildenbrand <david@redhat.com> 2023-08-03 16:32:04 +0200
committer: Andrew Morton <akpm@linux-foundation.org> 2023-08-21 14:28:41 -0700
commit: b1e1296d7c6a3520b97add2394361660d193a5ea (patch)
tree: 0fe431021d09f73ff158b26c41ee1c19c9b490a1 /virt
parent: 5994eabf3bbbea550166ae90de0c854fc984c95d (diff)
1 files changed, 12 insertions, 1 deletions
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5bbb5612b207..2500178cf444 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2517,7 +2517,18 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 			   bool interruptible, bool *writable, kvm_pfn_t *pfn)
 {
-	unsigned int flags = FOLL_HWPOISON;
+	/*
+	 * When a VCPU accesses a page that is not mapped into the secondary
+	 * MMU, we lookup the page using GUP to map it, so the guest VCPU can
+	 * make progress. We always want to honor NUMA hinting faults in that
+	 * case, because GUP usage corresponds to memory accesses from the VCPU.
+	 * Otherwise, we'd not trigger NUMA hinting faults once a page is
+	 * mapped into the secondary MMU and gets accessed by a VCPU.
+	 *
+	 * Note that get_user_page_fast_only() and FOLL_WRITE for now
+	 * implicitly honor NUMA hinting faults and don't need this flag.
+	 */
+	unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
 	struct page *page;
 	int npages;
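
The rule stated in the commit message (the slow path sets the flag explicitly, while FOLL_WRITE honors NUMA hinting faults implicitly) can be modelled in user space roughly as below. This is only a sketch, not kernel code: the `SKETCH_*` bit values are placeholders rather than the kernel's definitions, both helper functions are hypothetical names introduced for illustration, and the `write_fault` handling is an assumption about the part of hva_to_pfn_slow() that the hunk does not show.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's definitions. */
#define SKETCH_FOLL_WRITE            (1u << 0)
#define SKETCH_FOLL_HWPOISON         (1u << 1)
#define SKETCH_FOLL_HONOR_NUMA_FAULT (1u << 2)

/*
 * Hypothetical helper: the flags the slow path starts from after this
 * patch, i.e. FOLL_HWPOISON plus the explicit FOLL_HONOR_NUMA_FAULT.
 */
static unsigned int sketch_slow_path_flags(bool write_fault)
{
	unsigned int flags = SKETCH_FOLL_HWPOISON | SKETCH_FOLL_HONOR_NUMA_FAULT;

	/* Assumption: a write fault additionally requests write access. */
	if (write_fault)
		flags |= SKETCH_FOLL_WRITE;

	return flags;
}

/*
 * Hypothetical helper: per the commit message, a lookup honors NUMA
 * hinting faults if the flag is set explicitly or if FOLL_WRITE is set,
 * since FOLL_WRITE honors them implicitly.
 */
static bool sketch_honors_numa_fault(unsigned int flags)
{
	return (flags & (SKETCH_FOLL_WRITE | SKETCH_FOLL_HONOR_NUMA_FAULT)) != 0;
}
```

Under these assumptions, a read-only lookup with only FOLL_HWPOISON set would not honor NUMA hinting faults once the implicit default goes away, which is the case the explicit flag in the hunk above covers.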