author David Matlack <dmatlack@google.com> 2021-08-04 22:28:40 +0000
committer Paolo Bonzini <pbonzini@redhat.com> 2021-08-06 07:52:29 -0400
commit fe22ed827c5b60b895b15c5c3f04e04ac606be38
tree e53cb7787c35face88d10895d54307d72724bdae /include
parent 0f22af940dc8ec4f437189096a5f8677995323b0
KVM: Cache the last used slot index per vCPU

The memslot for a given gfn is looked up multiple times during page
fault handling. Avoid binary searching for it multiple times by caching
the most recently used slot. There is an existing VM-wide last_used_slot
but that does not work well for cases where vCPUs are accessing memory
in different slots (see performance data below).

Another benefit of caching the most recently used slot (versus looking
up the slot once and passing around a pointer) is speeding up memslot
lookups *across* faults and during spte prefetching.

To measure the performance of this change I ran dirty_log_perf_test with
64 vCPUs and 64 memslots and measured "Populate memory time" and
"Iteration 2 dirty memory time". Tests were run with eptad=N to force
dirty logging to use fast_page_fault so its performance could be
measured.

Config     | Metric                        | Before | After
---------- | ----------------------------- | ------ | ------
tdp_mmu=Y  | Populate memory time          | 6.76s  | 5.47s
tdp_mmu=Y  | Iteration 2 dirty memory time | 2.83s  | 0.31s
tdp_mmu=N  | Populate memory time          | 20.4s  | 18.7s
tdp_mmu=N  | Iteration 2 dirty memory time | 2.65s  | 0.30s

The "Iteration 2 dirty memory time" results are especially compelling
because they are equivalent to running the same test with a single
memslot. In other words, fast_page_fault performance no longer scales
with the number of memslots.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

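The lookup pattern described in the message can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel implementation: the types and the names `struct memslot_set`, `struct vcpu_cache`, `search_slot_index()`, `vcpu_gfn_to_slot()` and `slow_lookups` are hypothetical, slots are assumed to be sorted by ascending base gfn and non-overlapping, and the kernel-only `array_index_nospec()` hardening is omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct memslot {
	gfn_t base_gfn;
	uint64_t npages;
};

struct memslot_set {
	int used_slots;
	struct memslot slots[8];	/* assumed sorted by ascending base_gfn */
};

struct vcpu_cache {
	int last_used_slot;		/* may be stale; always revalidated */
};

static int slow_lookups;		/* counts binary searches, for illustration */

/* Return the slot at idx only if idx is in range and the slot covers gfn. */
static struct memslot *try_get_slot(struct memslot_set *s, int idx, gfn_t gfn)
{
	struct memslot *slot;

	if (idx < 0 || idx >= s->used_slots)
		return NULL;

	slot = &s->slots[idx];
	if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
		return slot;

	return NULL;
}

/* Slow path: binary search for the slot containing gfn, or -1 if none. */
static int search_slot_index(const struct memslot_set *s, gfn_t gfn)
{
	int lo = 0, hi = s->used_slots - 1;

	slow_lookups++;
	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		const struct memslot *slot = &s->slots[mid];

		if (gfn < slot->base_gfn)
			hi = mid - 1;
		else if (gfn >= slot->base_gfn + slot->npages)
			lo = mid + 1;
		else
			return mid;
	}
	return -1;
}

/* Try the per-vCPU cached index first; on a miss, search and update it. */
static struct memslot *vcpu_gfn_to_slot(struct vcpu_cache *v,
					struct memslot_set *s, gfn_t gfn)
{
	struct memslot *slot = try_get_slot(s, v->last_used_slot, gfn);
	int idx;

	if (slot)
		return slot;

	idx = search_slot_index(s, gfn);
	if (idx < 0)
		return NULL;

	v->last_used_slot = idx;
	return &s->slots[idx];
}
```

Because the cached index is range-checked and the slot bounds are compared against the gfn on every use, a stale index only costs one extra binary search; it cannot return the wrong slot. Keeping one cache per vCPU means vCPUs that touch different slots do not evict each other's cached entry.
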
Diffstat (limited to 'include')
-rw-r--r-- include/linux/kvm_host.h | 13
1 file changed, 13 insertions, 0 deletions

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f30b53a07917..492d183dd7d0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -354,6 +354,13 @@ struct kvm_vcpu {
 	struct kvm_vcpu_stat stat;
 	char stats_id[KVM_STATS_NAME_SIZE];
 	struct kvm_dirty_ring dirty_ring;
+
+	/*
+	 * The index of the most recently used memslot by this vCPU. It's ok
+	 * if this becomes stale due to memslot changes since we always check
+	 * it is a valid slot.
+	 */
+	int last_used_slot;
 };
 
 /* must be called with irqs disabled */
@@ -1200,6 +1207,12 @@ try_get_memslot(struct kvm_memslots *slots, int slot_index, gfn_t gfn)
 	if (slot_index < 0 || slot_index >= slots->used_slots)
 		return NULL;
 
+	/*
+	 * slot_index can come from vcpu->last_used_slot which is not kept
+	 * in sync with userspace-controllable memslot deletion. So use nospec
+	 * to prevent the CPU from speculating past the end of memslots[].
+	 */
+	slot_index = array_index_nospec(slot_index, slots->used_slots);
 	slot = &slots->memslots[slot_index];
 
 	if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)