author	Dapeng Mi <dapeng1.mi@linux.intel.com>	2024-02-18 12:30:03 +0800
committer	Sean Christopherson <seanjc@google.com>	2024-02-21 08:03:02 -0800
commit	4a447b135e45b49101417d54079df25b520299d9 (patch)
tree	b2d5a2460de602a791086623a39f9f57cfa45f27 /tools/testing
parent	83bdfe04c968e0fe3181e4cd41b764e17ac73911 (diff)
KVM: selftests: Test top-down slots event in x86's pmu_counters_test
Although fixed counter 3 and its exclusive pseudo slots event are not supported by KVM yet, the architectural slots event is supported and can be programmed on any GP counter. Thus, add validation for the architectural slots event.

The top-down slots event "counts the total number of available slots for an unhalted logical processor, and increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method."

A slot is an abstract concept that indicates how many uops (decoded from instructions) can be processed simultaneously (per cycle) by the hardware.

In the Top-down Microarchitecture Analysis (TMA) method, the processor is divided into two parts, front-end and back-end. Assume a processor with the classic 5-stage pipeline: fetch, decode, execute, memory access, and register writeback. The first two stages (fetch/decode) belong to the front-end and the last three stages to the back-end.

In modern Intel processors, a complex instruction is decoded into several uops (micro-operations), which can then be processed simultaneously to improve performance. Thus, if a processor can decode and dispatch 4 uops in the front-end and execute 4 uops in the back-end simultaneously (per cycle), the machine-width of this processor is 4 and it has 4 top-down slots per cycle.

If a slot is spare and can be used to process a new incoming uop, the slot is available; but if a uop occupies a slot for several cycles and can't retire (e.g. because it is blocked on a memory access), the slot is stalled and unavailable.

Since the test's instruction sequence can't be macro-fused on x86 platforms, the measured slots count should not be less than NUM_INSNS_RETIRED. Thus, assert the slots count against NUM_INSNS_RETIRED.

pmu_counters_test passes with this patch on Intel Sapphire Rapids.

For more information about the TMA method, refer to the link below.

https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240218043003.2424683-1-dapeng1.mi@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
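For reference, the architectural Top-down Slots event uses the architectural encoding event select 0xA4, umask 0x01 (per the Intel SDM), so a guest can program it on a general-purpose counter by writing IA32_PERFEVTSEL0. Below is a minimal guest-side sketch, not the test's actual code: it assumes the KVM selftests' wrmsr()/rdmsr() guest helpers, and the EVTSEL_* defines and count_slots_once() are illustrative names introduced here.

#include <stdint.h>

/* Standard MSR indices for IA32_PERFEVTSEL0 and IA32_PMC0/PERFCTR0. */
#define MSR_P6_EVNTSEL0		0x186
#define MSR_IA32_PERFCTR0	0x0c1

/* Illustrative local defines for the architectural Top-down Slots event. */
#define EVTSEL_EVENT_SLOTS	0xa4
#define EVTSEL_UMASK_SLOTS	(0x01u << 8)
#define EVTSEL_USR		(1u << 16)
#define EVTSEL_OS		(1u << 17)
#define EVTSEL_EN		(1u << 22)

static uint64_t count_slots_once(void)
{
	/* Zero the counter, then enable the slots event on GP counter 0. */
	wrmsr(MSR_IA32_PERFCTR0, 0);
	wrmsr(MSR_P6_EVNTSEL0,
	      EVTSEL_EVENT_SLOTS | EVTSEL_UMASK_SLOTS |
	      EVTSEL_USR | EVTSEL_OS | EVTSEL_EN);

	/* ... run the workload being measured ... */

	wrmsr(MSR_P6_EVNTSEL0, 0);	/* stop counting */
	return rdmsr(MSR_IA32_PERFCTR0);
}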
Diffstat (limited to 'tools/testing')
-rw-r--r--  tools/testing/selftests/kvm/x86_64/pmu_counters_test.c | 3 +++
1 file changed, 3 insertions(+), 0 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index ae5f6042f1e8..29609b52f8fa 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -119,6 +119,9 @@ static void guest_assert_event_count(uint8_t idx,
 	case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
 		GUEST_ASSERT_NE(count, 0);
 		break;
+	case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
+		GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
+		break;
 	default:
 		break;
 	}
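To illustrate why the slots count is bounded below by NUM_INSNS_RETIRED: every retired instruction occupies at least one issue slot, and only macro-fusion would let two instructions retire out of a single slot. A purely illustrative measured region (not the test's actual instruction sequence) could look like the sketch below.

/*
 * Illustrative only: a run of back-to-back NOPs.  No pair of NOPs can
 * macro-fuse, so every retired instruction occupies at least one
 * top-down slot, and the slots count over this region is therefore
 * >= the number of instructions retired in it.
 */
static void measured_region(void)
{
	asm volatile(".rept 100\n\tnop\n\t.endr");
}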