summaryrefslogtreecommitdiffstats
path: root/tools/perf/pmu-events/arch/x86/meteorlake
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2023-11-03 08:17:38 -1000
committerLinus Torvalds <torvalds@linux-foundation.org>2023-11-03 08:17:38 -1000
commit7ab89417ed235f56d84c7893d38d4905e38d2692 (patch)
tree0980734f4e492a09e68d820fedce20465c69e3df /tools/perf/pmu-events/arch/x86/meteorlake
parent31e5f934ff962820995c82a6953176a1c7d18ff5 (diff)
parentfed3a1be6433e15833068c701bfde7b422d8b988 (diff)
downloadlinux-stable-7ab89417ed235f56d84c7893d38d4905e38d2692.tar.gz
linux-stable-7ab89417ed235f56d84c7893d38d4905e38d2692.tar.bz2
linux-stable-7ab89417ed235f56d84c7893d38d4905e38d2692.zip
Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim: "Build: - Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on. This can be disabled by passing BUILD_BPF_SKEL=0 to make. - Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally. perf record: - Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list. - Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint. perf lock contention: - Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option). $ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup 835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service - Add -G/--cgroup-filter option to see contention only for given cgroups. This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr. $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller 8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a - Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map. - Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after. perf kwork: - Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items. - Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet). $ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%] PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP> - Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below: $ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [| 4.60%] %Cpu1 [| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [| 3.81%] <SNIP> perf bench: - Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups. Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches. $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.307 [sec] 3.078180 usecs/op 324867 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000': 200,026 context-switches 63 cgroup-switches 0.321637922 seconds time elapsed You can see small number of cgroup-switches because both write and read tasks are in the same cgroup. $ sudo mkdir /sys/fs/cgroup/{AAA,BBB} $ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes Total time: 0.351 [sec] 3.512990 usecs/op 284657 ops/sec Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB': 200,020 context-switches 200,019 cgroup-switches 0.365034567 seconds time elapsed Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more. - Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work. perf test: - Fix various shellcheck issues on the tests written in shell script. - Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs. Event parsing: - Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case. - Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost. - Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits. For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure. $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Event metrics: - Add "Compat" regex to match event with multiple identifiers. - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne. Misc: - Assorted memory leak fixes and footprint reduction. - Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily. - Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer. - Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events. - Update bash shell completion for events and metrics" * tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits) perf vendor events intel: Update tsx_cycles_per_elision metrics perf vendor events intel: Update bonnell version number to v5 perf vendor events intel: Update westmereex events to v4 perf vendor events intel: Update meteorlake events to v1.06 perf vendor events intel: Update knightslanding events to v16 perf vendor events intel: Add typo fix for ivybridge FP perf vendor events intel: Update a spelling in haswell/haswellx perf vendor events intel: Update emeraldrapids to v1.01 perf vendor events intel: Update alderlake/alderlake events to v1.23 perf build: Disable BPF skeletons if clang version is < 12.0.1 perf callchain: Fix spelling mistake "statisitcs" -> "statistics" perf report: Fix spelling mistake "heirachy" -> "hierarchy" perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() perf tests: test_arm_coresight: Simplify source iteration perf vendor events intel: Add tigerlake two metrics perf vendor events intel: Add broadwellde two metrics perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit perf callchain: Minor layout changes to callchain_list perf callchain: Make brtype_stat in callchain_list optional ...
Diffstat (limited to 'tools/perf/pmu-events/arch/x86/meteorlake')
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/cache.json30
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/frontend.json29
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/memory.json37
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/other.json40
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json68
-rw-r--r--tools/perf/pmu-events/arch/x86/meteorlake/uncore-other.json9
6 files changed, 208 insertions, 5 deletions
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
index 1de0200b32f6..5fef87502d4b 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/cache.json
@@ -967,6 +967,16 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts demand data reads that were supplied by the L3 cache where a snoop was sent, the snoop hit, and modified data was forwarded.",
+ "EventCode": "0xB7",
+ "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10003C0001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts demand data reads that resulted in a snoop hit in another cores caches, data forwarding is required as the data is modified.",
"EventCode": "0x2A,0x2B",
"EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM",
@@ -977,6 +987,16 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts demand data reads that were supplied by the L3 cache where a snoop was sent, the snoop hit, and non-modified data was forwarded.",
+ "EventCode": "0xB7",
+ "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x8003C0001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts demand data reads that resulted in a snoop hit in another cores caches which forwarded the unmodified data to the requesting core.",
"EventCode": "0x2A,0x2B",
"EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD",
@@ -987,6 +1007,16 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts demand reads for ownership (RFO) and software prefetches for exclusive ownership (PREFETCHW) that were supplied by the L3 cache where a snoop was sent, the snoop hit, and modified data was forwarded.",
+ "EventCode": "0xB7",
+ "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10003C0002",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that resulted in a snoop hit in another cores caches, data forwarding is required as the data is modified.",
"EventCode": "0x2A,0x2B",
"EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
index 8264419500a5..9da8689eda81 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/frontend.json
@@ -461,6 +461,27 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE]",
+ "CounterMask": "6",
+ "EventCode": "0x9c",
+ "EventName": "IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE",
+ "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE]",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
+ "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK]",
+ "CounterMask": "1",
+ "EventCode": "0x9c",
+ "EventName": "IDQ_BUBBLES.CYCLES_FE_WAS_OK",
+ "Invert": "1",
+ "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle. [This event is alias to IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK]",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
"BriefDescription": "Uops not delivered by IDQ when backend of the machine is not stalled",
"EventCode": "0x9c",
"EventName": "IDQ_UOPS_NOT_DELIVERED.CORE",
@@ -470,22 +491,22 @@
"Unit": "cpu_core"
},
{
- "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled",
+ "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE]",
"CounterMask": "6",
"EventCode": "0x9c",
"EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE",
- "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle.",
+ "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_0_UOPS_DELIV.CORE]",
"SampleAfterValue": "1000003",
"UMask": "0x1",
"Unit": "cpu_core"
},
{
- "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled",
+ "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled [This event is alias to IDQ_BUBBLES.CYCLES_FE_WAS_OK]",
"CounterMask": "1",
"EventCode": "0x9c",
"EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
"Invert": "1",
- "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle.",
+ "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle. [This event is alias to IDQ_BUBBLES.CYCLES_FE_WAS_OK]",
"SampleAfterValue": "1000003",
"UMask": "0x1",
"Unit": "cpu_core"
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
index 2605e1d0ba9f..a5b83293f157 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/memory.json
@@ -67,6 +67,14 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts the number of machine clears due to memory ordering caused by a snoop from an external agent. Does not count internally generated machine clears such as those due to memory disambiguation.",
+ "EventCode": "0xc3",
+ "EventName": "MACHINE_CLEARS.MEMORY_ORDERING",
+ "SampleAfterValue": "20003",
+ "UMask": "0x2",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Number of machine clears due to memory ordering conflicts.",
"EventCode": "0xc3",
"EventName": "MACHINE_CLEARS.MEMORY_ORDERING",
@@ -76,6 +84,15 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Cycles while L1 cache miss demand load is outstanding.",
+ "CounterMask": "2",
+ "EventCode": "0x47",
+ "EventName": "MEMORY_ACTIVITY.CYCLES_L1D_MISS",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x2",
+ "Unit": "cpu_core"
+ },
+ {
"BriefDescription": "Execution stalls while L1 cache miss demand load is outstanding.",
"CounterMask": "3",
"EventCode": "0x47",
@@ -280,6 +297,26 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts demand data reads that were not supplied by the L3 cache.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.L3_MISS",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x3FBFC00001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
+ "BriefDescription": "Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that were not supplied by the L3 cache.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_RFO.L3_MISS",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x3FBFC00002",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
"BriefDescription": "Counts demand data read requests that miss the L3 cache.",
"EventCode": "0x21",
"EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/other.json b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
index f4c603599df4..d55e792c0c43 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/other.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/other.json
@@ -8,6 +8,46 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts demand data reads that have any type of response.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
+ "BriefDescription": "Counts demand data reads that were supplied by DRAM.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_DATA_RD.DRAM",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x184000001",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
+ "BriefDescription": "Counts demand read for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response.",
+ "EventCode": "0x2A,0x2B",
+ "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10002",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_core"
+ },
+ {
+ "BriefDescription": "Counts streaming stores that have any type of response.",
+ "EventCode": "0xB7",
+ "EventName": "OCR.STREAMING_WR.ANY_RESPONSE",
+ "MSRIndex": "0x1a6,0x1a7",
+ "MSRValue": "0x10800",
+ "SampleAfterValue": "100003",
+ "UMask": "0x1",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts streaming stores that have any type of response.",
"EventCode": "0x2A,0x2B",
"EventName": "OCR.STREAMING_WR.ANY_RESPONSE",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
index 352c5efafc06..deaa7aba93f7 100644
--- a/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/pipeline.json
@@ -218,6 +218,15 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts the number of mispredicted taken JCC (Jump on Conditional Code) branch instructions retired.",
+ "EventCode": "0xc5",
+ "EventName": "BR_MISP_RETIRED.COND_TAKEN",
+ "PEBS": "1",
+ "SampleAfterValue": "200003",
+ "UMask": "0xfe",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "number of branch instructions retired that were mispredicted and taken.",
"EventCode": "0xc5",
"EventName": "BR_MISP_RETIRED.COND_TAKEN",
@@ -293,6 +302,15 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts the number of mispredicted near taken branch instructions retired.",
+ "EventCode": "0xc5",
+ "EventName": "BR_MISP_RETIRED.NEAR_TAKEN",
+ "PEBS": "1",
+ "SampleAfterValue": "200003",
+ "UMask": "0x80",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Number of near branch instructions retired that were mispredicted and taken.",
"EventCode": "0xc5",
"EventName": "BR_MISP_RETIRED.NEAR_TAKEN",
@@ -733,7 +751,7 @@
"Unit": "cpu_core"
},
{
- "BriefDescription": "INT_MISC.UNKNOWN_BRANCH_CYCLES",
+ "BriefDescription": "Bubble cycles of BAClear (Unknown Branch).",
"EventCode": "0xad",
"EventName": "INT_MISC.UNKNOWN_BRANCH_CYCLES",
"MSRIndex": "0x3F7",
@@ -1046,6 +1064,14 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to Branch Mispredict",
+ "EventCode": "0x73",
+ "EventName": "TOPDOWN_BAD_SPECULATION.MISPREDICT",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x4",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to a machine clear (nuke).",
"EventCode": "0x73",
"EventName": "TOPDOWN_BAD_SPECULATION.NUKE",
@@ -1069,6 +1095,22 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to memory reservation stall (scheduler not being able to accept another uop). This could be caused by RSV full or load/store buffer block.",
+ "EventCode": "0x74",
+ "EventName": "TOPDOWN_BE_BOUND.MEM_SCHEDULER",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x2",
+ "Unit": "cpu_atom"
+ },
+ {
+ "BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to IEC and FPC RAT stalls - which can be due to the FIQ and IEC reservation station stall (integer, FP and SIMD scheduler not being able to accept another uop. )",
+ "EventCode": "0x74",
+ "EventName": "TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x8",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to mrbl stall. A 'marble' refers to a physical register file entry, also known as the physical destination (PDST).",
"EventCode": "0x74",
"EventName": "TOPDOWN_BE_BOUND.REGISTER",
@@ -1077,6 +1119,14 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to ROB full",
+ "EventCode": "0x74",
+ "EventName": "TOPDOWN_BE_BOUND.REORDER_BUFFER",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x40",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts the number of issue slots every cycle that were not consumed by the backend due to iq/jeu scoreboards or ms scb",
"EventCode": "0x74",
"EventName": "TOPDOWN_BE_BOUND.SERIALIZATION",
@@ -1157,6 +1207,14 @@
"Unit": "cpu_atom"
},
{
+ "BriefDescription": "Counts the number of issue slots every cycle that were not delivered by the frontend that do not categorize into any other common frontend stall",
+ "EventCode": "0x71",
+ "EventName": "TOPDOWN_FE_BOUND.OTHER",
+ "SampleAfterValue": "1000003",
+ "UMask": "0x80",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Counts the number of issue slots every cycle that were not delivered by the frontend due to predecode wrong",
"EventCode": "0x71",
"EventName": "TOPDOWN_FE_BOUND.PREDECODE",
@@ -1399,6 +1457,14 @@
"Unit": "cpu_core"
},
{
+ "BriefDescription": "Counts the total number of uops retired.",
+ "EventCode": "0xc2",
+ "EventName": "UOPS_RETIRED.ALL",
+ "PEBS": "1",
+ "SampleAfterValue": "2000003",
+ "Unit": "cpu_atom"
+ },
+ {
"BriefDescription": "Cycles with retired uop(s).",
"CounterMask": "1",
"EventCode": "0xc2",
diff --git a/tools/perf/pmu-events/arch/x86/meteorlake/uncore-other.json b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-other.json
new file mode 100644
index 000000000000..2af92e43b28a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/meteorlake/uncore-other.json
@@ -0,0 +1,9 @@
+[
+ {
+ "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles.",
+ "EventCode": "0xff",
+ "EventName": "UNC_CLOCK.SOCKET",
+ "PerPkg": "1",
+ "Unit": "CLOCK"
+ }
+]