Page doesn't render properly ?

Intel Goldmont Microarchitecture events

This is a list of all Intel Goldmont Microarchitecture performance counter event types. Please see Intel Architecture Developer's Manual Volume 3B, Appendix A and Intel Architecture Optimization Reference Manual (730795-001).

NameDescriptionCounters usableUnit mask options
cpu_clk_unhalted all 0x00: (name=core_p) Core cycles when core is not halted. This event uses a (_P)rogrammable general purpose performance counter.
0x01: (name=ref) Reference cycles when core is not halted. This event uses a (_P)rogrammable general purpose performance counter.
ld_blocks all 0x10: (name=all_block) Counts anytime a load that retires is blocked for any reason.
0x10: (name=all_block_pebs) Counts anytime a load that retires is blocked for any reason.
0x08: (name=utlb_miss) Counts loads blocked because they are unable to find their physical address in the micro TLB (UTLB).
0x08: (name=utlb_miss_pebs) Counts loads blocked because they are unable to find their physical address in the micro TLB (UTLB).
0x02: (name=store_forward) Counts a load blocked from using a store forward because of an address/size mismatch, only one of the loads blocked from each store will be counted.
0x02: (name=store_forward_pebs) Counts a load blocked from using a store forward because of an address/size mismatch, only one of the loads blocked from each store will be counted.
0x01: (name=data_unknown) Counts a load blocked from using a store forward, but did not occur because the store data was not available at the right time. The forward might occur subsequently when the data is available.
0x01: (name=data_unknown_pebs) Counts a load blocked from using a store forward, but did not occur because the store data was not available at the right time. The forward might occur subsequently when the data is available.
0x04: (name=u4k_alias) Counts loads that block because their address modulo 4K matches a pending store.
0x04: (name=u4k_alias_pebs) Counts loads that block because their address modulo 4K matches a pending store.
page_walks all 0x01: (name=d_side_cycles) Counts every core cycle when a Data-side (walks due to a data operation) page walk is in progress.
0x02: (name=i_side_cycles) Counts every core cycle when a Instruction-side (walks due to an instruction fetch) page walk is in progress.
0x03: (name=cycles) Counts every core cycle a page-walk is in progress due to either a data memory operation or an instruction fetch.
uops_issued_any all 0x00: (name=any) Counts uops issued by the front end and allocated into the back end of the machine. This event counts uops that retire as well as uops that were speculatively executed but didn't retire. The sort of speculative uops that might be counted includes, but is not limited to those uops issued in the shadow of a miss-predicted branch, those uops that are inserted during an assist (such as for a denormal floating point result), and (previously allocated) uops that might be canceled during a machine clear.
misalign_mem_ref all 0x02: (name=load_page_split) Counts when a memory load of a uop spans a page boundary (a split) is retired.
0x02: (name=load_page_split_pebs) Counts when a memory load of a uop spans a page boundary (a split) is retired.
0x04: (name=store_page_split) Counts when a memory store of a uop spans a page boundary (a split) is retired.
0x04: (name=store_page_split_pebs) Counts when a memory store of a uop spans a page boundary (a split) is retired.
longest_lat_cache all 0x4f: (name=reference) Counts memory requests originating from the core that reference a cache line in the L2 cache.
0x41: (name=miss) Counts memory requests originating from the core that miss in the L2 cache.
l2_reject_xq_all all 0x00: (name=all) Counts the number of demand and prefetch transactions that the L2 XQ rejects due to a full or near full condition which likely indicates back pressure from the intra-die interconnect (IDI) fabric. The XQ may reject transactions from the L2Q (non-cacheable requests), L2 misses and L2 write-back victims.
core_reject_l2q_all all 0x00: (name=all) Counts the number of demand and L1 prefetcher requests rejected by the L2Q due to a full or nearly full condition which likely indicates back pressure from L2Q. It also counts requests that would have gone directly to the XQ, but are rejected due to a full or nearly full condition, indicating back pressure from the IDI link. The L2Q may also reject transactions from a core to insure fairness between cores, or to delay a core's dirty eviction when the address conflicts with incoming external snoops.
dl1_dirty_eviction all 0x01: (name=dirty_eviction) Counts when a modified (dirty) cache line is evicted from the data L1 cache and needs to be written back to memory. No count will occur if the evicted line is clean, and hence does not require a writeback.
icache all 0x01: (name=hit) Counts requests to the Instruction Cache (ICache) for one or more bytes in an ICache Line and that cache line is in the ICache (hit). The event strives to count on a cache line basis, so that multiple accesses which hit in a single cache line count as one ICACHE.HIT. Specifically, the event counts when straight line code crosses the cache line boundary, or when a branch target is to a new line, and that cache line is in the ICache. This event counts differently than Intel processors based on Silvermont microarchitecture.
0x02: (name=misses) Counts requests to the Instruction Cache (ICache) for one or more bytes in an ICache Line and that cache line is not in the ICache (miss). The event strives to count on a cache line basis, so that multiple accesses which miss in a single cache line count as one ICACHE.MISS. Specifically, the event counts when straight line code crosses the cache line boundary, or when a branch target is to a new line, and that cache line is not in the ICache. This event counts differently than Intel processors based on Silvermont microarchitecture.
0x03: (name=accesses) Counts requests to the Instruction Cache (ICache) for one or more bytes in an ICache Line. The event strives to count on a cache line basis, so that multiple fetches to a single cache line count as one ICACHE.ACCESS. Specifically, the event counts when accesses from straight line code crosses the cache line boundary, or when a branch target is to a new line. This event counts differently than Intel processors based on Silvermont microarchitecture.
itlb_miss all 0x04: (name=miss) Counts the number of times the machine was unable to find a translation in the Instruction Translation Lookaside Buffer (ITLB) for a linear address of an instruction fetch. It counts when new translation are filled into the ITLB. The event is speculative in nature, but will not count translations (page walks) that are begun and not finished, or translations that are finished but not filled into the ITLB.
fetch_stall_icache_fill_pending_cycles all 0x02: (name=icache_fill_pending_cycles) Counts cycles that an ICache miss is outstanding, and instruction fetch is stalled. That is, the decoder queue is able to accept bytes, but the fetch unit is unable to provide bytes, while an Icache miss outstanding. Note this event is not the same as cycles to retrieve an instruction due to an Icache miss. Rather, it is the part of the Instruction Cache (ICache) miss time where no bytes are available for the decoder.
uops_not_delivered_any all 0x00: (name=any) This event used to measure front-end inefficiencies. I.e. when front-end of the machine is not delivering uops to the back-end and the back-end has is not stalled. This event can be used to identify if the machine is truly front-end bound. When this event occurs, it is an indication that the front-end of the machine is operating at less than its theoretical peak performance. Background: We can think of the processor pipeline as being divided into 2 broader parts: Front-end and Back-end. Front-end is responsible for fetching the instruction, decoding into uops in machine understandable format and putting them into a uop queue to be consumed by back end. The back-end then takes these uops, allocates the required resources. When all resources are ready, uops are executed. If the back-end is not ready to accept uops from the front-end, then we do not want to count these as front-end bottlenecks. However, whenever we have bottlenecks in the back-end, we will have
inst_retired all 0x00: (name=any) Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The counter continues counting during hardware interrupts, traps, and inside interrupt handlers. This event uses fixed counter 0. You cannot collect a PEBs record for this event.
0x00: (name=any_p) Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The event continues counting during hardware interrupts, traps, and inside interrupt handlers. This is an architectural performance event. This event uses a (_P)rogrammable general purpose performance counter. *This event is Precise Event capable: The EventingRIP field in the PEBS record is precise to the address of the instruction which caused the event. Note: Because PEBS records can be collected only on IA32_PMC0, only one event can use the PEBS facility at a time.
0x00: (name=any_p_pebs) Counts the number of instructions that retire execution. For instructions that consist of multiple uops, this event counts the retirement of the last uop of the instruction. The event continues counting during hardware interrupts, traps, and inside interrupt handlers. This is an architectural performance event. This event uses a (_P)rogrammable general purpose performance counter. *This event is Precise Event capable: The EventingRIP field in the PEBS record is precise to the address of the instruction which caused the event. Note: Because PEBS records can be collected only on IA32_PMC0, only one event can use the PEBS facility at a time.
uops_retired all 0x00: (name=any) Counts uops which retired
0x00: (name=any_pebs) Counts uops which retired
0x01: (name=ms) Counts uops retired that are from the complex flows issued by the micro-sequencer (MS). Counts both the uops from a micro-coded instruction, and the uops that might be generated from a micro-coded assist.
0x01: (name=ms_pebs) Counts uops retired that are from the complex flows issued by the micro-sequencer (MS). Counts both the uops from a micro-coded instruction, and the uops that might be generated from a micro-coded assist.
machine_clears all 0x00: (name=all) Counts machine clears for any reason
0x01: (name=smc) Counts the number of times that the processor detects that a program is writing to a code section and has to perform a machine clear because of that modification. Self-modifying code (SMC) causes a severe penalty in all Intel architecture processors.
0x02: (name=memory_ordering) Counts machine clears due to memory ordering issues. This occurs when a snoop request happens and the machine is uncertain if memory ordering will be preserved - as another core is in the process of modifying the data.
0x04: (name=fp_assist) Counts machine clears due to floating point (FP) operations needing assists. For instance, if the result was a floating point denormal, the hardware clears the pipeline and reissues uops to produce the correct IEEE compliant denormal result.
0x08: (name=disambiguation) Counts machine clears due to memory disambiguation. Memory disambiguation happens when a load which has been issued conflicts with a previous unretired store in the pipeline whose address was not known at issue time, but is later resolved to be the same as the load address.
br_inst_retired all 0x00: (name=all_branches) Counts branch instructions retired for all branch types. This is an architectural performance event.
0x00: (name=all_branches_pebs) Counts branch instructions retired for all branch types. This is an architectural performance event.
0x7e: (name=jcc) Counts retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired, including both when the branch was taken and when it was not taken.
0x7e: (name=jcc_pebs) Counts retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired, including both when the branch was taken and when it was not taken.
0xfe: (name=taken_jcc) Counts Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired that were taken and does not count when the Jcc branch instruction were not taken.
0xfe: (name=taken_jcc_pebs) Counts Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired that were taken and does not count when the Jcc branch instruction were not taken.
0xf9: (name=call) Counts near CALL branch instructions retired.
0xf9: (name=call_pebs) Counts near CALL branch instructions retired.
0xfd: (name=rel_call) Counts near relative CALL branch instructions retired.
0xfd: (name=rel_call_pebs) Counts near relative CALL branch instructions retired.
0xfb: (name=ind_call) Counts near indirect CALL branch instructions retired.
0xfb: (name=ind_call_pebs) Counts near indirect CALL branch instructions retired.
0xf7: (name=return) Counts near return branch instructions retired.
0xf7: (name=return_pebs) Counts near return branch instructions retired.
0xeb: (name=non_return_ind) Counts near indirect call or near indirect jmp branch instructions retired.
0xeb: (name=non_return_ind_pebs) Counts near indirect call or near indirect jmp branch instructions retired.
0xbf: (name=far_branch) Counts far branch instructions retired. This includes far jump, far call and return, and Interrupt call and return.
0xbf: (name=far_branch_pebs) Counts far branch instructions retired. This includes far jump, far call and return, and Interrupt call and return.
br_misp_retired all 0x00: (name=all_branches) Counts mispredicted branch instructions retired including all branch types.
0x00: (name=all_branches_pebs) Counts mispredicted branch instructions retired including all branch types.
0x7e: (name=jcc) Counts mispredicted retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired, including both when the branch was supposed to be taken and when it was not supposed to be taken (but the processor predicted the opposite condition).
0x7e: (name=jcc_pebs) Counts mispredicted retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired, including both when the branch was supposed to be taken and when it was not supposed to be taken (but the processor predicted the opposite condition).
0xfe: (name=taken_jcc) Counts mispredicted retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired that were supposed to be taken but the processor predicted that it would not be taken.
0xfe: (name=taken_jcc_pebs) Counts mispredicted retired Jcc (Jump on Conditional Code/Jump if Condition is Met) branch instructions retired that were supposed to be taken but the processor predicted that it would not be taken.
0xfb: (name=ind_call) Counts mispredicted near indirect CALL branch instructions retired, where the target address taken was not what the processor predicted.
0xfb: (name=ind_call_pebs) counts mispredicted near indirect CALL branch instructions retired, where the target address taken was not what the processor predicted.
0xf7: (name=return) Counts mispredicted near RET branch instructions retired, where the return address taken was not what the processor predicted.
0xf7: (name=return_pebs) Counts mispredicted near RET branch instructions retired, where the return address taken was not what the processor predicted.
0xeb: (name=non_return_ind) Counts mispredicted branch instructions retired that were near indirect call or near indirect jmp, where the target address taken was not what the processor predicted.
0xeb: (name=non_return_ind_pebs) Counts mispredicted branch instructions retired that were near indirect call or near indirect jmp, where the target address taken was not what the processor predicted.
issue_slots_not_consumed all 0x00: (name=any) Counts the number of issue slots per core cycle that were not consumed by the backend due to either a full resource in the backend (RESOURCE_FULL) or due to the processor recovering from some event (RECOVERY)
0x01: (name=resource_full) Counts the number of issue slots per core cycle that were not consumed because of a full resource in the backend. Including but not limited to resources such as the Re-order Buffer (ROB), reservation stations (RS), load/store buffers, physical registers, or any other needed machine resource that is currently unavailable. Note that uops must be available for consumption in order for this event to fire. If a uop is not available (Instruction Queue is empty), this event will not count.
0x02: (name=recovery) Counts the number of issue slots per core cycle that were not consumed by the backend because allocation is stalled waiting for a mispredicted jump to retire or other branch-like conditions (e.g. the event is relevant during certain microcode flows). Counts all issue slots blocked while within this window including slots where uops were not available in the Instruction Queue.
hw_interrupts all 0x01: (name=received) Counts hardware interrupts received by the processor.
0x04: (name=pending_and_masked) Counts core cycles during which there are pending interrupts, but interrupts are masked (EFLAGS.IF = 0).
cycles_div_busy all 0x00: (name=all) Counts core cycles if either divide unit is busy.
0x01: (name=idiv) Counts core cycles the integer divide unit is busy.
0x02: (name=fpdiv) Counts core cycles the floating point divide unit is busy.
mem_uops_retired all 0x83: (name=all) Counts the number of memory uops retired that is either a loads or a store or both.
0x81: (name=all_loads) Counts the number of load uops retired
0x81: (name=all_loads_pebs) Counts the number of load uops retired
0x82: (name=all_stores) Counts the number of store uops retired.
0x82: (name=all_stores_pebs) Counts the number of store uops retired.
0x83: (name=all_pebs) Counts the number of memory uops retired that is either a loads or a store or both.
0x11: (name=dtlb_miss_loads) Counts load uops retired that caused a DTLB miss.
0x11: (name=dtlb_miss_loads_pebs) Counts load uops retired that caused a DTLB miss.
0x12: (name=dtlb_miss_stores) Counts store uops retired that caused a DTLB miss.
0x12: (name=dtlb_miss_stores_pebs) Counts store uops retired that caused a DTLB miss.
0x13: (name=dtlb_miss) Counts uops retired that had a DTLB miss on load, store or either. Note that when two distinct memory operations to the same page miss the DTLB, only one of them will be recorded as a DTLB miss.
0x13: (name=dtlb_miss_pebs) Counts uops retired that had a DTLB miss on load, store or either. Note that when two distinct memory operations to the same page miss the DTLB, only one of them will be recorded as a DTLB miss.
0x21: (name=lock_loads) Counts locked memory uops retired. This includes "regular" locks and bus locks. (To specifically count bus locks only, see the Offcore response event.) A locked access is one with a lock prefix, or an exchange to memory. See the SDM for a complete description of which memory load accesses are locks.
0x21: (name=lock_loads_pebs) Counts locked memory uops retired. This includes "regular" locks and bus locks. (To specifically count bus locks only, see the Offcore response event.) A locked access is one with a lock prefix, or an exchange to memory. See the SDM for a complete description of which memory load accesses are locks.
0x41: (name=split_loads) Counts load uops retired where the data requested spans a 64 byte cache line boundary.
0x41: (name=split_loads_pebs) Counts load uops retired where the data requested spans a 64 byte cache line boundary.
0x42: (name=split_stores) Counts store uops retired where the data requested spans a 64 byte cache line boundary.
0x42: (name=split_stores_pebs) Counts store uops retired where the data requested spans a 64 byte cache line boundary.
0x43: (name=split) Counts memory uops retired where the data requested spans a 64 byte cache line boundary.
0x43: (name=split_pebs) Counts memory uops retired where the data requested spans a 64 byte cache line boundary.
mem_load_uops_retired all 0x01: (name=l1_hit) Counts load uops retired that hit the L1 data cache.
0x01: (name=l1_hit_pebs) Counts load uops retired that hit the L1 data cache.
0x08: (name=l1_miss) Counts load uops retired that miss the L1 data cache.
0x08: (name=l1_miss_pebs) Counts load uops retired that miss the L1 data cache.
0x02: (name=l2_hit) Counts load uops retired that hit in the L2 cache.
0x02: (name=l2_hit_pebs) Counts load uops retired that hit in the L2 cache.
0x10: (name=l2_miss) Counts load uops retired that miss in the L2 cache.
0x10: (name=l2_miss_pebs) Counts load uops retired that miss in the L2 cache.
0x20: (name=hitm) Counts load uops retired where the cache line containing the data was in the modified state of another core or modules cache (HITM). More specifically, this means that when the load address was checked by other caching agents (typically another processor) in the system, one of those caching agents indicated that they had a dirty copy of the data. Loads that obtain a HITM response incur greater latency than most is typical for a load. In addition, since HITM indicates that some other processor had this data in its cache, it implies that the data was shared between processors, or potentially was a lock or semaphore value. This event is useful for locating sharing, false sharing, and contended locks.
0x20: (name=hitm_pebs) Counts load uops retired where the cache line containing the data was in the modified state of another core or modules cache (HITM). More specifically, this means that when the load address was checked by other caching agents (typically another processor) in the system, one of those caching agents indicated that they had a dirty copy of the data. Loads that obtain a HITM response incur greater latency than most is typical for a load. In addition, since HITM indicates that some other processor had this data in its cache, it implies that the data was shared between processors, or potentially was a lock or semaphore value. This event is useful for locating sharing, false sharing, and contended locks.
0x40: (name=wcb_hit) Counts memory load uops retired where the data is retrieved from the WCB (or fill buffer), indicating that the load found its data while that data was in the process of being brought into the L1 cache. Typically a load will receive this indication when some other load or prefetch missed the L1 cache and was in the process of retrieving the cache line containing the data, but that process had not yet finished (and written the data back to the cache). For example, consider load X and Y, both referencing the same cache line that is not in the L1 cache. If load X misses cache first, it obtains and WCB (or fill buffer) and begins the process of requesting the data. When load Y requests the data, it will either hit the WCB, or the L1 cache, depending on exactly what time the request to Y occurs.
0x40: (name=wcb_hit_pebs) Counts memory load uops retired where the data is retrieved from the WCB (or fill buffer), indicating that the load found its data while that data was in the process of being brought into the L1 cache. Typically a load will receive this indication when some other load or prefetch missed the L1 cache and was in the process of retrieving the cache line containing the data, but that process had not yet finished (and written the data back to the cache). For example, consider load X and Y, both referencing the same cache line that is not in the L1 cache. If load X misses cache first, it obtains and WCB (or fill buffer) and begins the process of requesting the data. When load Y requests the data, it will either hit the WCB, or the L1 cache, depending on exactly what time the request to Y occurs.
0x80: (name=dram_hit) Counts memory load uops retired where the data is retrieved from DRAM. Event is counted at retirement, so the speculative loads are ignored. A memory load can hit (or miss) the L1 cache, hit (or miss) the L2 cache, hit DRAM, hit in the WCB or receive a HITM response.
0x80: (name=dram_hit_pebs) Counts memory load uops retired where the data is retrieved from DRAM. Event is counted at retirement, so the speculative loads are ignored. A memory load can hit (or miss) the L1 cache, hit (or miss) the L2 cache, hit DRAM, hit in the WCB or receive a HITM response.
baclears all 0x01: (name=all) Counts the number of times a BACLEAR is signaled for any reason, including, but not limited to indirect branch/call, Jcc (Jump on Conditional Code/Jump if Condition is Met) branch, unconditional branch/call, and returns.
0x08: (name=return) Counts BACLEARS on return instructions.
0x10: (name=cond) Counts BACLEARS on Jcc (Jump on Conditional Code/Jump if Condition is Met) branches.
ms_decoded_ms_entry all 0x01: (name=ms_entry) Counts the number of times the Microcode Sequencer (MS) starts a flow of uops from the MSROM. It does not count every time a uop is read from the MSROM. The most common case that this counts is when a micro-coded instruction is encountered by the front end of the machine. Other cases include when an instruction encounters a fault, trap, or microcode assist of any sort that initiates a flow of uops. The event will count MS startups for uops that are speculative, and subsequently cleared by branch mispredict or a machine clear.
decode_restriction_predecode_wrong all 0x01: (name=predecode_wrong) Counts the number of times the prediction (from the predecode cache) for instruction length is incorrect.
Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. - Rob Pike
2017/07/24