This is a list of all Intel Ivy Bridge Microarchitecture performance counter event types. Please see Intel Architecture Developer's Manual Volume 3B, Appendix A and Intel Architecture Optimization Reference Manual (730795-001).
| Name | Description | Counters usable | Unit mask options |
| CPU_CLK_UNHALTED | Clock cycles when not halted | all | |
| UNHALTED_REFERENCE_CYCLES | Unhalted reference cycles | all |
0x01: No unit mask
|
| INST_RETIRED | number of instructions retired | all | |
| LLC_MISSES | Last level cache demand requests from this core that missed the LLC | all |
0x41: No unit mask
|
| LLC_REFS | Last level cache demand requests from this core | all |
0x4f: No unit mask
|
| BR_INST_RETIRED | number of branch instructions retired | all | |
| BR_MISS_PRED_RETIRED | number of mispredicted branches retired (precise) | all | |
| ld_blocks | Blocked loads | all |
0x02: store_forward loads blocked by overlapping with store buffer that cannot be forwarded
|
| misalign_mem_ref | Misaligned memory references | all |
0x01: loads Speculative cache line split load uops dispatched to L1 cache
0x02: stores Speculative cache line split STA uops dispatched to L1 cache |
| ld_blocks_partial | Partial loads | all |
0x01: address_alias False dependencies in MOB due to partial compare on address
|
| dtlb_load_misses | D-TLB misses | all |
0x81: demand_ld_miss_causes_a_walk Demand load Miss in all translation lookaside buffer (TLB) levels causes an page walk of any page size.
0x82: demand_ld_walk_completed Demand load Miss in all translation lookaside buffer (TLB) levels causes a page walk that completes of any page size. 0x84: demand_ld_walk_duration Demand load cycles page miss handler (PMH) is busy with this walk. |
| int_misc | Instruction decoder events | all |
0x03: recovery_cycles Number of cycles waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...) (extra: cmask=1)
0x03: recovery_stalls_count Number of occurences waiting for the checkpoints in Resource Allocation Table (RAT) to be recovered after Nuke due to all other cases except JEClear (e.g. whenever a ucode assist is needed like SSE exception, memory disambiguation, etc...) (extra: edge cmask=1) |
| uops_issued | Uops issued | 0, 1, 2, 3 |
0x01: any Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
0x01: stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread (extra: inv cmask=1) 0x01: core_stall_cycles Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads (extra: inv cmask=1) 0x10: flags_merge Number of flags-merge uops being allocated. 0x20: slow_lea Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate) regardless if as a result of LEA instruction or not. 0x40: single_mul Number of Multiply packed/scalar single precision uops allocated |
| arith | Arithmetic | all |
0x01: fpu_div_active Cycles when divider is busy executing divide operations
0x04: fpu_div Divide operations executed (extra: edge cmask=1) |
| l2_rqsts | L2 cache requests | all |
0x01: demand_data_rd_hit Demand Data Read requests that hit L2 cache
0x03: all_demand_data_rd Demand Data Read requests 0x04: rfo_hit RFO requests that hit L2 cache 0x08: rfo_miss RFO requests that miss L2 cache 0x0c: all_rfo RFO requests to L2 cache 0x10: code_rd_hit L2 cache hits when fetching instructions, code reads. 0x20: code_rd_miss L2 cache misses when fetching instructions 0x30: all_code_rd L2 code requests 0x40: pf_hit Requests from the L2 hardware prefetchers that hit L2 cache 0x80: pf_miss Requests from the L2 hardware prefetchers that miss L2 cache 0xc0: all_pf Requests from L2 hardware prefetchers |
| l2_store_lock_rqsts | L2 cache store lock requests | all |
0x01: miss RFOs that miss cache lines
0x08: hit_m RFOs that hit cache lines in M state 0x0f: all RFOs that access cache lines in any state |
| l2_l1d_wb_rqsts | writebacks from L1D to the L2 cache | all |
0x01: miss Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)
0x04: hit_e Not rejected writebacks from L1D to L2 cache lines in E state 0x08: hit_m Not rejected writebacks from L1D to L2 cache lines in M state 0x0f: all Not rejected writebacks from L1D to L2 cache lines in any state. |
| l1d_pend_miss | L1D miss oustandings | 2 |
0x01: pending L1D miss oustandings duration in cycles
0x01: pending_cycles Cycles with L1D load Misses outstanding. (extra: cmask=1) 0x01: occurences This event counts the number of L1D misses outstanding, using an edge detect to count transitions. (extra: edge cmask=1) |
| dtlb_store_misses | Store misses in all DTLB levels that cause page walks | all |
0x01: miss_causes_a_walk Store misses in all DTLB levels that cause page walks
0x02: walk_completed Store misses in all DTLB levels that cause completed page walks 0x04: walk_duration Cycles when PMH is busy with page walks 0x10: stlb_hit Store operations that miss the first TLB level but hit the second and do not cause page walks |
| load_hit_pre | Load dispatches that hit fill buffer | all |
0x01: sw_pf Not software-prefetch load dispatches that hit forward buffer allocated for software prefetch
0x02: hw_pf Not software-prefetch load dispatches that hit forward buffer allocated for hardware prefetch |
| l1d | L1D data line replacements | all |
0x01: replacement L1D data line replacements
|
| move_elimination | Integer move elimination | all |
0x01: int_not_eliminated Number of integer Move Elimination candidate uops that were not eliminated.
0x02: simd_not_eliminated Number of SIMD Move Elimination candidate uops that were not eliminated. 0x04: int_eliminated Number of integer Move Elimination candidate uops that were eliminated. 0x08: simd_eliminated Number of SIMD Move Elimination candidate uops that were eliminated. |
| cpl_cycles | Unhalted core cycles qualified by ring | all |
0x01: ring0 Unhalted core cycles when the thread is in ring 0
0x01: ring0_trans Number of intervals between processor halts while thread is in ring 0 (extra: edge cmask=1) 0x02: ring123 Unhalted core cycles when thread is in rings 1, 2, or 3 |
| rs_events | Reservation station | all |
0x01: empty_cycles Cycles when Reservation Station (RS) is empty for the thread
|
| tlb_access | TLB access | all |
0x04: load_stlb_hit Load operations that miss the first DTLB level but hit the second and do not cause page walks
|
| offcore_requests_outstanding | Offcore outstanding transactions | all |
0x01: demand_data_rd Offcore outstanding Demand Data Read transactions in uncore queue.
0x01: cycles_with_demand_data_rd Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore (extra: cmask=1) 0x02: demand_code_rd Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle 0x04: demand_rfo Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore 0x04: cycles_with_demand_rfo Offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore, every cycle (extra: cmask=1) 0x08: all_data_rd Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore 0x08: cycles_with_data_rd Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore (extra: cmask=1) |
| lock_cycles | Locked cycles | all |
0x01: split_lock_uc_lock_duration Cycles when L1 and L2 are locked due to UC or split lock
0x02: cache_lock_duration Cycles when L1D is locked |
| idq | Instruction Decode Queue (IDQ) | 0, 1, 2, 3 |
0x02: empty Instruction Decode Queue (IDQ) empty cycles
0x04: mite_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path 0x04: mite_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path (extra: cmask=1) 0x08: dsb_uops Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path 0x08: dsb_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path (extra: cmask=1) 0x10: ms_dsb_uops Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy 0x10: ms_dsb_cycles Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy (extra: cmask=1) 0x10: ms_dsb_occur Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequenser (MS) is busy (extra: edge cmask=1) 0x18: all_dsb_cycles_any_uops Cycles Decode Stream Buffer (DSB) is delivering any Uop (extra: cmask=1) 0x18: all_dsb_cycles_4_uops Cycles Decode Stream Buffer (DSB) is delivering 4 Uops (extra: cmask=4) 0x20: ms_mite_uops Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy 0x24: all_mite_cycles_any_uops Cycles MITE is delivering any Uop (extra: cmask=1) 0x24: all_mite_cycles_4_uops Cycles MITE is delivering 4 Uops (extra: cmask=4) 0x30: ms_uops Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy 0x30: ms_cycles Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequenser (MS) is busy (extra: cmask=1) 0x3c: mite_all_uops Uops delivered to Instruction Decode Queue (IDQ) from MITE path |
| icache | Instruction cache | all |
0x02: misses Instruction cache, streaming buffer and victim cache misses
|
| itlb_misses | Misses at all ITLB levels that cause page walks | all |
0x01: miss_causes_a_walk Misses at all ITLB levels that cause page walks
0x02: walk_completed Misses in all ITLB levels that cause completed page walks 0x04: walk_duration Cycles when PMH is busy with page walks 0x10: stlb_hit Operations that miss the first ITLB level but hit the second and do not cause any page walks |
| ild_stall | Stalls caused by changing prefix length of the instruction. | all |
0x01: lcp Stalls caused by changing prefix length of the instruction.
0x04: iq_full Stall cycles because IQ is full |
| br_inst_exec | Branch instructions | all |
0x41: nontaken_conditional Not taken macro-conditional branches
0x81: taken_conditional Taken speculative and retired macro-conditional branches 0x82: taken_direct_jump Taken speculative and retired macro-conditional branch instructions excluding calls and indirects 0x84: taken_indirect_jump_non_call_ret Taken speculative and retired indirect branches excluding calls and returns 0x88: taken_indirect_near_return Taken speculative and retired indirect branches with return mnemonic 0x90: taken_direct_near_call Taken speculative and retired direct near calls 0xa0: taken_indirect_near_call Taken speculative and retired indirect calls 0xc1: all_conditional Speculative and retired macro-conditional branches 0xc2: all_direct_jmp Speculative and retired macro-unconditional branches excluding calls and indirects 0xc4: all_indirect_jump_non_call_ret Speculative and retired indirect branches excluding calls and returns 0xc8: all_indirect_near_return Speculative and retired indirect return branches. 0xd0: all_direct_near_call Speculative and retired direct near calls 0xff: all_branches Speculative and retired branches |
| br_misp_exec | Mispredicted branch instructions | all |
0x41: nontaken_conditional Not taken speculative and retired mispredicted macro conditional branches
0x81: taken_conditional Taken speculative and retired mispredicted macro conditional branches 0x84: taken_indirect_jump_non_call_ret Taken speculative and retired mispredicted indirect branches excluding calls and returns 0x88: taken_return_near Taken speculative and retired mispredicted indirect branches with return mnemonic 0xa0: taken_indirect_near_call Taken speculative and retired mispredicted indirect calls 0xc1: all_conditional Speculative and retired mispredicted macro conditional branches 0xc4: all_indirect_jump_non_call_ret Mispredicted indirect branches excluding calls and returns 0xff: all_branches Speculative and retired mispredicted macro conditional branches |
| idq_uops_not_delivered | Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall | all |
0x01: core Uops not delivered by the Frontend to the Backend of the machine, while there is no Backend stall
0x01: cycles_le_3_uop_deliv.core Cycles with 3 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall (extra: cmask=1) 0x01: cycles_fe_was_ok Cycles with 4 uops delivered by the Frontend to the Backend of the machine, or the Backend was stalling (extra: inv cmask=1) 0x01: cycles_le_2_uop_deliv.core Cycles with 2 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall (extra: cmask=2) 0x01: cycles_le_1_uop_deliv.core Cycles with 1 or less uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall (extra: cmask=3) 0x01: cycles_0_uops_deliv.core Cycles with no uops delivered by the Frontend to the Backend of the machine, while there is no Backend stall (extra: cmask=4) |
| uops_dispatched_port | Cycles per thread when uops are dispatched to port | all |
0x01: port_0 Cycles per thread when uops are dispatched to port 0
0x01: port_0_core Cycles per core when uops are dispatched to port 0 (extra:) 0x02: port_1 Cycles per thread when uops are dispatched to port 1 0x02: port_1_core Cycles per core when uops are dispatched to port 1 (extra:) 0x0c: port_2 Cycles per thread when load or STA uops are dispatched to port 2 0x0c: port_2_core Cycles per core when load or STA uops are dispatched to port 2 (extra:) 0x30: port_3 Cycles per thread when load or STA uops are dispatched to port 3 0x30: port_3_core Cycles per core when load or STA uops are dispatched to port 3 (extra:) 0x40: port_4 Cycles per thread when uops are dispatched to port 4 0x40: port_4_core Cycles per core when uops are dispatched to port 4 (extra:) 0x80: port_5 Cycles per thread when uops are dispatched to port 5 0x80: port_5_core Cycles per core when uops are dispatched to port 5 (extra:) |
| resource_stalls | Resource-related stall cycles | all |
0x01: any Resource-related stall cycles
0x04: rs Cycles stalled due to no eligible RS entry available. 0x08: sb Cycles stalled due to no store buffers available. (not including draining form sync). 0x10: rob Cycles stalled due to re-order buffer full. |
| cycle_activity | Cycle activity | 2 |
0x01: cycles_l2_pending Cycles with pending L2 cache miss loads. (extra: cmask=1)
0x02: cycles_ldm_pending Cycles with pending memory loads. (extra: cmask=2) 0x04: cycles_no_execute Total execution stalls (extra: cmask=4) 0x05: stalls_l2_pending Execution stalls due to L2 cache misses. (extra: cmask=5) 0x06: stalls_ldm_pending Execution stalls due to memory subsystem. (extra: cmask=6) 0x08: cycles_l1d_pending Cycles with pending L1 cache miss loads. (extra: cmask=8) 0x0c: stalls_l1d_pending Execution stalls due to L1 data cache misses (extra: cmask=c) |
| dsb2mite_switches | Decode Stream Buffer (DSB)-to-MITE switches | all |
0x01: count Decode Stream Buffer (DSB)-to-MITE switches
|
| dsb_fill | Decode Stream Buffer (DSB) fill | all |
0x08: exceed_dsb_lines Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines
|
| itlb | Instruction TLB (ITLB) | all |
0x01: itlb_flush Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
|
| offcore_requests | Uncore requests | all |
0x01: demand_data_rd Demand Data Read requests sent to uncore
0x02: demand_code_rd Cacheable and noncachaeble code read requests 0x04: demand_rfo Demand RFO requests including regular RFOs, locks, ItoM 0x08: all_data_rd Demand and prefetch data reads |
| uops_executed | Uops executed | 0, 1, 2, 3 |
0x01: thread Counts the number of uops to be executed per-thread each cycle.
0x01: cycles_ge_1_uop_exec Cycles where at least 1 uop was executed per-thread (extra: cmask=1) 0x01: stall_cycles Counts number of cycles no uops were dispatched to be executed on this thread. (extra: inv cmask=1) 0x01: cycles_ge_2_uops_exec Cycles where at least 2 uops were executed per-thread (extra: cmask=2) 0x01: cycles_ge_3_uops_exec Cycles where at least 3 uops were executed per-thread (extra: cmask=3) 0x01: cycles_ge_4_uops_exec Cycles where at least 4 uops were executed per-thread (extra: cmask=4) 0x02: core Number of uops executed on the core. |
| tlb_flush | DTLB flushes | all |
0x01: dtlb_thread DTLB flush attempts of the thread-specific entries
0x20: stlb_any STLB flush attempts |
| other_assists | Microcode assists. | all |
0x08: avx_store Number of AVX memory assist for stores. AVX microcode assist is being invoked whenever the hardware is unable to properly handle AVX-256b operations.
0x10: avx_to_sse Number of transitions from AVX-256 to legacy SSE when penalty applicable. 0x20: sse_to_avx Number of transitions from SSE to AVX-256 when penalty applicable. |
| uops_retired | Retired uops. | 0, 1, 2, 3 |
0x01: all Actually retired uops.
0x01: stall_cycles Cycles without actually retired uops. (extra: inv cmask=1) 0x01: core_stall_cycles Cycles without actually retired uops. (extra: inv cmask=1) 0x01: total_cycles Cycles with less than 10 actually retired uops. (extra: inv cmask=10) 0x02: retire_slots Retirement slots used. |
| machine_clears | Machine clears | all |
0x02: memory_ordering Counts the number of machine clears due to memory order conflicts.
0x04: smc Self-modifying code (SMC) detected. 0x20: maskmov This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0. |
| br_inst_retired | Conditional branch instructions retired. | all |
0x01: conditional Conditional branch instructions retired.
0x02: near_call_r3 Direct and indirect macro near call instructions retired (captured in ring 3). 0x02: near_call Direct and indirect near call instructions retired. 0x08: near_return Return instructions retired. 0x10: not_taken Not taken branch instructions retired. 0x20: near_taken Taken branch instructions retired. 0x40: far_branch Far branch instructions retired. |
| br_misp_retired | Mispredicted conditional branch instructions retired. | all |
0x01: conditional Mispredicted conditional branch instructions retired.
0x20: near_taken number of near branch instructions retired that were mispredicted and taken. |
| fp_assist | FPU assists | 0, 1, 2, 3 |
0x02: x87_output Number of X87 assists due to output value.
0x04: x87_input Number of X87 assists due to input value. 0x08: simd_output Number of SIMD FP assists due to Output values 0x10: simd_input Number of SIMD FP assists due to input values 0x1e: any Cycles with any input/output SSE or FP assist (extra: cmask=1) |
| rob_misc_events | ROB (Register Order Buffer) events | all |
0x20: lbr_inserts Count cases of saving new LBR
|
| mem_uops_retired | Memory Uops | 0, 1, 2, 3 |
0x11: stlb_miss_loads Load uops with true STLB miss retired to architected path.
0x12: stlb_miss_stores Store uops with true STLB miss retired to architected path. 0x21: lock_loads Load uops with locked access retired to architected path. 0x41: split_loads Line-splitted load uops retired to architected path. 0x42: split_stores Line-splitted store uops retired to architected path. 0x81: all_loads Load uops retired to architected path with filter on bits 0 and 1 applied. 0x82: all_stores Store uops retired to architected path with filter on bits 0 and 1 applied. |
| mem_load_uops_retired | Memory load uops | 0, 1, 2, 3 |
0x01: l1_hit Retired load uops with L1 cache hits as data sources.
0x02: l2_hit Retired load uops with L2 cache hits as data sources. 0x04: llc_hit Retired load uops which data sources were data hits in LLC without snoops required. 0x40: hit_lfb Retired load uops which data sources were load uops missed L1 but hit forward buffer due to preceding miss to the same cache line with data not ready. |
| mem_load_uops_llc_hit_retired | Memory load uops with LLC (Last Level Cache) hit | 0, 1, 2, 3 |
0x01: xsnp_miss Retired load uops which data sources were LLC hit and cross-core snoop missed in on-pkg core cache.
0x02: xsnp_hit Retired load uops which data sources were LLC and cross-core snoop hits in on-pkg core cache. 0x04: xsnp_hitm Retired load uops which data sources were HitM responses from shared LLC. 0x08: xsnp_none Retired load uops which data sources were hits in LLC without snoops required. |
| mem_load_uops_llc_miss_retired | Memory load uops with LLC (Last Level Cache) miss | 0, 1, 2, 3 |
0x01: local_dram Data from local DRAM either Snoop not needed or Snoop Miss (RspI)
|
| baclears | Frontend resteering | all |
0x1f: any Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.
|
| l2_trans | L2 cache transactions | all |
0x01: demand_data_rd Demand Data Read requests that access L2 cache
0x02: rfo RFO requests that access L2 cache 0x04: code_rd L2 cache accesses when fetching instructions 0x08: all_pf L2 or LLC HW prefetches that access L2 cache 0x10: l1d_wb L1D writebacks that access L2 cache 0x20: l2_fill L2 fill requests that access L2 cache 0x40: l2_wb L2 writebacks that access L2 cache 0x80: all_requests Transactions accessing L2 pipe |
| l2_lines_in | L2 cache lines in | all |
0x01: i L2 cache lines in I state filling L2
0x02: s L2 cache lines in S state filling L2 0x04: e L2 cache lines in E state filling L2 0x07: all L2 cache lines filling L2 |
| l2_lines_out | L2 cache lines out | all |
0x01: demand_clean Clean L2 cache lines evicted by demand
0x02: demand_dirty Dirty L2 cache lines evicted by demand 0x04: pf_clean Clean L2 cache lines evicted by L2 prefetch 0x08: pf_dirty Dirty L2 cache lines evicted by L2 prefetch 0x0a: dirty_all Dirty L2 cache lines filling the L2 |
It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories instead of theories to suit facts.- Sherlock Holmes