This is a list of all Intel Sandy Bridge microarchitecture performance counter event types. For details, see the Intel Architecture Developer's Manual, Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).
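Each entry in the table below names an event, a unit mask and, for some unit masks, extra qualifiers (`edge`, `inv`, `cmask`). The following is a minimal sketch of how those pieces could be packed into a single IA32_PERFEVTSELx selector value, assuming the register bit layout described in the SDM Volume 3B cited above. The helper names `perfevtsel` and `combine_umasks` are hypothetical. The table does not list event-select codes, so the `0x0E` used for `uops_issued` is an assumption you should check against the SDM. `combine_umasks` reflects a pattern visible in the table: several combined unit masks are numerically the OR of their components, for example `br_inst_exec` 0xc1 = 0x41 | 0x81 and `idq` 0x3c = 0x04 | 0x08 | 0x10 | 0x20.

```python
"""Sketch: pack an event, a unit mask and the table's "extra" qualifiers into an
IA32_PERFEVTSELx value (field positions as described in the Intel SDM, Vol. 3B)."""

UMASK_SHIFT = 8    # bits 15:8, the unit mask from the table
USR = 1 << 16      # count while in rings 1-3
OS = 1 << 17       # count while in ring 0
EDGE = 1 << 18     # "edge" qualifier: count deasserted-to-asserted transitions
EN = 1 << 22       # enable the counter
INV = 1 << 23      # "inv" qualifier: invert the cmask comparison
CMASK_SHIFT = 24   # bits 31:24, the "cmask" qualifier


def perfevtsel(event: int, umask: int, *, edge: bool = False, inv: bool = False,
               cmask: int = 0, user: bool = True, kernel: bool = True) -> int:
    """Return an enabled selector value for one event/unit-mask pair (hypothetical helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT) | EN
    if user:
        value |= USR
    if kernel:
        value |= OS
    if edge:
        value |= EDGE
    if inv:
        value |= INV
    return value


def combine_umasks(*masks: int) -> int:
    """OR unit-mask bits together, e.g. 0x41 | 0x81 == 0xc1 (br_inst_exec.all_conditional)."""
    combined = 0
    for mask in masks:
        combined |= mask
    return combined


if __name__ == "__main__":
    # Assumption: 0x0E is the event-select code for uops_issued; verify in the SDM.
    UOPS_ISSUED = 0x0E
    # uops_issued.stall_cycles = unit mask 0x01 with (extra: inv cmask=1)
    print(hex(perfevtsel(UOPS_ISSUED, 0x01, inv=True, cmask=1)))  # 0x1c3010e
    print(hex(combine_umasks(0x04, 0x08, 0x10, 0x20)))            # 0x3c
```

The selector is built with plain integer arithmetic so it can be compared directly against values shown by other tools; actually writing it to a counter requires privileged access and is outside the scope of this sketch.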
| Name | Description | Counters usable | Unit mask options |
| --- | --- | --- | --- |
| CPU_CLK_UNHALTED | Clock cycles when not halted | all | |
| UNHALTED_REFERENCE_CYCLES | Unhalted reference cycles | all | 0x01: No unit mask |
| INST_RETIRED | Number of instructions retired | all | |
| LLC_MISSES | Last level cache demand requests from this core that missed the LLC | all | 0x41: No unit mask |
| LLC_REFS | Last level cache demand requests from this core | all | 0x4f: No unit mask |
| BR_INST_RETIRED | Number of branch instructions retired | all | |
| BR_MISS_PRED_RETIRED | Number of mispredicted branches retired (precise) | all | |
| ld_blocks | Blocked loads | all | 0x01: data_unknown Loads blocked due to store buffer blocks with unknown data.<br>0x02: store_forward Loads blocked by overlapping with a store buffer entry that cannot be forwarded.<br>0x08: no_sr This event counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.<br>0x10: all_block Number of cases where any load is blocked but has no DCU miss. |
| misalign_mem_ref | Misaligned memory references | all | 0x01: loads Speculative cache-line split load uops dispatched to the L1D.<br>0x02: stores Speculative cache-line split store-address uops dispatched to the L1D. |
| ld_blocks_partial | Partial loads | all | 0x01: address_alias False dependencies in the MOB due to partial compare on address.<br>0x08: all_sta_block This event counts the number of times that load operations are temporarily blocked because of older stores whose addresses are not yet known. A load operation may incur more than one block of this type. |
| dtlb_load_misses | D-TLB load misses | all | 0x01: miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G).<br>0x02: walk_completed Miss in all TLB levels causes a page walk that completes, of any page size (4K/2M/4M/1G).<br>0x04: walk_duration Cycles the PMH is busy with this walk.<br>0x10: stlb_hit First-level miss but second-level hit; no page walk. |
| int_misc | Instruction decoder events | all | 0x40: rat_stall_cycles Cycles the Resource Allocation Table (RAT) external stall is sent to the Instruction Decode Queue (IDQ) for this thread.<br>0x03: recovery_cycles Number of cycles waiting to recover after a nuke, for all cases except JEClear. (extra: cmask=1)<br>0x03: recovery_stalls_count Edge applied to recovery_cycles, thus counts occurrences. (extra: edge cmask=1) |
| uops_issued | Number of uops issued | 0, 1, 2, 3 | 0x01: any Number of uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS).<br>0x01: stall_cycles Cycles in which no uops were issued by this thread. (extra: inv cmask=1) |
| arith | Miscellaneous ALU events | all | 0x01: fpu_div_active Cycles that the divider is busy with any divide or sqrt operation.<br>0x01: fpu_div Number of times that the divider is activated, including INT, SIMD and FP. (extra: edge cmask=1) |
| insts_written_to_iq | Number of instructions written to the Instruction Queue (IQ) this cycle | all | 0x01: No unit mask |
| l2_rqsts | Requests to the L2 cache | all | 0x01: demand_data_rd_hit Demand data reads that hit L2, no rejects.<br>0x04: rfo_hit RFO requests that hit the L2 cache.<br>0x08: rfo_miss RFO requests that miss the L2 cache.<br>0x10: code_rd_hit L2 cache hits when fetching instructions (code reads).<br>0x20: code_rd_miss L2 cache misses when fetching instructions.<br>0x40: pf_hit Requests from the L2 hardware prefetchers that hit the L2 cache.<br>0x80: pf_miss Requests from the L2 hardware prefetchers that miss the L2 cache.<br>0x03: all_demand_data_rd Any data read request to the L2 cache.<br>0x0c: all_rfo Any data RFO request to the L2 cache.<br>0x30: all_code_rd Any code read request to the L2 cache.<br>0xc0: all_pf Any L2 hardware prefetch request to the L2 cache. |
| l2_store_lock_rqsts | L2 cache store lock requests | all | 0x0f: all RFOs that access cache lines in any state.<br>0x01: miss RFO (as a result of regular RFO or lock request) misses the cache (I state).<br>0x04: hit_e RFO (as a result of regular RFO or lock request) hits the cache in E state.<br>0x08: hit_m RFO (as a result of regular RFO or lock request) hits the cache in M state. |
| l2_l1d_wb_rqsts | Writebacks from L1D to the L2 cache | all | 0x04: hit_e Writebacks from L1D to L2 cache lines in E state.<br>0x08: hit_m Writebacks from L1D to L2 cache lines in M state. |
| l1d_pend_miss | Cycles with L1D load misses outstanding | 2 | 0x01: pending Cycles with L1D load misses outstanding.<br>0x01: occurences Number of occurrences of outstanding L1D misses. (extra: edge cmask=1) |
| dtlb_store_misses | D-TLB store misses | all | 0x01: miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M/1G).<br>0x02: walk_completed Miss in all TLB levels causes a page walk that completes, of any page size (4K/2M/4M/1G).<br>0x04: walk_duration Cycles the PMH is busy with this walk.<br>0x10: stlb_hit First-level miss but second-level hit; no page walk. Only relevant if there are multiple levels. |
| load_hit_pre | Load dispatches that hit a fill buffer | all | 0x01: sw_pf Load dispatches that hit a fill buffer allocated for a software prefetch.<br>0x02: hw_pf Load dispatches that hit a fill buffer allocated for a hardware prefetch. |
| hw_pre_req | Hardware prefetch requests | all | 0x02: No unit mask |
| l1d | L1D cache events | all | 0x01: replacement L1D data line replacements.<br>0x02: allocated_in_m L1D M-state data cache lines allocated.<br>0x04: eviction L1D M-state data cache lines evicted due to replacement (only).<br>0x08: all_m_replacement All modified lines evicted out of L1D. |
| partial_rat_stalls | Partial RAT stalls | all | 0x20: flags_merge_uop Number of performance-sensitive flags-merge uops added by the Sandy Bridge microarchitecture.<br>0x40: slow_lea_window Number of cycles with at least 1 slow Load Effective Address (LEA) uop being allocated.<br>0x80: mul_single_uop Number of multiply packed/scalar single-precision uops allocated.<br>0x20: flags_merge_uop_cycles Cycles with performance-sensitive flags-merge uops added by the Sandy Bridge microarchitecture. (extra: cmask=1) |
| resource_stalls2 | Miscellaneous resource stalls | 0, 1, 2, 3 | 0x40: bob_full Cycles the allocator is stalled due to the Branch Order Buffer (BOB) being full.<br>0x0f: all_prf_control Resource stalls2 control structures full for physical registers.<br>0x0c: all_fl_empty Cycles with either free list empty.<br>0x4f: ooo_rsrc Resource stalls2 control structures full: Physical Register Reclaim Table (PRRT), Physical History Table (PHT), INT or SIMD Free List (FL), Branch Order Buffer (BOB). |
| cpl_cycles | Unhalted core cycles in specific rings | all | 0x01: ring0 Unhalted core cycles the thread was in ring 0.<br>0x01: ring0_trans Transitions from rings 1/2/3 to ring 0. (extra: edge cmask=1)<br>0x02: ring123 Unhalted core cycles the thread was in rings 1/2/3. |
| rs_events | Events for the reservation station | 0, 1, 2, 3 | 0x01: No unit mask |
| offcore_requests_outstanding | Offcore outstanding transactions | all | 0x01: demand_data_rd Offcore outstanding demand data read transactions in the SuperQueue (SQ), the queue to the uncore, every cycle. Includes L1D data hardware prefetches.<br>0x01: cycles_with_demand_data_rd Cycles with offcore outstanding demand data read transactions in the SQ. (extra: cmask=1)<br>0x02: demand_code_rd Offcore outstanding code read transactions in the SQ, every cycle.<br>0x04: demand_rfo Offcore outstanding RFO (store) transactions in the SQ, every cycle.<br>0x08: all_data_rd Offcore outstanding cacheable core data read transactions in the SQ, every cycle.<br>0x08: cycles_with_data_rd Cycles with offcore outstanding data read transactions in the SQ. (extra: cmask=1)<br>0x02: cycles_with_demand_code_rd Cycles with offcore outstanding code read transactions in the SQ. (extra: cmask=1)<br>0x04: cycles_with_demand_rfo Cycles with offcore outstanding demand RFO transactions in the SQ. (extra: cmask=1) |
| lock_cycles | Cycles due to LOCK prefixes | all | 0x01: split_lock_uc_lock_duration Cycles in which the L1D and L2 are locked, due to a UC lock or split lock.<br>0x02: cache_lock_duration Cycles in which the L1D is locked. |
| idq | Instruction Decode Queue events | 0, 1, 2, 3 | 0x02: empty Cycles the Instruction Decode Queue (IDQ) is empty.<br>0x04: mite_uops Number of uops delivered to the IDQ from the MITE path.<br>0x08: dsb_uops Number of uops delivered to the IDQ from the Decode Stream Buffer (DSB) path.<br>0x10: ms_dsb_uops Number of uops delivered to the IDQ while the Microcode Sequencer (MS) is busy, initiated by the DSB.<br>0x20: ms_mite_uops Number of uops delivered to the IDQ while the MS is busy, initiated by MITE.<br>0x30: ms_uops Number of uops delivered to the IDQ from the MS, initiated by the DSB or MITE.<br>0x30: ms_cycles Number of cycles in which uops were delivered to the IDQ while the MS is busy, initiated by the DSB or MITE. (extra: cmask=1)<br>0x04: mite_cycles Cycles MITE is active. (extra: cmask=1)<br>0x08: dsb_cycles Cycles the DSB is active. (extra: cmask=1)<br>0x10: ms_dsb_cycles Cycles the MS is active, initiated by the DSB. (extra: cmask=1)<br>0x10: ms_dsb_occur Occurrences of the MS going active, initiated by the DSB. (extra: edge cmask=1)<br>0x18: all_dsb_cycles_any_uops Cycles the DSB is delivering anything. (extra: cmask=1)<br>0x18: all_dsb_cycles_4_uops Cycles the DSB is delivering 4 uops. (extra: cmask=4)<br>0x24: all_mite_cycles_any_uops Cycles MITE is delivering anything. (extra: cmask=1)<br>0x24: all_mite_cycles_4_uops Cycles MITE is delivering 4 uops. (extra: cmask=4)<br>0x3c: mite_all_uops Number of uops delivered to the IDQ from any path. |
| icache | Instruction cache events | all | 0x02: No unit mask |
| itlb_misses | I-TLB misses | all | 0x01: miss_causes_a_walk Miss in all TLB levels causes a page walk of any page size (4K/2M/4M).<br>0x02: walk_completed Miss in all TLB levels causes a page walk that completes, of any page size (4K/2M/4M).<br>0x04: walk_duration Cycles the PMH is busy with this walk.<br>0x10: stlb_hit First-level miss but second-level hit; no page walk. |
| ild_stall | Instruction decoding stalls | all | 0x01: lcp Stall "occurrences" due to length-changing prefixes (LCP).<br>0x04: iq_full Stall cycles when instructions cannot be written because the Instruction Queue (IQ) is full. |
| br_inst_exec | Branch instructions executed | all | 0xff: all_branches All branch instructions executed.<br>0x41: nontaken_conditional All macro conditional nontaken branch instructions.<br>0x81: taken_conditional All macro conditional taken branch instructions.<br>0x82: taken_direct_jump All macro unconditional taken branch instructions, excluding calls and indirects.<br>0x84: taken_indirect_jump_non_call_ret All taken indirect branches that are neither calls nor returns.<br>0x88: taken_indirect_near_return All taken indirect branches that have a return mnemonic.<br>0x90: taken_direct_near_call All taken non-indirect calls.<br>0xa0: taken_indirect_near_call All taken indirect calls, including both register and memory indirect.<br>0xc1: all_conditional All macro conditional branch instructions.<br>0xc2: all_direct_jmp All macro unconditional branch instructions, excluding calls and indirects.<br>0xc4: all_indirect_jump_non_call_ret All indirect branches that are neither calls nor returns.<br>0xc8: all_indirect_near_return All indirect return branches.<br>0xd0: all_direct_near_call All non-indirect calls executed. |
| br_misp_exec | Mispredicted branch instructions executed | all | 0xff: all_branches All mispredicted branch instructions executed.<br>0x41: nontaken_conditional All nontaken mispredicted macro conditional branch instructions.<br>0x81: taken_conditional All taken mispredicted macro conditional branch instructions.<br>0x84: taken_indirect_jump_non_call_ret All taken mispredicted indirect branches that are neither calls nor returns.<br>0x88: taken_return_near All taken mispredicted indirect branches that have a return mnemonic.<br>0x90: taken_direct_near_call All taken mispredicted non-indirect calls.<br>0xa0: taken_indirect_near_call All taken mispredicted indirect calls, including both register and memory indirect.<br>0xc1: all_conditional All mispredicted macro conditional branch instructions.<br>0xc4: all_indirect_jump_non_call_ret All mispredicted indirect branches that are neither calls nor returns.<br>0xd0: all_direct_near_call All mispredicted non-indirect calls. |
| idq_uops_not_delivered | Uops not delivered to the IDQ | 0, 1, 2, 3 | 0x01: core Number of uops not delivered to the Resource Allocation Table (RAT).<br>0x01: cycles_0_uops_deliv.core Cycles in which no uops were delivered. (extra: cmask=4)<br>0x01: cycles_le_1_uop_deliv.core Cycles in which 1 or fewer uops were delivered. (extra: cmask=3)<br>0x01: cycles_le_2_uop_deliv.core Cycles in which 2 or fewer uops were delivered. (extra: cmask=2)<br>0x01: cycles_le_3_uop_deliv.core Cycles in which 3 or fewer uops were delivered. (extra: cmask=1)<br>0x01: cycles_ge_1_uop_deliv.core Cycles in which 1 or more uops were delivered by the front end. (extra: inv cmask=4)<br>0x01: cycles_fe_was_ok Cycles in which the front end (FE) delivered 4 uops or the Resource Allocation Table (RAT) was stalling the FE. (extra: inv cmask=1) |
| uops_dispatched_port | Ports on which uops are dispatched | all | 0x01: port_0 Cycles in which a uop is dispatched on port 0.<br>0x02: port_1 Cycles in which a uop is dispatched on port 1.<br>0x04: port_2_ld Cycles in which a load uop is dispatched on port 2.<br>0x08: port_2_sta Cycles in which a store-address (STA) uop is dispatched on port 2.<br>0x10: port_3_ld Cycles in which a load uop is dispatched on port 3.<br>0x20: port_3_sta Cycles in which an STA uop is dispatched on port 3.<br>0x40: port_4 Cycles in which a uop is dispatched on port 4.<br>0x80: port_5 Cycles in which a uop is dispatched on port 5.<br>0x0c: port_2 Uops dispatched to port 2, loads and stores (speculative and retired).<br>0x30: port_3 Uops dispatched to port 3, loads and stores (speculative and retired).<br>0x0c: port_2_core Uops dispatched to port 2, loads and stores, per core (speculative and retired).<br>0x30: port_3_core Uops dispatched to port 3, loads and stores, per core (speculative and retired). |
| resource_stalls | Core resource stalls | all | 0x01: any Cycles allocation is stalled for a resource-related reason.<br>0x02: lb Cycles the allocator is stalled due to the Load Buffer (LB) being full.<br>0x04: rs Stalls due to no eligible Reservation Station (RS) entry being available.<br>0x08: sb Cycles the allocator is stalled due to the Store Buffer (SB) being full (not including draining from synch).<br>0x10: rob ROB full cycles.<br>0x0e: mem_rs Resource stalls due to the LB, SB or RS being completely in use.<br>0xf0: ooo_rsrc Resource stalls due to the ROB being full, FCSW, MXCSR and OTHER.<br>0x0a: lb_sb Resource stalls due to load or store buffers. |
| dsb2mite_switches | Number of Decode Stream Buffer (DSB) to MITE switches | all | 0x01: count Number of DSB-to-MITE switches.<br>0x02: penalty_cycles DSB-to-MITE switch true penalty cycles. |
| dsb_fill | DSB fill events | all | 0x02: other_cancel Number of times a valid DSB fill has actually been cancelled for any reason.<br>0x08: exceed_dsb_lines Decode Stream Buffer (DSB) fill encountered more than 3 DSB lines.<br>0x0a: all_cancel Number of times a valid Decode Stream Buffer (DSB) fill has actually been cancelled for any reason. |
| itlb | ITLB events | all | 0x01: No unit mask |
| offcore_requests | Requests sent outside the core | all | 0x01: demand_data_rd Demand data read requests sent to the uncore.<br>0x02: demand_code_rd Offcore code read requests, including cacheable and uncacheable.<br>0x04: demand_rfo Offcore demand RFOs, including regular RFOs, locks and ItoM.<br>0x08: all_data_rd Offcore demand and prefetch data reads returned to the core. |
| uops_dispatched | Uops dispatched | 0, 1, 2, 3 | 0x01: thread Total number of uops to be dispatched per thread each cycle.<br>0x01: stall_cycles Number of cycles in which no uops were dispatched for execution on this thread. (extra: inv cmask=1)<br>0x02: core Total number of uops dispatched from any thread. |
| offcore_requests_buffer | Offcore requests buffer events | all | 0x01: No unit mask |
| agu_bypass_cancel | AGU bypass cancel | all | 0x01: No unit mask |
| tlb_flush | TLB flushes | all | 0x01: dtlb_thread Number of DTLB flushes of thread-specific entries.<br>0x20: stlb_any Number of STLB flushes of any kind. |
| l1d_blocks | L1D cache blocking events | all | 0x01: ld_bank_conflict Any dispatched loads cancelled due to a DCU bank conflict.<br>0x05: bank_conflict_cycles Cycles with L1D blocks due to bank conflicts. (extra: cmask=1) |
| inst_retired | Instructions retired | 1 | 0x01: No unit mask |
| other_assists | Instructions that needed an assist | all | 0x02: itlb_miss_retired Instructions that experienced an ITLB miss (non-PEBS).<br>0x10: avx_to_sse Number of transitions from AVX-256 to legacy SSE when a penalty is applicable (non-PEBS).<br>0x20: sse_to_avx Number of transitions from legacy SSE to AVX-256 when a penalty is applicable (non-PEBS). |
| uops_retired | Uops that actually retired | 0, 1, 2, 3 | 0x01: all All uops that actually retired.<br>0x02: retire_slots Number of retirement slots used (non-PEBS).<br>0x01: stall_cycles Cycles in which no executable uops retired. (extra: inv cmask=1)<br>0x01: total_cycles Number of cycles, obtained by applying an always-true condition to the non-PEBS uops-retired event. (extra: inv cmask=10) |
| machine_clears | Number of machine clears detected | all | 0x02: memory_ordering Number of memory ordering machine clears detected.<br>0x04: smc Number of self-modifying code (SMC) machine clears detected.<br>0x20: maskmov Number of AVX masked mov machine clears detected. |
| br_inst_retired | Branch instructions retired | 0, 1, 2, 3 | 0x01: conditional All taken and not-taken macro conditional branch instructions.<br>0x02: near_call All macro direct and indirect near calls (non-PEBS).<br>0x08: near_return Number of near ret instructions retired.<br>0x10: not_taken All not-taken macro branch instructions retired.<br>0x20: near_taken Number of near taken branch instructions retired.<br>0x40: far_branch Number of far branch instructions retired.<br>0x04: all_branches_ps All taken and not-taken macro branches, including far branches. (precise event)<br>0x02: near_call_r3 Near calls in rings 1/2/3 only (non-precise).<br>0x02: near_call_r3_ps Near calls in rings 1/2/3 only (precise event). |
| br_misp_retired | Mispredicted branch instructions retired | 0, 1, 2, 3 | 0x01: conditional All mispredicted macro conditional branch instructions.<br>0x02: near_call All mispredicted macro direct and indirect near calls.<br>0x10: not_taken Number of branch instructions retired that were mispredicted and not taken.<br>0x20: taken Number of branch instructions retired that were mispredicted and taken.<br>0x04: all_branches_ps All mispredicted macro branches. (precise event) |
| fp_assist | Floating-point assists | 0, 1, 2, 3 | 0x1e: any Cycles in which any FP_ASSIST unit mask was incrementing. (extra: cmask=1)<br>0x02: x87_output x87 output assists: numeric overflow, numeric underflow, inexact result.<br>0x04: x87_input x87 input assists: invalid operation, denormal operand, SNaN operand.<br>0x08: simd_output Any output SSE* FP assist: numeric overflow, numeric underflow.<br>0x10: simd_input Any input SSE* FP assist. |
| hw_interrupts | Number of hardware interrupts received by the processor | all | 0x01: No unit mask |
| rob_misc_events | ROB (Reorder Buffer) events | all | 0x20: No unit mask |
| mem_trans_retired | Memory transactions | 3 | 0x02: No unit mask |
| mem_uops_retired | Retired uops with memory accesses | 0, 1, 2, 3 | 0x11: stlb_miss_loads STLB misses due to retired loads.<br>0x12: stlb_miss_stores STLB misses due to retired stores.<br>0x21: lock_loads Locked retired loads.<br>0x41: split_loads Retired loads causing cache-line splits.<br>0x42: split_stores Retired stores causing cache-line splits.<br>0x81: all_loads Any retired loads.<br>0x82: all_stores Any retired stores. |
| mem_load_uops_retired | Retired memory load uops | 0, 1, 2, 3 | 0x01: l1_hit Load hit in the nearest-level (L1D) cache.<br>0x02: l2_hit Load hit in the mid-level (L2) cache.<br>0x04: llc_hit Load hit in the last-level (L3) cache with no snoop needed.<br>0x40: hit_lfb Load missed L1D but hit the fill buffer. |
| mem_load_uops_llc_hit_retired | Retired memory load uops that hit the LLC (last level cache) | 0, 1, 2, 3 | 0x01: xsnp_miss Load hit the LLC and a cross-core snoop missed in an on-package core cache.<br>0x02: xsnp_hit Load hit the LLC and a cross-core snoop hit in an on-package core cache.<br>0x04: xsnp_hitm Load received a HitM response from a core on the same socket (shared LLC).<br>0x08: xsnp_none Load hit in the last-level (L3) cache with no snoop needed. |
| mem_load_uops_misc_retired | Memory load uops retired | 0, 1, 2, 3 | 0x02: No unit mask |
| l2_trans | L2 cache accesses | all | 0x80: all_requests Transactions accessing the L2 pipe.<br>0x01: demand_data_rd Demand data read requests that access the L2 cache, including L1D prefetches.<br>0x02: rfo RFO requests that access the L2 cache.<br>0x04: code_rd L2 cache accesses when fetching instructions, including L1D code prefetches.<br>0x08: all_pf L2 or LLC hardware prefetches that access the L2 cache.<br>0x10: l1d_wb L1D writebacks that access the L2 cache.<br>0x20: l2_fill L2 fill requests that access the L2 cache.<br>0x40: l2_wb L2 writebacks that access the L2 cache. |
| l2_lines_in | L2 cache lines in | all | 0x07: all L2 cache lines filling L2.<br>0x01: i L2 cache lines in I state filling L2.<br>0x02: s L2 cache lines in S state filling L2.<br>0x04: e L2 cache lines in E state filling L2. |
| l2_lines_out | L2 cache lines out | all | 0x01: demand_clean Clean line evicted by a demand.<br>0x02: demand_dirty Dirty line evicted by a demand.<br>0x04: pf_clean Clean line evicted by an L2 prefetch.<br>0x08: pf_dirty Dirty line evicted by an L2 prefetch.<br>0x0a: dirty_all Any dirty line evicted. |
| sq_misc | SuperQueue (SQ) miscellaneous events | all | 0x10: No unit mask |
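Many unit masks in the table differ only in their `(extra: ...)` qualifiers, for example `uops_issued.any` versus `uops_issued.stall_cycles` (`inv cmask=1`), or the `idq_uops_not_delivered.cycles_*` family. The following is a simplified software model, not a description of the hardware implementation, of how those qualifiers transform a per-cycle event count: with a non-zero `cmask`, a cycle contributes 1 if its increment is at least `cmask` (or below `cmask` when `inv` is set), and `edge` counts only the cycles in which that condition switches from false to true. The function name `filtered_count` and the input traces are hypothetical; the model also assumes the condition is false before the first cycle and ignores `inv` and `edge` when `cmask` is 0.

```python
from collections.abc import Iterable


def filtered_count(per_cycle: Iterable[int], *, cmask: int = 0,
                   inv: bool = False, edge: bool = False) -> int:
    """Simplified model of the cmask/inv/edge qualifiers (hypothetical helper).

    cmask == 0: sum the raw per-cycle increments (inv/edge ignored in this model).
    cmask > 0:  each cycle contributes 1 when increment >= cmask,
                or when increment < cmask if inv is set.
    edge:       with cmask > 0, count only false-to-true transitions of that
                condition; the condition is assumed false before the first cycle.
    """
    if cmask == 0:
        return sum(per_cycle)
    total = 0
    previous = False
    for increment in per_cycle:
        condition = increment < cmask if inv else increment >= cmask
        if condition and (not edge or not previous):
            total += 1
        previous = condition
    return total


if __name__ == "__main__":
    # Hypothetical trace: uops issued by one thread in each of seven cycles.
    issued = [4, 0, 0, 3, 0, 4, 4]
    print(filtered_count(issued))                                 # any: 15
    print(filtered_count(issued, inv=True, cmask=1))              # stall_cycles: 3
    print(filtered_count(issued, inv=True, cmask=1, edge=True))   # stall episodes: 2

    # Hypothetical trace: uops NOT delivered to the RAT in each cycle (0..4).
    not_delivered = [0, 4, 2, 4, 1]
    print(filtered_count(not_delivered, cmask=4))                 # cycles_0_uops_deliv.core: 2
    print(filtered_count(not_delivered, cmask=3))                 # cycles_le_1_uop_deliv.core: 2
    print(filtered_count(not_delivered, inv=True, cmask=1))       # cycles_fe_was_ok: 1
```

The second trace shows why `cycles_0_uops_deliv.core` uses `cmask=4`: when 4 uops go undelivered in a cycle, nothing was delivered, while `inv cmask=1` selects exactly the cycles with no undelivered uops, matching the table's description of `cycles_fe_was_ok`.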