This is a list of all ARM V7 performance counter event types. For details, see the Cortex-A8 Technical Reference Manual.
| Name | Description | Counters usable | Unit mask options |
|---|---|---|---|
| PMNC_SW_INCR | Software increment of PMNC registers | 1, 2, 3, 4, 5, 6 | |
| IFETCH_MISS | Instruction fetch misses from cache or normal cacheable memory | 1, 2, 3, 4, 5, 6 | |
| ITLB_MISS | Instruction fetch misses from TLB | 1, 2, 3, 4, 5, 6 | |
| DCACHE_REFILL | Data R/W operation that causes a refill from cache or normal cacheable memory | 1, 2, 3, 4, 5, 6 | |
| DCACHE_ACCESS | Data R/W from cache | 1, 2, 3, 4, 5, 6 | |
| DTLB_REFILL | Data R/W that causes a TLB refill | 1, 2, 3, 4, 5, 6 | |
| DREAD | Data read architecturally executed (an instruction is architecturally executed if it is unconditional or passes its condition code check) | 1, 2, 3, 4, 5, 6 | |
| DWRITE | Data write architecturally executed | 1, 2, 3, 4, 5, 6 | |
| INSTR_EXECUTED | All executed instructions | 1, 2, 3, 4, 5, 6 | |
| EXC_TAKEN | Exception taken | 1, 2, 3, 4, 5, 6 | |
| EXC_EXECUTED | Exception return architecturally executed | 1, 2, 3, 4, 5, 6 | |
| CID_WRITE | Instruction that writes to the Context ID Register architecturally executed | 1, 2, 3, 4, 5, 6 | |
| PC_WRITE | SW change of PC, architecturally executed (not by exceptions) | 1, 2, 3, 4, 5, 6 | |
| PC_IMM_BRANCH | Immediate branch instruction executed (taken or not) | 1, 2, 3, 4, 5, 6 | |
| PC_PROC_RETURN | Procedure return architecturally executed (not by exceptions) | 1, 2, 3, 4, 5, 6 | |
| UNALIGNED_ACCESS | Unaligned access architecturally executed | 1, 2, 3, 4, 5, 6 | |
| PC_BRANCH_MIS_PRED | Branch mispredicted or not predicted. Counts pipeline flushes because of misprediction | 1, 2, 3, 4, 5, 6 | |
| PC_BRANCH_MIS_USED | Branch or change in program flow that could have been predicted | 1, 2, 3, 4, 5, 6 | |
| CPU_CYCLES | Number of CPU cycles | 0 | |
| WRITE_BUFFER_FULL | Any write buffer full cycle | 1, 2, 3, 4 | |
| L2_STORE_MERGED | Any store that is merged in L2 cache | 1, 2, 3, 4 | |
| L2_STORE_BUFF | Any bufferable store from load/store to L2 cache | 1, 2, 3, 4 | |
| L2_ACCESS | Any access to L2 cache | 1, 2, 3, 4 | |
| L2_CACH_MISS | Any cacheable miss in L2 cache | 1, 2, 3, 4 | |
| AXI_READ_CYCLES | Number of cycles for an active AXI read | 1, 2, 3, 4 | |
| AXI_WRITE_CYCLES | Number of cycles for an active AXI write | 1, 2, 3, 4 | |
| MEMORY_REPLAY | Any replay event in the memory subsystem | 1, 2, 3, 4 | |
| UNALIGNED_ACCESS_REPLAY | Unaligned access that causes a replay | 1, 2, 3, 4 | |
| L1_DATA_MISS | L1 data cache miss as a result of the hashing algorithm | 1, 2, 3, 4 | |
| L1_INST_MISS | L1 instruction cache miss as a result of the hashing algorithm | 1, 2, 3, 4 | |
| L1_DATA_COLORING | L1 data access in which a page coloring alias occurs | 1, 2, 3, 4 | |
| L1_NEON_DATA | NEON data access that hits L1 cache | 1, 2, 3, 4 | |
| L1_NEON_CACH_DATA | NEON cacheable data access that hits L1 cache | 1, 2, 3, 4 | |
| L2_NEON | L2 access as a result of NEON memory access | 1, 2, 3, 4 | |
| L2_NEON_HIT | Any NEON hit in L2 cache | 1, 2, 3, 4 | |
| L1_INST | Any L1 instruction cache access, excluding CP15 cache accesses | 1, 2, 3, 4 | |
| PC_RETURN_MIS_PRED | Return stack misprediction at return stack pop (incorrect target address) | 1, 2, 3, 4 | |
| PC_BRANCH_FAILED | Branch prediction misprediction | 1, 2, 3, 4 | |
| PC_BRANCH_TAKEN | Any predicted branch that is taken | 1, 2, 3, 4 | |
| PC_BRANCH_EXECUTED | Any taken branch that is executed | 1, 2, 3, 4 | |
| OP_EXECUTED | Number of operations executed (in instruction or multi-cycle instruction) | 1, 2, 3, 4 | |
| CYCLES_INST_STALL | Cycles in which no instruction is available | 1, 2, 3, 4 | |
| CYCLES_INST | Number of instructions issued in a cycle | 1, 2, 3, 4 | |
| CYCLES_NEON_DATA_STALL | Number of cycles the processor waits on MRC data from NEON | 1, 2, 3, 4 | |
| CYCLES_NEON_INST_STALL | Number of cycles the processor waits on NEON instruction queue or NEON load queue | 1, 2, 3, 4 | |
| NEON_CYCLES | Number of cycles in which the NEON and integer processors are not idle | 1, 2, 3, 4 | |
| PMU0_EVENTS | Number of events from external input source PMUEXTIN[0] | 1, 2, 3, 4 | |
| PMU1_EVENTS | Number of events from external input source PMUEXTIN[1] | 1, 2, 3, 4 | |
| PMU_EVENTS | Number of events from both external input sources PMUEXTIN[0] and PMUEXTIN[1] | 1, 2, 3, 4 | |
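The "Counters usable" column limits which events can be collected at the same time: every event needs its own counter from its listed set, CPU_CYCLES can only use counter 0, and the events from WRITE_BUFFER_FULL onward all compete for counters 1 to 4. The Python sketch below checks whether a requested event set fits. It assumes each counter number in the table names one distinct hardware slot; `COUNTER_GROUPS`, `USABLE`, and `schedule` are hypothetical helpers written for illustration, not part of any profiler.

```python
"""Check whether a set of ARM V7 events fits on the available counters."""

# Event groups copied from the table; the first tuple in each pair is the
# group's "Counters usable" entry.
COUNTER_GROUPS = [
    ((0,), ("CPU_CYCLES",)),
    ((1, 2, 3, 4, 5, 6), (
        "PMNC_SW_INCR", "IFETCH_MISS", "ITLB_MISS", "DCACHE_REFILL",
        "DCACHE_ACCESS", "DTLB_REFILL", "DREAD", "DWRITE", "INSTR_EXECUTED",
        "EXC_TAKEN", "EXC_EXECUTED", "CID_WRITE", "PC_WRITE", "PC_IMM_BRANCH",
        "PC_PROC_RETURN", "UNALIGNED_ACCESS", "PC_BRANCH_MIS_PRED",
        "PC_BRANCH_MIS_USED",
    )),
    ((1, 2, 3, 4), (
        "WRITE_BUFFER_FULL", "L2_STORE_MERGED", "L2_STORE_BUFF", "L2_ACCESS",
        "L2_CACH_MISS", "AXI_READ_CYCLES", "AXI_WRITE_CYCLES", "MEMORY_REPLAY",
        "UNALIGNED_ACCESS_REPLAY", "L1_DATA_MISS", "L1_INST_MISS",
        "L1_DATA_COLORING", "L1_NEON_DATA", "L1_NEON_CACH_DATA", "L2_NEON",
        "L2_NEON_HIT", "L1_INST", "PC_RETURN_MIS_PRED", "PC_BRANCH_FAILED",
        "PC_BRANCH_TAKEN", "PC_BRANCH_EXECUTED", "OP_EXECUTED",
        "CYCLES_INST_STALL", "CYCLES_INST", "CYCLES_NEON_DATA_STALL",
        "CYCLES_NEON_INST_STALL", "NEON_CYCLES", "PMU0_EVENTS", "PMU1_EVENTS",
        "PMU_EVENTS",
    )),
]

# Event name -> tuple of counters it may be placed on.
USABLE = {name: counters for counters, names in COUNTER_GROUPS for name in names}


def schedule(events, usable=USABLE):
    """Map each event to a distinct usable counter, or return None.

    Uses plain backtracking; with at most seven counters the search is tiny.
    Raises KeyError for an event name that is not in `usable`.
    """
    if len(set(events)) != len(events):
        raise ValueError("each event may be requested only once")
    assignment = {}

    def place(i, used):
        if i == len(events):
            return True
        name = events[i]
        for counter in usable[name]:
            if counter not in used:
                assignment[name] = counter
                if place(i + 1, used | {counter}):
                    return True
        assignment.pop(name, None)  # undo this level before backtracking
        return False

    return dict(assignment) if place(0, frozenset()) else None


if __name__ == "__main__":
    print(schedule(["CPU_CYCLES", "INSTR_EXECUTED", "L2_ACCESS", "L2_CACH_MISS"]))
    # Five events that are all limited to counters 1-4 cannot fit.
    print(schedule(["L2_ACCESS", "L2_CACH_MISS", "AXI_READ_CYCLES",
                    "AXI_WRITE_CYCLES", "MEMORY_REPLAY"]))
```

Backtracking matters here: a greedy pass would put INSTR_EXECUTED on counter 1 and then run out of room for four events restricted to counters 1 to 4, whereas the search moves INSTR_EXECUTED to counter 5.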
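Raw totals from the events above are often easier to interpret as ratios. The Python sketch below derives a few common ones, assuming all counts were collected in the same run. The metric names, the `METRICS` table, and `derived_metrics` are illustrative, and reading DCACHE_REFILL / DCACHE_ACCESS as an L1 data miss ratio or L2_CACH_MISS / L2_ACCESS as an L2 miss ratio is an approximation; the exact event definitions are in the Technical Reference Manual.

```python
"""Turn raw ARM V7 event counts into a few derived ratios (illustrative)."""

# Each metric: (numerator event, denominator event, scale factor).
# The event names come from the table; the metric names are made up.
METRICS = {
    "instructions_per_cycle": ("INSTR_EXECUTED", "CPU_CYCLES", 1),
    # Approximate L1 data miss ratio: refills per data cache access.
    "l1d_refill_ratio": ("DCACHE_REFILL", "DCACHE_ACCESS", 1),
    # Approximate L2 miss ratio: cacheable misses per L2 access.
    "l2_miss_ratio": ("L2_CACH_MISS", "L2_ACCESS", 1),
    # Branch mispredictions per 1000 executed instructions.
    "branch_mispred_per_kinstr": ("PC_BRANCH_MIS_PRED", "INSTR_EXECUTED", 1000),
}


def derived_metrics(counts):
    """Return every metric in METRICS whose events appear in `counts`.

    `counts` maps event names to raw totals from a single run. A metric is
    skipped when its numerator is missing or its denominator is missing or zero.
    """
    result = {}
    for name, (num, den, scale) in METRICS.items():
        if num in counts and counts.get(den):
            # Multiply before dividing so integer inputs stay exact as long as possible.
            result[name] = scale * counts[num] / counts[den]
    return result


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "CPU_CYCLES": 1000,
        "INSTR_EXECUTED": 800,
        "DCACHE_ACCESS": 400,
        "DCACHE_REFILL": 20,
        "PC_BRANCH_MIS_PRED": 4,
    }
    for metric, value in derived_metrics(sample).items():
        print(f"{metric}: {value:.3f}")
```

Skipping a metric instead of reporting zero keeps a missing L2 measurement from looking like a perfect hit rate.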