Chapter 3. Controlling the profiler

Table of Contents

1. Using opcontrol
1.1. Examples
1.2. Specifying performance counter events
2. Setting up the JIT profiling feature
2.1. JVM instrumentation
3. Using oprof_start
4. Configuration details
4.1. Hardware performance counters
4.2. OProfile in RTC mode
4.3. OProfile in timer interrupt mode
4.4. Pentium 4 support
4.5. Intel Itanium 2 support
4.6. PowerPC64 support
4.7. Cell Broadband Engine support
4.8. AMD64 (x86_64) Instruction-Based Sampling (IBS) support
4.9. Dangerous counter settings

1. Using opcontrol

In this section we describe the configuration and control of the profiling system with opcontrol in more depth. The opcontrol script has a default setup, but you can alter this with the options given below. In particular, if your hardware supports performance counters, you can configure them. There are a number of counters (for example, counter 0 and counter 1 on the Pentium III). Each of these counters can be programmed with an event to count, such as cache misses or MMX operations. The event chosen for each counter is reflected in the profile data collected by OProfile: functions and binaries at the top of the profiles reflect that most of the chosen events happened within that code.

Additionally, each counter has a "count" value: this corresponds to how detailed the profile is. The lower the value, the more frequently profile samples are taken. A counter can choose to sample only kernel code, user-space code, or both (both is the default). Finally, some events have a "unit mask" - this is a value that further restricts the types of event that are counted. The event types and unit masks for your CPU are listed by opcontrol --list-events.
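
To make the relationship between the count value and the sampling frequency concrete, here is a rough sketch in shell arithmetic. It assumes an event that fires once per clock cycle on a CPU that is never halted, which is a simplification; the helper name samples_per_second and the 800 MHz figure are illustrative only and are not part of OProfile.

```shell
# Rough estimate: samples per second = event rate / count value.
# Assumes the event occurs once per clock cycle (a simplification);
# samples_per_second is a hypothetical helper, not an OProfile tool.
samples_per_second() {
    events_per_second=$1   # e.g. clock frequency in Hz
    count=$2               # the counter's "count" value
    echo $(( events_per_second / count ))
}

samples_per_second 800000000 400000   # larger count, fewer samples
samples_per_second 800000000 100000   # smaller count, more samples
```

Under these assumptions, lowering the count from 400000 to 100000 makes sampling four times as frequent, at the cost of higher profiling overhead.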

The opcontrol script provides the following actions:

--init

Loads the OProfile module if required and makes the OProfile driver interface available.

--setup

Followed by a list of arguments for profiling setup. The arguments are saved in /root/.oprofile/daemonrc. Giving this option is not necessary; you can pass one of the setup options directly, e.g. opcontrol --no-vmlinux.

--status

Show configuration information.

--start-daemon

Start the oprofile daemon without starting actual profiling. The profiling can then be started using --start. This is useful for avoiding measuring the cost of daemon startup, as --start is a simple write to a file in oprofilefs. Not available in 2.2/2.4 kernels.

--start

Start data collection with either arguments provided by --setup or information saved in /root/.oprofile/daemonrc. Additionally specifying --verbose makes the daemon generate a lot of debug data while it is running.

--dump

Force a flush of the collected profiling data to the daemon.

--stop

Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels).

--shutdown

Stop data collection and kill the daemon.

--reset

Clears out data from current session, but leaves saved sessions.

--save=session_name

Save data from current session to session_name.

--deinit

Shuts down the daemon, then unloads the OProfile module and oprofilefs.

--list-events

List event types and unit masks.

--help

Generate usage messages.
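
The actions above are typically combined into a session. The following is a minimal dry-run sketch of one possible ordering, assuming a 2.6 kernel (so that --stop is available) and no kernel profiling. The helper names op and profile_session and the RUN variable are hypothetical. By default the commands are only printed, not executed.

```shell
# Dry-run sketch of a profiling session. Each opcontrol command is only
# printed unless RUN=1 is set (which requires root and an installed
# OProfile). op, profile_session and RUN are hypothetical names.
op() {
    if [ "${RUN:-0}" = 1 ]; then
        opcontrol "$@"
    else
        echo "opcontrol $*"
    fi
}

profile_session() {
    op --init          # load the module, expose the driver interface
    op --no-vmlinux    # a setup option passed directly, without --setup
    op --start         # start data collection
    "$@"               # run the workload to be profiled
    op --dump          # flush collected data to the daemon
    op --stop          # stop data collection (not on 2.2/2.4 kernels)
    op --shutdown      # stop data collection and kill the daemon
}

profile_session true   # "true" stands in for a real workload
```

Printing the commands first makes it easy to review the sequence before running it for real.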

There are a number of possible settings, of which only --vmlinux (or --no-vmlinux) is required. These settings are stored in ~/.oprofile/daemonrc.

--buffer-size=num

Number of samples in the kernel buffer. With a 2.6 kernel, the buffer watershed needs to be adjusted when you change this value.

--buffer-watershed=num

Set the kernel buffer watershed to num samples (2.6 only). When only buffer-size - buffer-watershed free entries remain in the kernel buffer, data is flushed to the daemon. The most useful values are in the range [0.25 - 0.5] * buffer-size.

--cpu-buffer-size=num

Number of samples in the kernel per-CPU buffer (2.6 only). If you profile at a high rate, increasing this can help when the log file shows an excessive count of samples lost to CPU buffer overflow.

--event=[eventspec]

Use the given performance counter event to profile. See Section 1.2, “Specifying performance counter events” below.

--session-dir=dir_path

Create or use the sample database in directory dir_path instead of the default location (/var/lib/oprofile).

--separate=[none,lib,kernel,thread,cpu,all]

By default, every profile is stored in a single file. Thus, for example, samples in the C library are all credited to the /lib/libc.o profile. However, you can choose to create separate sample files by specifying one of the options below.

none      No profile separation (default)
lib       Create per-application profiles for libraries
kernel    Create per-application profiles for the kernel and kernel modules
thread    Create profiles for each thread and each task
cpu       Create profiles for each CPU
all       All of the above options

Note that --separate=kernel also turns on --separate=lib. When using --separate=kernel, samples in hardware interrupts, soft-irqs, or other asynchronous kernel contexts are credited to the task currently running. This means you will see seemingly nonsense profiles such as /bin/bash showing samples for the PPP modules, etc.

On 2.2/2.4 only kernel threads already started when profiling begins are correctly profiled; newly started kernel thread samples are credited to the vmlinux (kernel) profile.

Using --separate=thread creates a lot of sample files if you leave OProfile running for a while; it's most useful when used for short sessions, or when using image filtering.

--callgraph=#depth

Enable call-graph sample collection with a maximum depth. Use 0 to disable callgraph profiling. NOTE: Callgraph support is available on a limited number of platforms at this time; for example:

  • x86 with recent 2.6 kernel

  • ARM with recent 2.6 kernel

  • PowerPC with 2.6.17 kernel

--image=image,[images]|"all"

Image filtering. If you specify one or more absolute paths to binaries, OProfile will only produce profile results for those binary images. This is useful for restricting the sometimes voluminous output you may get otherwise, especially with --separate=thread. Note that if you are using --separate=lib or --separate=kernel and you specify an application binary, the shared libraries and kernel code it uses are included. Specify the value "all" to profile everything (the default).

--vmlinux=file

Path to the vmlinux kernel image.

--no-vmlinux

Use this when you don't have a kernel vmlinux file, and you don't want to profile the kernel. This still counts the total number of kernel samples, but can't give symbol-based results for the kernel or any modules.
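
As a rough illustration of the --buffer-watershed guidance given earlier in this list, the sketch below computes the [0.25 - 0.5] * buffer-size range for a given --buffer-size. The helper name watershed_range is hypothetical, and 65536 is an arbitrary example value rather than a documented default.

```shell
# Suggested --buffer-watershed range for a given --buffer-size, using
# the [0.25 - 0.5] * buffer-size rule of thumb from the option list.
# watershed_range is a hypothetical helper; integer division rounds down.
watershed_range() {
    buffer_size=$1
    echo "$(( buffer_size / 4 )) $(( buffer_size / 2 ))"
}

watershed_range 65536   # 65536 is an arbitrary example size
```

The two numbers printed are the lower and upper bounds that could be passed as --buffer-watershed=num alongside the chosen --buffer-size.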

1.1. Examples

1.1.1. Intel performance counter setup

Here, we have a Pentium III running at 800MHz, and we want to look at where data memory references are happening most, and also get results for CPU time.

# opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000
# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start

1.1.2. RTC mode

Here, we have an Intel laptop without support for performance counters, running on 2.4 kernels.

# ophelp -r
CPU with RTC device
# opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024
# opcontrol --start

1.1.3. Starting the daemon separately

If we're running 2.6 kernels, we can use --start-daemon to avoid the profiler startup affecting results.

# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start-daemon
# my_favourite_benchmark --init
# opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop

1.1.4. Separate profiles for libraries and the kernel

Here, we want to see a profile of the OProfile daemon itself, including when it was running inside the kernel driver, and its use of shared libraries.

# opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start
# my_favourite_stress_test --run
# opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled

1.1.5. Profiling sessions

It can often be useful to split up profiling data into several different time periods. For example, you may want to collect data on an application's startup separately from the normal runtime data. You can use the simple command opcontrol --save to do this. For example:

# opcontrol --save=blah

will create a sub-directory in $SESSION_DIR/samples containing the samples up to that point (the current session's sample files are moved into this directory). You can then pass this session name as a parameter to the post-profiling analysis tools, to only get data up to the point you named the session. If you do not want to save a session, you can do rm -rf $SESSION_DIR/samples/sessionname or, for the current session, opcontrol --reset.

1.2. Specifying performance counter events

The --event option to opcontrol takes a specification that indicates how each hardware performance counter should be set up. If you want to revert to OProfile's default setting (--event is strictly optional), use --event=default. Use of this option overrides all previous event selections.

You can pass multiple event specifications. OProfile will allocate hardware counters as necessary. Note that some combinations are not allowed by the CPU; running opcontrol --list-events gives the details of each event. The event specification is a colon-separated string of the form name:count:unitmask:kernel:user as described in this table:

name      The symbolic event name, e.g. CPU_CLK_UNHALTED
count     The counter reset value, e.g. 100000
unitmask  The unit mask, as given in the events list: e.g. 0x0f; or a symbolic name as given by the first word of the description (only valid for unit masks having an "extra:" parameter)
kernel    Whether to profile kernel code
user      Whether to profile userspace code

The last three values are optional; if you omit them (e.g. --event=DATA_MEM_REFS:30000), they are set to their default values (a unit mask of 0, and profiling both kernel and userspace code). Note that some events require a unit mask.
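
The defaulting of omitted fields can be sketched as a small shell function. This is only an illustration of the name:count:unitmask:kernel:user format, not how opcontrol itself parses the option. The helper name parse_event_spec is hypothetical, and it assumes that name and count are always present and that kernel and user are written as 1 or 0, as in the default event table.

```shell
# Split an event specification into its five fields, filling in the
# documented defaults for omitted trailing fields (unit mask 0,
# kernel 1, user 1). parse_event_spec is a hypothetical helper; the
# body runs in a subshell so the IFS change does not leak out.
parse_event_spec() (
    IFS=:
    set -- $1    # unquoted on purpose: split the spec on colons
    echo "name=$1 count=$2 unitmask=${3:-0} kernel=${4:-1} user=${5:-1}"
)

parse_event_spec DATA_MEM_REFS:30000
parse_event_spec CPU_CLK_UNHALTED:100000:0:0:1
```

The first call prints the defaults for all three optional fields; the second shows an explicit specification that profiles userspace code only.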

Note

For the PowerPC platforms, all events specified must be in the same group; i.e., the group number appended to the event name (e.g. <some-event-name>_GRP9) must be the same.
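
The group rule in this note can be checked before invoking opcontrol. The sketch below is a hypothetical pre-flight check, not part of OProfile: same_group is an invented helper, and the event names EVENT_A_GRP9 and EVENT_B_GRP10 are placeholders rather than real PowerPC events.

```shell
# Succeed only if every event specification names the same _GRP<n>
# group. same_group is a hypothetical helper; an event name without a
# _GRP<n> suffix is treated as a mismatch.
same_group() {
    first=""
    for spec in "$@"; do
        name=${spec%%:*}                  # drop ":count:..." if present
        case $name in
            *_GRP[0-9]*) group=${name##*_GRP} ;;
            *) return 1 ;;
        esac
        if [ -z "$first" ]; then
            first=$group                  # remember the first group seen
        elif [ "$group" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Placeholder event names, for illustration only.
same_group EVENT_A_GRP9:10000 EVENT_B_GRP9:10000 && echo "same group"
same_group EVENT_A_GRP9:10000 EVENT_B_GRP10:10000 || echo "mixed groups"
```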

If OProfile is using RTC mode, and you want to alter the default counter value, you can use something like --event=RTC_INTERRUPTS:2048. Note the last three values here are ignored. If OProfile is using timer-interrupt mode, there is no configuration possible.

The table below lists the events selected by default (--event=default) for the various computer architectures:

Processor             cpu_type              Default event
Alpha EV4             alpha/ev4             CYCLES:100000:0:1:1
Alpha EV5             alpha/ev5             CYCLES:100000:0:1:1
Alpha PCA56           alpha/pca56           CYCLES:100000:0:1:1
Alpha EV6             alpha/ev6             CYCLES:100000:0:1:1
Alpha EV67            alpha/ev67            CYCLES:100000:0:1:1
ARM/XScale PMU1       arm/xscale1           CPU_CYCLES:100000:0:1:1
ARM/XScale PMU2       arm/xscale2           CPU_CYCLES:100000:0:1:1
ARM/MPCore            arm/mpcore            CPU_CYCLES:100000:0:1:1
AVR32                 avr32                 CPU_CYCLES:100000:0:1:1
Athlon                i386/athlon           CPU_CLK_UNHALTED:100000:0:1:1
Pentium Pro           i386/ppro             CPU_CLK_UNHALTED:100000:0:1:1
Pentium II            i386/pii              CPU_CLK_UNHALTED:100000:0:1:1
Pentium III           i386/piii             CPU_CLK_UNHALTED:100000:0:1:1
Pentium M (P6 core)   i386/p6_mobile        CPU_CLK_UNHALTED:100000:0:1:1
Pentium 4 (non-HT)    i386/p4               GLOBAL_POWER_EVENTS:100000:1:1:1
Pentium 4 (HT)        i386/p4-ht            GLOBAL_POWER_EVENTS:100000:1:1:1
Hammer                x86-64/hammer         CPU_CLK_UNHALTED:100000:0:1:1
Family10h             x86-64/family10       CPU_CLK_UNHALTED:100000:0:1:1
Family11h             x86-64/family11h      CPU_CLK_UNHALTED:100000:0:1:1
Itanium               ia64/itanium          CPU_CYCLES:100000:0:1:1
Itanium 2             ia64/itanium2         CPU_CYCLES:100000:0:1:1
TIMER_INT             timer                 None selectable
IBM iseries           PowerPC 4/5/970       CYCLES:10000:0:1:1
IBM pseries           PowerPC 4/5/970/Cell  CYCLES:10000:0:1:1
IBM s390              timer                 None selectable
IBM s390x             timer                 None selectable