The opreport utility is the primary utility you will use for getting formatted data out of OProfile. It produces two types of data: image summaries and symbol summaries. An image summary lists the number of samples for individual binary images such as libraries or applications. Symbol summaries provide per-symbol profile data. In the following example, we're getting an image summary for the whole system:
$ opreport --long-filenames
CPU: PIII, speed 863.195 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus
214320 14.1338 /boot/2.6.0/vmlinux
103450 6.8222 /lib/i686/libc-2.3.2.so
60160 3.9674 /usr/local/bin/madplay
31769 2.0951 /usr/local/oprofile-pp/bin/oprofiled
26550 1.7509 /usr/lib/libartsflow.so.1.0.0
23906 1.5765 /usr/bin/as
18770 1.2378 /oprofile
15528 1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
11979 0.7900 /usr/X11R6/bin/XFree86
11328 0.7471 /bin/bash
...
|
If we had specified --symbols in the previous command, we would have
gotten a symbol summary of all the images across the entire system. We can restrict this to only
part of the system profile; for example,
below is a symbol summary of the OProfile daemon. Note that as we used
opcontrol --separate=kernel, symbols from images that oprofiled
has used are also shown.
$ opreport -l `which oprofiled` 2>/dev/null | more CPU: PIII, speed 863.195 MHz (estimated) Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150 vma samples % image name symbol name 0804be10 14971 28.1993 oprofiled odb_insert 0804afdc 7144 13.4564 oprofiled pop_buffer_value c01daea0 6113 11.5144 vmlinux __copy_to_user_ll 0804b060 2816 5.3042 oprofiled opd_put_sample 0804b4a0 2147 4.0441 oprofiled opd_process_samples 0804acf4 1855 3.4941 oprofiled opd_put_image_sample 0804ad84 1766 3.3264 oprofiled opd_find_image 0804a5ec 1084 2.0418 oprofiled opd_find_module 0804ba5c 741 1.3957 oprofiled odb_hash_add_node ... |
These are the two basic ways you are most likely to use regularly, but opreport can do a lot more than that, as described below.
--separate= options
whilst profiling, there can be several separate profiles for
a single binary image within a session. Normally the output
will keep these images separated (so, for example, the image summary
output shows library image summaries on a per-application basis,
when using --separate=lib).
Sometimes it can be useful to merge these results back together
before getting results. The --merge option allows
you to do that.
event: profile specifications, etc.
This section provides details on how to use the OProfile callgraph feature.
When using the opcontrol --callgraph option, you can see what
functions are calling other functions in the output. Consider the
following program:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#define SIZE 500000
static int compare(const void *s1, const void *s2)
{
return strcmp(s1, s2);
}
static void repeat(void)
{
int i;
char *strings[SIZE];
char str[] = "abcdefghijklmnopqrstuvwxyz";
for (i = 0; i < SIZE; ++i) {
strings[i] = strdup(str);
strfry(strings[i]);
}
qsort(strings, SIZE, sizeof(char *), compare);
}
int main()
{
while (1)
repeat();
}
|
When running with the call-graph option, OProfile will record the function stack every time it takes a sample. opreport --callgraph outputs an entry for each function, where each entry looks similar to:
samples % image name symbol name 197 0.1548 cg main 127036 99.8452 cg repeat 84590 42.5084 libc-2.3.2.so strfry 84590 66.4838 libc-2.3.2.so strfry [self] 39169 30.7850 libc-2.3.2.so random_r 3475 2.7312 libc-2.3.2.so __i686.get_pc_thunk.bx ------------------------------------------------------------------------------- |
Here the non-indented line is the function we're focussing upon
(strfry()). This
line is the same as you'd get from a normal opreport
output.
Above the non-indented line we find the functions that called this
function (for example, repeat() calls
strfry()). The samples and percentage values here
refer to the number of times we took a sample where this call was found
in the stack; the percentage is relative to all other callers of the
function we're focussing on. Note that these values are
not call counts; they only reflect the call stack
every time a sample is taken; that is, if a call is found in the stack
at the time of a sample, it is recorded in this count.
Below the line are functions that are called by
strfry() (called callees).
It's clear here that strfry() calls
random_r(). We also see a special entry with a
"[self]" marker. This records the normal samples for the function, but
the percentage becomes relative to all callees. This allows you to
compare time spent in the function itself compared to functions it
calls. Note that if a function calls itself, then it will appear in the
list of callees of itself, but without the "[self]" marker; so recursive
calls are still clearly separable.
You may have noticed that the output lists main()
as calling strfry(), but it's clear from the source
that this doesn't actually happen. See Section 3, “Interpreting call-graph profiles” for an explanation.
Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading.
For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory
named {anon:anon}/<tgid>.<begin_addr>.<end_addr>.
As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java,
OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code.
However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to anon (<tgid>: range:<begin_addr>-<end_addr>,
even if a .jo ELF file had been created for it. See the example below.
------------------------------------------------------------------------------- 1 2.2727 libj9ute23.so java.bin traceV 2 4.5455 libj9ute23.so java.bin utsTraceV 4 9.0909 libj9trc23.so java.bin fillInUTInterfaces 37 84.0909 libj9trc23.so java.bin twGetSequenceCounter 8 0.0154 libj9prt23.so java.bin j9time_hires_clock 27 61.3636 anon (tgid:10014 range:0x100000-0x103000) java.bin (no symbols) 9 20.4545 libc-2.4.so java.bin gettimeofday 8 18.1818 libj9prt23.so java.bin j9time_hires_clock [self] ------------------------------------------------------------------------------- |
The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of
j9time_hires_clock, even though the ELF file 10014.jo was
created for this profile run. Unfortunately, there is currently no way to correlate
that anonymous callgraph entry with its corresponding .jo file.
Often, we'd like to be able to compare two profiles. For example, when analysing the performance of an application, we'd like to make code changes and examine the effect of the change. This is supported in opreport by giving a profile specification that identifies two different profiles. The general form is of:
$ opreport <shared-spec> { <first-profile> } { <second-profile> }
|
We lost our Dragon book down the back of the sofa, so you have to be careful to have spaces around those braces, or things will get hopelessly confused. We can only apologise.
For each of the profiles, the shared section is prefixed, and then the specification is analysed. The usual parameters work both within the shared section, and in the sub-specification within the curly braces.
A typical way to use this feature is with archives created with oparchive. Let's look at an example:
$ ./a
$ oparchive -o orig ./a
$ opcontrol --reset
# edit and recompile a
$ ./a
# now compare the current profile of a with the archived profile
$ opreport -xl ./a { archive:./orig } { }
CPU: PIII, speed 863.233 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
unit mask of 0x00 (No unit mask) count 100000
samples % diff % symbol name
92435 48.5366 +0.4999 a
54226 --- --- c
49222 25.8459 +++ d
48787 25.6175 -2.2e-01 b
|
Note that we specified an empty second profile in the curly braces, as we wanted to use the current session; alternatively, we could have specified another archive, or a tgid etc. We specified the binary a in the shared section, so we matched that in both the profiles we're diffing.
As in the normal output, the results are sorted by the number of samples, and the percentage field represents the relative percentage of the symbol's samples in the second profile.
Notice the new column in the output. This value represents the
percentage change of the relative percent between the first and the
second profile: roughly, "how much more important this symbol is".
Looking at the symbol a(), we can see that it took
roughly the same amount of the total profile in both the first and the
second profile. The function c() was not in the new
profile, so has been marked with ---. Note that the
sample value is the number of samples in the first profile; since we're
displaying results for the second profile, we don't list a percentage
value for it, as it would be meaningless. d() is
new in the second profile, and consequently marked with
+++.
When comparing profiles between different binaries, it should be clear that functions can change in terms of VMA and size. To avoid this problem, opreport considers a symbol to be the same if the symbol name, image name, and owning application name all match; any other factors are ignored. Note that the check for application name means that trying to compare library profiles between two different applications will not work as you might expect: each symbol will be considered different.
Many applications, typically ones involving dynamic compilation into machine code (just-in-time, or "JIT", compilation), have executable mappings that are not backed by an ELF file. opreport has basic support for showing the samples taken in these regions; for example:
$ opreport /usr/bin/mono -l CPU: ppc64 POWER5, speed 1654.34 MHz (estimated) Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 47 58.7500 mono (no symbols) 14 17.5000 anon (tgid:3189 range:0xf72aa000-0xf72fa000) (no symbols) 9 11.2500 anon (tgid:3189 range:0xf6cca000-0xf6dd9000) (no symbols) . . . . |
Note that, since such mappings are dependent upon individual invocations of
a binary, these mappings are always listed as a dependent image,
even when using --separate=none.
Equally, the results are not affected by the --merge
option.
As shown in the opreport output above, OProfile is unable to attribute the samples to any symbol(s) because there is no ELF file for this code. Enhanced support for JITed code is now available for some virtual machines; e.g., the Java Virtual Machine. For details about OProfile output for JITed code, see Section 4, “OProfile results with JIT samples”.
For more information about JIT support in OProfile, see Section 1.1, “Support for dynamically compiled (JIT) code”.
The -xml option can be used to generate XML instead of the usual text format. This allows opreport to eliminate some of the constraints dictated by the two dimensional text format. For example, it is possible to separate the sample data across multiple events, cpus and threads. The XML schema implemented by opreport is found in doc/opreport.xsd. It contains more detailed comments about the structure of the XML generated by opreport.
Since XML is consumed by a client program rather than a user, its structure is fairly static. In particular, the --sort option is incompatible with the --xml option. Percentages are not dislayed in the XML so the options related to percentages will have no effect. Full pathnames are always displayed in the XML so --long-filenames is not necessary. The --details option will cause all of the individual sample data to be included in the XML as well as the instruction byte stream for each symbol (for doing disassembly) and can result in very large XML files.
--accumulated / -a
Accumulate sample and percentage counts in the symbol list.
--callgraph / -c
Show callgraph information.
--debug-info / -g
Show source file and line for each symbol.
--demangle / -D none|normal|smart
none: no demangling. normal: use default demangler (default) smart: use pattern-matching to make C++ symbol demangling more readable.
--details / -d
Show per-instruction details for all selected symbols. Note that, for binaries without symbol information, the VMA values shown are raw file offsets for the image binary.
--exclude-dependent / -x
Do not include application-specific images for libraries, kernel modules and the kernel. This option only makes sense if the profile session used --separate.
--exclude-symbols / -e [symbols]
Exclude all the symbols in the given comma-separated list.
--global-percent / -%
Make all percentages relative to the whole profile.
--help / -? / --usage
Show help message.
--image-path / -p [paths]
Comma-separated list of additional paths to search for binaries. This is needed to find modules in kernels 2.6 and upwards.
--root / -R [path]
A path to a filesystem to search for additional binaries.
--include-symbols / -i [symbols]
Only include symbols in the given comma-separated list.
--long-filenames / -f
Output full paths instead of basenames.
--merge / -m [lib,cpu,tid,tgid,unitmask,all]
Merge any profiles separated in a --separate session.
--no-header
Don't output a header detailing profiling parameters.
--output-file / -o [file]
Output to the given file instead of stdout.
--reverse-sort / -r
Reverse the sort from the default.
--session-dir=dir_path
Use sample database out of directory dir_path
instead of the default location (/var/lib/oprofile).
--show-address / -w
Show the VMA address of each symbol (off by default).
--sort / -s [vma,sample,symbol,debug,image]
Sort the list of symbols by, respectively, symbol address, number of samples, symbol name, debug filename and line number, binary image filename.
--symbols / -l
List per-symbol information instead of a binary image summary.
--threshold / -t [percentage]
Only output data for symbols that have more than the given percentage of total samples.
--verbose / -V [options]
Give verbose debugging output.
--version / -v
Show version.
--xml / -X
Generate XML output.