perf-report (1)
NAME
perf-report - Read perf.data (created by perf record) and display the profile
SYNOPSIS
perf report [-i <file> | --input=file]
DESCRIPTION
This command displays the performance counter profile information recorded via perf record.
OPTIONS
-i, --input=
- Input file name. (default: perf.data unless stdin is a fifo)
-v, --verbose
- Be more verbose. (show symbol address, etc)
-d, --dsos=
- Only consider symbols in these dsos. CSV that understands file://filename entries.
-n, --show-nr-samples
- Show the number of samples for each symbol.
--showcpuutilization
- Show sample percentage for different cpu modes.
-T, --threads
- Show per-thread event counters.
-c, --comms=
- Only consider symbols in these comms. CSV that understands file://filename entries.
-S, --symbols=
- Only consider these symbols. CSV that understands file://filename entries.
--symbol-filter=
- Only show symbols that match (partially) with this filter.
-U, --hide-unresolved
- Only display entries resolved to a symbol.
-s, --sort=
-
Sort histogram entries by the given key(s). Multiple keys can be specified in CSV format. The following sort keys are available: pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight, transaction.
-
Each key has the following meaning:
- • comm: command (name) of the task which can be read via /proc/<pid>/comm
- • pid: command and tid of the task
- • dso: name of library or module executed at the time of sample
- • symbol: name of function executed at the time of sample
- • parent: name of function matched to the parent regex filter. Unmatched entries are displayed as "[other]".
- • cpu: CPU number the task ran on at the time of sample
- • srcline: filename and line number executed at the time of sample. The DWARF debugging info must be provided.
- • weight: Event specific weight, e.g. memory latency or transaction abort cost. This is the global weight.
- • local_weight: Local weight version of the weight above.
- • transaction: Transaction abort flags.
-
By default, comm, dso and symbol keys are used. (i.e. --sort comm,dso,symbol)
-
If the --branch-stack option is used, the following sort keys are also available: dso_from, dso_to, symbol_from, symbol_to, mispredict, in_tx, abort.
-
- • dso_from: name of library or module branched from
- • dso_to: name of library or module branched to
- • symbol_from: name of function branched from
- • symbol_to: name of function branched to
- • mispredict: "N" for predicted branch, "Y" for mispredicted branch
- • in_tx: branch in TSX transaction
- • abort: TSX transaction abort.
-
The default sort keys then change to comm, dso_from, symbol_from, dso_to and symbol_to; see --branch-stack.
-
-
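As a rough illustration of what the sort keys do, the sketch below groups samples into histogram entries by the selected keys. It is a simplified model, not perf's implementation: the sample records and the build_histogram helper are hypothetical, and only the key names come from the list above.

```python
from collections import Counter

# Hypothetical samples; each dict holds the fields a sort key can select.
samples = [
    {"comm": "bash", "dso": "libc.so", "symbol": "malloc"},
    {"comm": "bash", "dso": "libc.so", "symbol": "malloc"},
    {"comm": "bash", "dso": "libc.so", "symbol": "free"},
    {"comm": "make", "dso": "libc.so", "symbol": "malloc"},
]

def build_histogram(samples, sort_keys):
    """Count samples per unique combination of the given CSV sort keys."""
    keys = sort_keys.split(",")
    return Counter(tuple(s[k] for k in keys) for s in samples)

# Default keys (--sort comm,dso,symbol) keep the two commands apart.
default_hist = build_histogram(samples, "comm,dso,symbol")

# Sorting by symbol alone merges every malloc sample into one entry.
by_symbol = build_histogram(samples, "symbol")
```

Fewer keys give coarser entries; more keys split the same samples into finer ones.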
-p, --parent=<regex>
- A regex filter to identify the parent. The parent is a caller of this function and is searched for through the callchain, so callchain information must have been recorded. The pattern is in the extended regex format and defaults to "^sys_|^do_page_fault", see --sort parent.
-x, --exclude-other
- Only display entries with parent-match.
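The parent lookup and --exclude-other can be pictured with the sketch below. It is an assumption-laden model: find_parent and exclude_other are hypothetical helpers, callchains are assumed to be ordered from the sampled function outwards to its callers, and Python's re module stands in for the extended regex engine (the default pattern behaves the same in both).

```python
import re

# Default pattern documented for --parent.
DEFAULT_PARENT = r"^sys_|^do_page_fault"

def find_parent(callchain, pattern=DEFAULT_PARENT):
    """Return the first function in the chain matching pattern, else "[other]"."""
    regex = re.compile(pattern)
    for func in callchain:
        if regex.search(func):
            return func
    return "[other]"

def exclude_other(chains, pattern=DEFAULT_PARENT):
    """Keep only callchains that have a parent match, as -x does for entries."""
    return [c for c in chains if find_parent(c, pattern) != "[other]"]

# Hypothetical callchains.
chains = [
    ["memcpy", "vfs_read", "sys_read"],
    ["compute", "main"],
    ["clear_page", "do_page_fault"],
]
kept = exclude_other(chains)
```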
-w, --column-widths=<width[,width...]>
- Force each column width to the provided list, for large terminal readability.
-t, --field-separator=
- Use a special separator character and don't pad with spaces. All occurrences of this separator in symbol names (and other output) are replaced with a . character, which is therefore the only invalid separator.
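A minimal sketch of the replacement rule described for --field-separator, assuming a hypothetical format_row helper and made-up field values; it only models the text substitution, not perf's actual output code.

```python
def format_row(fields, sep):
    """Join fields with sep, without padding.

    Any occurrence of sep inside a field is replaced with '.', so the
    separator never appears within a column's text.
    """
    return sep.join(str(f).replace(sep, ".") for f in fields)

# Hypothetical row: the C++ symbol contains the chosen separator ','.
row = format_row(["12.50%", "bash", "std::map<int,int>::find"], ",")
```

This is also why . itself cannot serve as the separator: replaced characters would become indistinguishable from column boundaries.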
-D, --dump-raw-trace
- Dump raw trace in ASCII.
-g [type,min[,limit],order[,key]], --call-graph
-
Display call chains using type, min percent threshold, optional print limit and order. type can be either:
- • flat: single column, linear exposure of call chains.
- • graph: use a graph tree, displaying absolute overhead rates.
- • fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.
-
order can be either:
- • callee: callee based call graph.
- • caller: inverted caller based call graph.
-
key can be:
- • function: compare on functions
- • address: compare on individual code addresses
-
Default: fractal,0.5,callee,function.
-
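The difference between the graph and fractal types can be sketched as follows. This is a simplified model under stated assumptions: the call tree, the sample counts and the rates helper are hypothetical, and perf's real percentage calculation may differ in detail. graph measures each branch against the whole profile, while fractal measures it against its parent branch.

```python
# Hypothetical call tree: counts are samples attributed to each branch.
tree = {
    "name": "main", "count": 80,
    "children": [
        {"name": "parse", "count": 60,
         "children": [{"name": "lex", "count": 30}]},
        {"name": "io", "count": 20},
    ],
}
TOTAL = 100  # hypothetical total number of samples in the profile

def rates(node, total, mode, parent_count=None):
    """Yield (name, percent) pairs in depth-first order.

    graph:   percent of the whole profile (absolute rate).
    fractal: percent of the parent branch (relative rate); the root has
             no parent, so it is measured against the total.
    """
    if mode == "fractal" and parent_count is not None:
        base = parent_count
    else:
        base = total
    yield node["name"], 100.0 * node["count"] / base
    for child in node.get("children", []):
        yield from rates(child, total, mode, node["count"])

graph = dict(rates(tree, TOTAL, "graph"))
fractal = dict(rates(tree, TOTAL, "fractal"))
```

In this model lex accounts for 30% of the whole profile in graph mode, but 50% of its parent parse in fractal mode.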
--max-stack
-
Set the stack depth limit when parsing the callchain; anything beyond the specified depth will be ignored. This is a trade-off between information loss and faster processing, especially for workloads that can have a very long callchain stack.
-
Default: 127
-
-G, --inverted
- Alias for the inverted caller based call graph.
--ignore-callees=<regex>
- Ignore callees of the function(s) matching the given regex. This has the effect of collecting the callers of each such function into one place in the call-graph tree.
--pretty=<key>
- Pretty printing style. key: normal, raw
--stdio
- Use the stdio interface.
--tui
- Use the TUI interface, which is integrated with annotate and allows zooming into DSOs or threads, among other features. Use of --tui requires a tty; if one is not present, as when piping to other commands, the stdio interface is used.
--gtk
- Use the GTK2 interface.
-k, --vmlinux=<file>
- vmlinux pathname
--kallsyms=<file>
- kallsyms pathname
-m, --modules
- Load module symbols. WARNING: This should only be used with -k and a LIVE kernel.
-f, --force
- Don't complain, do it.
--symfs=<directory>
- Look for files with symbols relative to this directory.
-C, --cpu
- Only report samples for the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2. Default is to report samples on all CPUs.
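The CPU list syntax can be illustrated with a small parser. This is a sketch of the documented syntax only, not perf's own parser; parse_cpu_list is a hypothetical helper, and mixing ranges with single CPUs in one list (for example 0-2,5) is an assumption.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range like "0-2" includes both endpoints.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

pair = parse_cpu_list("0,1")
span = parse_cpu_list("0-2")
```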
-M, --disassembler-style=
- Set disassembler style for objdump.
--source
- Interleave source code with assembly code. Enabled by default; disable with --no-source.
--asm-raw
- Show raw instruction encoding of assembly instructions.
--show-total-period
- Show a column with the sum of periods.
-I, --show-info
- Display extended information about the perf.data file. This adds information which may be very large and thus may clutter the display. It currently includes: cpu and numa topology of the host system.
-b, --branch-stack
- Use the addresses of sampled taken branches instead of the instruction address to build the histograms. To generate meaningful output, the perf.data file must have been obtained using perf record -b or perf record --branch-filter xxx where xxx is a branch filter option. perf report is able to auto-detect whether a perf.data file contains branch stacks and it will automatically switch to the branch view mode, unless --no-branch-stack is used.
--objdump=<path>
- Path to objdump binary.
--group
- Show event group information together.
--demangle
- Demangle symbol names to human readable form. Enabled by default; disable with --no-demangle.
--percent-limit
- Do not show entries which have an overhead under that percent. (Default: 0).
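The effect of --percent-limit can be modeled as a simple filter. The entries and the apply_percent_limit helper below are hypothetical; the sketch assumes an entry exactly at the limit is still shown, since only entries under the limit are hidden.

```python
def apply_percent_limit(entries, limit=0.0):
    """Drop (name, overhead_percent) entries whose overhead is under limit."""
    return [(name, pct) for name, pct in entries if pct >= limit]

# Hypothetical report entries with their overhead in percent.
entries = [("malloc", 42.0), ("free", 5.0), ("memcpy", 0.3)]

shown_default = apply_percent_limit(entries)       # default limit of 0
shown_limited = apply_percent_limit(entries, 5.0)  # hides memcpy only
```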
SEE ALSO
perf-stat(1), perf-annotate(1)