perf-report(1)
==============

NAME
----
perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
[verse]
'perf report' [-i <file> | --input=file]

DESCRIPTION
-----------
This command displays the performance counter profile information recorded
via perf record.

OPTIONS
-------
-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose. (show symbol address, etc)

-n::
--show-nr-samples::
	Show the number of samples for each symbol.

--showcpuutilization::
	Show sample percentage for different cpu modes.

-T::
--threads::
	Show per-thread event counters.

-c::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

--pid=::
	Only show events for given process ID (comma separated list).

--tid=::
	Only show events for given thread ID (comma separated list).

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

--symbol-filter=::
	Only show symbols that match (partially) with this filter.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-s::
--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format. The following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, srcline, weight, local_weight.
	Each key has the following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	entries are displayed as "[other]".
	- cpu: cpu number the task ran on at the time of sample
	- srcline: filename and line number executed at the time of sample. The
	DWARF debugging info must be provided.
	- weight: Event specific weight, e.g. memory latency or transaction
	abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	guest machine
	- sample: Number of samples
	- period: Raw event count of the sample

	By default, the comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If the --branch-stack option is used, the following sort keys are also
	available:
	dso_from, dso_to, symbol_from, symbol_to, mispredict.

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.

	The default sort keys are then changed to comm, dso_from, symbol_from,
	dso_to and symbol_to, see '--branch-stack'.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	The following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	It can also contain any sort key(s).

	By default, every sort key not specified in -F will be appended
	automatically.

	If the --mem-mode option is used, the following sort keys are also
	available (incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	on at the time of sample
	- locked: whether the bus was locked at the time of sample
	- tlb: type of tlb access for the data at the time of sample
	- mem: type of memory access for the data at the time of sample
	- snoop: type of snoop (if any) for the data at the time of sample
	- dcacheline: the cacheline the data address is on at the time of sample

	The default sort keys are then changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

-p::
--parent=::
	A regex filter to identify the parent. The parent is a caller of this
	function and is searched for through the callchain, so it requires
	callchain information to have been recorded. The pattern is in the
	extended regex format and defaults to "\^sys_|^do_page_fault",
	see '--sort parent'.

-x::
--exclude-other::
	Only display entries with parent-match.

-w::
--column-widths=::
	Force each column width to the provided list, for large terminal
	readability. 0 means no limit (default behavior).

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, which is therefore the only invalid separator.

-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

-g [type,min[,limit],order[,key][,branch]]::
--call-graph::
	Display call chains using type, min percent threshold, optional print
	limit and order.
	type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates.
	- fractal: like graph, but displays relative rates. Each branch of
	the tree is considered as a new profiled object. +

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.

	key can be:
	- function: compare on functions
	- address: compare on individual code addresses

	branch can be:
	- branch: include last branch information in callgraph
	when available. Usually more convenient to use --branch-history
	for this.

	Default: fractal,0.5,callee,function.

--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output. The output will have a new "Children" column
	and will be sorted on the data. It requires that callchains are
	recorded. See the `overhead calculation' section for more details.

--max-stack::
	Set the stack depth limit when parsing the callchain; anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing, especially for
	workloads that can have a very long callchain stack.

	Default: 127

-G::
--inverted::
	Alias for inverted caller based call graph.

--ignore-callees=::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

--pretty=::
	Pretty printing style. key: normal, raw

--stdio::
	Use the stdio interface.

--tui::
	Use the TUI interface, which is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty; if one is not present, as when piping to other
	commands, the stdio interface is used.

--gtk::
	Use the GTK2 interface.

-k::
--vmlinux=::
	vmlinux pathname

--kallsyms=::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

-f::
--force::
	Don't complain, do it.

--symfs=::
	Look for files with symbols relative to this directory.

-C::
--cpu::
	Only report samples for the list of CPUs provided.
	Multiple CPUs can be provided as a comma-separated list with no space: 0,1.
	Ranges of CPUs are specified with -: 0-2. Default is to report samples on
	all CPUs.

-M::
--disassembler-style=::
	Set disassembler style for objdump.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

--show-total-period::
	Show a column with the sum of periods.

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

-b::
--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

--branch-history::
	Add the addresses of sampled taken branches to the callstack.
	This allows examining the path the program took to each sample.
	The data collection must have used -b (or -j) and -g.

--objdump=::
	Path to objdump binary.

--group::
	Show event group information together.

--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).

--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms. To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
	'perf mem' for simpler access.
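The CPU list syntax described for -C/--cpu above (comma-separated entries with no spaces, ranges written as first-last) can be illustrated with a short Python sketch. The helper name `parse_cpu_list` is hypothetical and is not part of perf. Accepting single CPUs and ranges mixed in one list is an assumption of this sketch, not something stated above.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into a sorted list of CPU numbers.

    Hypothetical helper for illustration only; it is not perf's own parser.
    """
    cpus = set()
    for entry in spec.split(","):
        if "-" in entry:
            # A range is written as first-last and includes both ends.
            first, last = entry.split("-", 1)
            first, last = int(first), int(last)
            if first > last:
                raise ValueError("invalid CPU range: %r" % entry)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(entry))
    return sorted(cpus)


print(parse_cpu_list("0,1"))  # [0, 1]
print(parse_cpu_list("0-2"))  # [0, 1, 2]
```

An empty or non-numeric entry raises ValueError in this sketch; how perf itself reports a malformed list is not covered here.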
--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by the --comms, --dsos and/or --symbols options
	and by Zoom operations on the TUI (thread, dso, etc).

	"relative" means the percentage is relative to the filtered entries
	only, so that the sum of shown entries will always be 100%.
	"absolute" means it retains the original value before and after the
	filter is applied.

--header::
	Show header information in the perf.data file. This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on. Currently only
	--stdio output supports this feature.

--header-only::
	Show only perf.data header (forces --stdio).

--itrace::
	Options for decoding instruction tracing data. The options are:

		i	synthesize instructions events
		b	synthesize branches events
		c	synthesize branches events (calls only)
		r	synthesize branches events (returns only)
		x	synthesize transactions events
		e	synthesize error events
		d	create a debug log
		g	synthesize a call chain (use with i or x)

	The default is all events, i.e. the same as --itrace=ibxe

	In addition, the period (default 100000) for instructions events
	can be specified in units of:

		i	instructions
		t	ticks
		ms	milliseconds
		us	microseconds
		ns	nanoseconds (default)

	Also the call chain size (default 16, max. 1024) for instructions or
	transactions events can be specified.

	To disable decoding entirely, use --no-itrace.

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1]