Output Format
The Iai-Callgrind output can be customized with command-line arguments, but the fine-grained terminal output format is adjusted in the benchmark itself, for example truncating the description or showing a grid. Please read the docs for further details.
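As a rough sketch of such an in-benchmark adjustment, assuming `OutputFormat` methods named `truncate_description` and `show_grid` (check the `OutputFormat` docs for the actual names and signatures):

```rust
extern crate iai_callgrind;
use iai_callgrind::{
    library_benchmark, library_benchmark_group, main, LibraryBenchmarkConfig, OutputFormat,
};

#[library_benchmark]
fn bench() {}

library_benchmark_group!(name = my_group; benchmarks = bench);

fn main() {
    main!(
        config = LibraryBenchmarkConfig::default()
            .output_format(OutputFormat::default()
                // Assumed method names and signatures: see the OutputFormat docs
                .truncate_description(Some(50))
                .show_grid(true)
            );
        library_benchmark_groups = my_group
    );
}
```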
In this section, I want to show how to display the cache misses, and in the same manner the cache miss rates and cache hit rates, in the Iai-Callgrind output.
Showing cache misses
A default Iai-Callgrind benchmark run displays the following metrics:
```
test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
Instructions: 1734|1734 (No change)
L1 Hits: 2359|2359 (No change)
LL Hits: 0|0 (No change)
RAM Hits: 3|3 (No change)
Total read+write: 2362|2362 (No change)
Estimated Cycles: 2464|2464 (No change)
Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.49333s
```
The cache and RAM hits, `Total read+write` and `Estimated Cycles` are actually not part of the originally collected callgrind metrics but are calculated from them.
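As a small sketch with the numbers from the run above, the derived values can be reproduced from the raw callgrind events; the 5 and 35 cycle weights for `LL Hits` and `RAM Hits` are the ones used in the `Estimated Cycles` calculation:

```rust
fn main() {
    // Raw callgrind events from the run above
    let ir = 1734u64; // Instructions
    let (dr, dw) = (270u64, 358u64); // data reads and writes
    let (i1mr, d1mr, d1mw) = (3u64, 0u64, 0u64); // L1 misses
    let (ilmr, dlmr, dlmw) = (3u64, 0u64, 0u64); // last-level (LL) misses

    let total_rw = ir + dr + dw;
    let ram_hits = ilmr + dlmr + dlmw;
    let ll_hits = (i1mr + d1mr + d1mw) - ram_hits;
    let l1_hits = total_rw - ll_hits - ram_hits;
    let estimated_cycles = l1_hits + 5 * ll_hits + 35 * ram_hits;

    // These match the default output above
    assert_eq!(total_rw, 2362);
    assert_eq!((l1_hits, ll_hits, ram_hits), (2359, 0, 3));
    assert_eq!(estimated_cycles, 2464);
}
```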
If you want to see the cache misses nonetheless, you can achieve this by specifying the output format, for example at top level for all benchmarks in the same file in the `main!` macro:
```rust
extern crate iai_callgrind;
use iai_callgrind::{library_benchmark, library_benchmark_group};
use iai_callgrind::{main, LibraryBenchmarkConfig, CallgrindMetrics, Callgrind};

#[library_benchmark]
fn bench() {}

library_benchmark_group!(name = my_group; benchmarks = bench);

fn main() {
    main!(
        config = LibraryBenchmarkConfig::default()
            .tool(Callgrind::default()
                .format([CallgrindMetrics::All])
            );
        library_benchmark_groups = my_group
    );
}
```
or by using the command-line argument `--callgrind-metrics=@all` or the environment variable `IAI_CALLGRIND_CALLGRIND_METRICS=@all`. The Iai-Callgrind output will then show all cache metrics:
```
test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
Instructions: 1734|N/A (*********)
Dr: 270|N/A (*********)
Dw: 358|N/A (*********)
I1mr: 3|N/A (*********)
D1mr: 0|N/A (*********)
D1mw: 0|N/A (*********)
ILmr: 3|N/A (*********)
DLmr: 0|N/A (*********)
DLmw: 0|N/A (*********)
I1 Miss Rate: 0.17301|N/A (*********)
LLi Miss Rate: 0.17301|N/A (*********)
D1 Miss Rate: 0.00000|N/A (*********)
LLd Miss Rate: 0.00000|N/A (*********)
LL Miss Rate: 0.12701|N/A (*********)
L1 Hits: 2359|N/A (*********)
LL Hits: 0|N/A (*********)
RAM Hits: 3|N/A (*********)
L1 Hit Rate: 99.8730|N/A (*********)
LL Hit Rate: 0.00000|N/A (*********)
RAM Hit Rate: 0.12701|N/A (*********)
Total read+write: 2362|N/A (*********)
Estimated Cycles: 2464|N/A (*********)
Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.48898s
```
The callgrind output format can be fully customized, showing only the metrics you're interested in and in any order. The docs of `Callgrind::format` and `CallgrindMetrics` show all the possibilities for Callgrind. The output format of the other valgrind tools can be customized in the same way. More details can be found in the docs for the respective format (`Dhat::format`, `DhatMetric`, `Cachegrind::format`, `CachegrindMetric`, ...) and for their respective command-line arguments with `--help`.
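For example, a config along the following lines limits the output of a single benchmark to a hand-picked selection. This is only a sketch: the `CallgrindMetrics::CacheMisses` and `CallgrindMetrics::CacheHits` variants used here are assumptions, so consult the `CallgrindMetrics` docs for the actual variant names:

```rust
extern crate iai_callgrind;
use iai_callgrind::{
    library_benchmark, library_benchmark_group, main, Callgrind, CallgrindMetrics,
    LibraryBenchmarkConfig,
};

#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .tool(Callgrind::default()
            // Assumed variant names: see the CallgrindMetrics docs for the full list
            .format([CallgrindMetrics::CacheMisses, CallgrindMetrics::CacheHits])
        )
)]
fn bench() {}

library_benchmark_group!(name = my_group; benchmarks = bench);

fn main() {
    main!(library_benchmark_groups = my_group);
}
```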
Setting a tolerance margin for metric changes
Not every benchmark is deterministic, for example when hash maps or sets are involved, or even just because the benchmarked code uses `std::env::var`. Benchmarks whose metrics vary between runs can be configured to tolerate a specific margin in the benchmark output:
```rust
extern crate iai_callgrind;
use std::collections::HashMap;
use std::hint::black_box;

use iai_callgrind::{
    library_benchmark, library_benchmark_group, main, LibraryBenchmarkConfig, OutputFormat,
};

fn make_hashmap(num: usize) -> HashMap<String, usize> {
    (0..num).fold(HashMap::new(), |mut acc, e| {
        acc.insert(format!("element: {e}"), e);
        acc
    })
}

#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .output_format(OutputFormat::default()
            .tolerance(0.9)
        )
)]
#[bench::tolerance(make_hashmap(100))]
fn bench_hash_map(map: HashMap<String, usize>) -> Option<usize> {
    black_box(
        map.iter()
            .find_map(|(key, value)| (key == "element: 12345").then_some(*value)),
    )
}

library_benchmark_group!(name = my_group; benchmarks = bench_hash_map);

fn main() {
    main!(library_benchmark_groups = my_group);
}
```
or by using the command-line argument `--tolerance=0.9` (or the environment variable `IAI_CALLGRIND_TOLERANCE=0.9`). The second or any following Iai-Callgrind run might then show something like this:
```
lib_bench_tolerance::my_group::bench_hash_map tolerance:make_hashmap(100)
Instructions: 19787|19623 (Tolerance)
L1 Hits: 26395|26123 (+1.04123%) [+1.01041x]
LL Hits: 0|0 (No change)
RAM Hits: 22|22 (No change)
Total read+write: 26417|26145 (+1.04035%) [+1.01040x]
Estimated Cycles: 27165|26893 (+1.01142%) [+1.01011x]
Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.15735s
```
and `Instructions` displays `Tolerance` instead of a difference.
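A quick check with the numbers above shows why only `Instructions` stays inside the margin, assuming the tolerance value is interpreted as a percentage (which the output above is consistent with):

```rust
fn main() {
    let tolerance = 0.9; // percent, as configured above

    // Instructions: 19623 -> 19787
    let instructions = (19787.0 - 19623.0) / 19623.0 * 100.0; // ~ +0.84%
    // L1 Hits: 26123 -> 26395
    let l1_hits = (26395.0 - 26123.0) / 26123.0 * 100.0; // ~ +1.04%

    assert!(instructions.abs() <= tolerance); // within the margin => displayed as "Tolerance"
    assert!(l1_hits.abs() > tolerance); // above the margin => displayed as "+1.04123%"
}
```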