Output Format

The Iai-Callgrind output can be customized with command-line arguments. The fine-grained terminal output format, however, is adjusted in the benchmark itself, for example truncating the description or showing a grid. Please read the docs for further details.

This section shows how to display cache misses and, in the same manner, cache miss rates and cache hit rates in the Iai-Callgrind output.

Showing cache misses

A default Iai-Callgrind benchmark run displays the following metrics:

test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                        1734|1734                 (No change)
  L1 Hits:                             2359|2359                 (No change)
  LL Hits:                                0|0                    (No change)
  RAM Hits:                               3|3                    (No change)
  Total read+write:                    2362|2362                 (No change)
  Estimated Cycles:                    2464|2464                 (No change)

Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.49333s

The cache and RAM hits, Total read+write and Estimated Cycles are not part of the metrics originally collected by callgrind but are calculated from them. If you want to see the cache misses nonetheless, you can specify the output format, for example at top level for all benchmarks in the same file in the main! macro:

use iai_callgrind::{
    library_benchmark, library_benchmark_group, main, Callgrind, CallgrindMetrics,
    LibraryBenchmarkConfig,
};

#[library_benchmark]
fn bench() {}

library_benchmark_group!(name = my_group; benchmarks = bench);

// Show all callgrind metrics for every benchmark in this file.
main!(
    config = LibraryBenchmarkConfig::default()
        .tool(Callgrind::default()
            .format([CallgrindMetrics::All])
        );
    library_benchmark_groups = my_group
);

or by using the command-line argument --callgrind-metrics=@all or the environment variable IAI_CALLGRIND_CALLGRIND_METRICS=@all.

The Iai-Callgrind output will then show all cache metrics:

test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                        1734|N/A                  (*********)
  Dr:                                   270|N/A                  (*********)
  Dw:                                   358|N/A                  (*********)
  I1mr:                                   3|N/A                  (*********)
  D1mr:                                   0|N/A                  (*********)
  D1mw:                                   0|N/A                  (*********)
  ILmr:                                   3|N/A                  (*********)
  DLmr:                                   0|N/A                  (*********)
  DLmw:                                   0|N/A                  (*********)
  I1 Miss Rate:                     0.17301|N/A                  (*********)
  LLi Miss Rate:                    0.17301|N/A                  (*********)
  D1 Miss Rate:                     0.00000|N/A                  (*********)
  LLd Miss Rate:                    0.00000|N/A                  (*********)
  LL Miss Rate:                     0.12701|N/A                  (*********)
  L1 Hits:                             2359|N/A                  (*********)
  LL Hits:                                0|N/A                  (*********)
  RAM Hits:                               3|N/A                  (*********)
  L1 Hit Rate:                      99.8730|N/A                  (*********)
  LL Hit Rate:                      0.00000|N/A                  (*********)
  RAM Hit Rate:                     0.12701|N/A                  (*********)
  Total read+write:                    2362|N/A                  (*********)
  Estimated Cycles:                    2464|N/A                  (*********)

Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.48898s
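
The derived metrics in the listing above can be reproduced by hand from the raw counters. The following is a minimal sketch, not Iai-Callgrind's actual implementation: the names RawCounters, Derived and derive are hypothetical, and the cycle weights of 5 per LL hit and 35 per RAM hit are an assumption chosen because they reproduce the Estimated Cycles value shown in the listing.

```rust
/// Raw cache-simulation counters (hypothetical helper type, not part of the
/// iai_callgrind API).
#[derive(Debug, Clone, Copy)]
struct RawCounters {
    ir: u64,   // instructions executed
    dr: u64,   // data reads
    dw: u64,   // data writes
    i1mr: u64, // L1 instruction read misses
    d1mr: u64, // L1 data read misses
    d1mw: u64, // L1 data write misses
    ilmr: u64, // last-level instruction read misses
    dlmr: u64, // last-level data read misses
    dlmw: u64, // last-level data write misses
}

/// Metrics calculated from the raw counters (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Derived {
    l1_hits: u64,
    ll_hits: u64,
    ram_hits: u64,
    total_rw: u64,
    estimated_cycles: u64,
}

fn derive(c: &RawCounters) -> Derived {
    let total_rw = c.ir + c.dr + c.dw;
    let l1_misses = c.i1mr + c.d1mr + c.d1mw;
    // Every access that misses the last-level cache is served from RAM.
    let ram_hits = c.ilmr + c.dlmr + c.dlmw;
    // An L1 miss that did not go to RAM was served by the last-level cache.
    let ll_hits = l1_misses.saturating_sub(ram_hits);
    let l1_hits = total_rw.saturating_sub(l1_misses);
    // Assumed weights: 1 cycle per L1 hit, 5 per LL hit, 35 per RAM hit.
    let estimated_cycles = l1_hits + 5 * ll_hits + 35 * ram_hits;
    Derived { l1_hits, ll_hits, ram_hits, total_rw, estimated_cycles }
}

fn main() {
    // Raw values taken from the listing above.
    let raw = RawCounters {
        ir: 1734, dr: 270, dw: 358,
        i1mr: 3, d1mr: 0, d1mw: 0,
        ilmr: 3, dlmr: 0, dlmw: 0,
    };
    let d = derive(&raw);
    println!("{d:?}");
    // The rates in the listing are percentages of the respective totals.
    let l1_hit_rate = d.l1_hits as f64 / d.total_rw as f64 * 100.0;
    println!("L1 Hit Rate: {l1_hit_rate:.4}");
}
```

With the raw values from the listing, this yields 2359 L1 hits, 0 LL hits, 3 RAM hits, 2362 total reads and writes and 2464 estimated cycles, matching the displayed metrics.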

The callgrind output format can be fully customized to show only the metrics you're interested in, in any order. The docs of Callgrind::format and CallgrindMetrics show all the possibilities for Callgrind. The output format of the other Valgrind tools can be customized in the same way. More details can be found in the docs for the respective format (Dhat::format, DhatMetric, Cachegrind::format, CachegrindMetric, ...) and, for the corresponding command-line arguments, with --help.

Setting a tolerance margin for metric changes

Not every benchmark is deterministic, for example when hash maps or sets are involved, or even when the benchmarked code merely calls std::env::var. Benchmarks whose metrics vary between runs can be configured to tolerate a specific margin in the benchmark output:

use std::collections::HashMap;
use std::hint::black_box;

use iai_callgrind::{
    library_benchmark, library_benchmark_group, main, LibraryBenchmarkConfig, OutputFormat,
};

fn make_hashmap(num: usize) -> HashMap<String, usize> {
    (0..num).fold(HashMap::new(), |mut acc, e| {
        acc.insert(format!("element: {e}"), e);
        acc
    })
}

#[library_benchmark(
    config = LibraryBenchmarkConfig::default()
        .output_format(OutputFormat::default()
            .tolerance(0.9)
        )
)]
#[bench::tolerance(make_hashmap(100))]
fn bench_hash_map(map: HashMap<String, usize>) -> Option<usize> {
    black_box(
        map.iter()
            .find_map(|(key, value)| (key == "element: 12345").then_some(*value)),
    )
}

library_benchmark_group!(name = my_group; benchmarks = bench_hash_map);
main!(library_benchmark_groups = my_group);

or by using the command-line argument --tolerance=0.9 (or IAI_CALLGRIND_TOLERANCE=0.9).

The second or any following Iai-Callgrind run might then show something like this:

lib_bench_tolerance::my_group::bench_hash_map tolerance:make_hashmap(100)
  Instructions:                       19787|19623                (Tolerance)
  L1 Hits:                            26395|26123                (+1.04123%) [+1.01041x]
  LL Hits:                                0|0                    (No change)
  RAM Hits:                              22|22                   (No change)
  Total read+write:                   26417|26145                (+1.04035%) [+1.01040x]
  Estimated Cycles:                   27165|26893                (+1.01142%) [+1.01011x]

Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.15735s

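Here, Instructions displays Tolerance instead of a difference because its change of about 0.84% lies within the 0.9% margin, whereas the other changed metrics exceed the margin and are shown as usual.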
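
The choice between printing a difference and printing Tolerance can be sketched as follows. This is an illustration under assumptions, not Iai-Callgrind's actual code: percent_change and describe are hypothetical names, the margin is taken to be a percentage (which is consistent with the numbers in the listing above), and treating a change exactly equal to the margin as tolerated is a guess.

```rust
/// Percentage change of `new` relative to `old` (hypothetical helper).
fn percent_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Returns the text shown in the difference column for one metric.
/// `tolerance` is assumed to be a margin in percent.
fn describe(new: u64, old: u64, tolerance: f64) -> String {
    if new == old {
        return "No change".to_owned();
    }
    let pct = percent_change(new, old);
    // Assumption: changes up to and including the margin are tolerated.
    if pct.abs() <= tolerance.abs() {
        return "Tolerance".to_owned();
    }
    let factor = new as f64 / old as f64;
    format!("{pct:+.5}% [{factor:+.5}x]")
}

fn main() {
    // (name, new, old) triples taken from the listing above.
    let metrics = [
        ("Instructions", 19787_u64, 19623_u64),
        ("L1 Hits", 26395, 26123),
        ("LL Hits", 0, 0),
        ("RAM Hits", 22, 22),
    ];
    for (name, new, old) in metrics {
        println!("{name}: {new}|{old} ({})", describe(new, old, 0.9));
    }
}
```

With a margin of 0.9, the Instructions change of roughly 0.84% is reported as Tolerance, while the L1 Hits change of roughly 1.04% is reported as a difference.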