Important default behaviour

Environment variables are cleared before a library benchmark runs. See the Configuration section if you need to change that behaviour. Iai-Callgrind also deviates from the Valgrind defaults for the following options:

| Iai-Callgrind | Valgrind (v3.23) |
| --- | --- |
| `--trace-children=yes` | `--trace-children=no` |
| `--fair-sched=try` | `--fair-sched=no` |
| `--separate-threads=yes` | `--separate-threads=no` |
| `--cache-sim=yes` | `--cache-sim=no` |

The thread-specific and subprocess-specific Valgrind options enable basic tracing of threads and subprocesses, but some additional configuration is usually necessary to collect their metrics.

As shown in the table above, benchmarks run with cache simulation switched on, which adds run time. If you don't need the cache metrics or the cycle estimate, you can switch cache simulation off, for example with:

use iai_callgrind::{Callgrind, LibraryBenchmarkConfig};

fn main() {
    // Pass `--cache-sim=no` to callgrind to disable the cache simulation.
    LibraryBenchmarkConfig::default().tool(Callgrind::with_args(["--cache-sim=no"]));
}

To switch off cache simulation for all benchmarks in the same file:

// Stand-in for the library under test.
mod my_lib { pub fn fibonacci(a: u64) -> u64 { a } }
use iai_callgrind::{
    main, library_benchmark_group, library_benchmark, LibraryBenchmarkConfig,
    Callgrind
};
use std::hint::black_box;

#[library_benchmark]
fn bench_fibonacci() -> u64 {
    black_box(my_lib::fibonacci(10))
}

library_benchmark_group!(name = fibonacci_group; benchmarks = bench_fibonacci);

// `main!` generates the `main` function of the benchmark binary.
main!(
    config = LibraryBenchmarkConfig::default()
        .tool(Callgrind::with_args(["--cache-sim=no"]));
    library_benchmark_groups = fibonacci_group
);

With cache simulation switched on (the default), Iai-Callgrind reports the cache hits and an estimate of CPU cycles:

test_lib_bench_readme_example_fibonacci::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                        1734|1734                 (No change)
  L1 Hits:                             2359|2359                 (No change)
  L2 Hits:                                0|0                    (No change)
  RAM Hits:                               3|3                    (No change)
  Total read+write:                    2362|2362                 (No change)
  Estimated Cycles:                    2464|2464                 (No change)

Iai-Callgrind result: Ok. 1 without regressions; 0 regressed; 1 benchmarks finished in 0.49333s
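
The summary lines in this output are consistent with a simple weighting of the hit counts. The following is a minimal sketch, assuming the weighting inherited from the original iai crate (1 cycle per L1 hit, 5 per L2 hit, 35 per RAM hit). The `CacheHits` struct and its methods are hypothetical names for illustration and are not part of the Iai-Callgrind API.

```rust
/// Cache hit counts as they appear in the benchmark output.
/// Hypothetical helper type, not part of the Iai-Callgrind API.
#[derive(Debug, Clone, Copy)]
struct CacheHits {
    l1: u64,
    l2: u64,
    ram: u64,
}

impl CacheHits {
    /// Every memory access is served by exactly one level,
    /// so the total is the sum of all hits.
    fn total_read_write(&self) -> u64 {
        self.l1 + self.l2 + self.ram
    }

    /// Assumed weighting: 1 cycle per L1 hit, 5 per L2 hit, 35 per RAM hit.
    fn estimated_cycles(&self) -> u64 {
        self.l1 + 5 * self.l2 + 35 * self.ram
    }
}

fn main() {
    // Hit counts taken from the example output above.
    let hits = CacheHits { l1: 2359, l2: 0, ram: 3 };
    println!("Total read+write: {}", hits.total_read_write());
    println!("Estimated Cycles: {}", hits.estimated_cycles());
}
```

With the hit counts from the example output, this reproduces the reported 2362 total accesses and 2464 estimated cycles.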

If you prefer cache misses over cache hits, or just want both metrics displayed, you can fully customize the callgrind output format.
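
Hits and misses carry the same information, because the hit counts can be derived from callgrind's raw miss events. The sketch below assumes the derivation used by the original iai crate. The field names mirror callgrind's event names (`Ir`, `Dr`, `Dw`, `I1mr`, `D1mr`, `D1mw`, `ILmr`, `DLmr`, `DLmw`), while the `RawEvents` and `Hits` types and the `summarize` function are hypothetical. Apart from the instruction count of 1734, the event values are made up so that the result matches the example output.

```rust
/// Raw callgrind event counts. Hypothetical helper type.
struct RawEvents {
    ir: u64,   // instructions read
    dr: u64,   // data reads
    dw: u64,   // data writes
    i1mr: u64, // L1 instruction read misses
    d1mr: u64, // L1 data read misses
    d1mw: u64, // L1 data write misses
    ilmr: u64, // last-level instruction read misses
    dlmr: u64, // last-level data read misses
    dlmw: u64, // last-level data write misses
}

/// Hit counts per level. Hypothetical helper type.
#[derive(Debug, PartialEq)]
struct Hits {
    l1: u64,
    l2: u64,
    ram: u64,
}

/// Assumed derivation: an access that misses L1 but not the last-level
/// cache is an L2 hit; an access that misses the last level is a RAM hit.
fn summarize(e: &RawEvents) -> Hits {
    let total = e.ir + e.dr + e.dw;
    let l1_misses = e.i1mr + e.d1mr + e.d1mw;
    let ll_misses = e.ilmr + e.dlmr + e.dlmw;
    Hits {
        l1: total.saturating_sub(l1_misses),
        l2: l1_misses.saturating_sub(ll_misses),
        ram: ll_misses,
    }
}

fn main() {
    // Illustrative values only; just `ir` comes from the example output.
    let events = RawEvents {
        ir: 1734, dr: 400, dw: 228,
        i1mr: 2, d1mr: 1, d1mw: 0,
        ilmr: 2, dlmr: 1, dlmw: 0,
    };
    println!("{:?}", summarize(&events));
}
```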
