Quickstart

Create a file $WORKSPACE_ROOT/benches/library_benchmark.rs and add

[[bench]]
name = "library_benchmark"
harness = false

to your Cargo.toml. Setting harness = false tells cargo not to use the default Rust benchmarking harness. This is required because Iai-Callgrind provides its own benchmarking harness.

Then copy the following content into this file:

extern crate iai_callgrind;
use iai_callgrind::{main, library_benchmark_group, library_benchmark};
use std::hint::black_box;

fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

#[library_benchmark]
#[bench::short(10)]
#[bench::long(30)]
fn bench_fibonacci(value: u64) -> u64 {
    black_box(fibonacci(value))
}

library_benchmark_group!(
    name = bench_fibonacci_group;
    benchmarks = bench_fibonacci
);

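// The main! macro expands to the `fn main` of the benchmark binary,
// so it must not be wrapped in another `fn main`.
main!(library_benchmark_groups = bench_fibonacci_group);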

Now that your first library benchmark is set up, you can run it with

cargo bench

and you should see output similar to the following:

library_benchmark::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                1734|N/A             (*********)
  L1 Hits:                     2359|N/A             (*********)
  L2 Hits:                        0|N/A             (*********)
  RAM Hits:                       3|N/A             (*********)
  Total read+write:            2362|N/A             (*********)
  Estimated Cycles:            2464|N/A             (*********)
library_benchmark::bench_fibonacci_group::bench_fibonacci long:30
  Instructions:            26214734|N/A             (*********)
  L1 Hits:                 35638616|N/A             (*********)
  L2 Hits:                        2|N/A             (*********)
  RAM Hits:                       4|N/A             (*********)
  Total read+write:        35638622|N/A             (*********)
  Estimated Cycles:        35638766|N/A             (*********)
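
The derived rows in the output above can be reproduced from the three hit counters. The following is a minimal sketch, not Iai-Callgrind's actual code: the struct and function names are hypothetical, and the weights 1, 5 and 35 are an assumption that happens to match the sample numbers shown here.

```rust
/// Cache hit counters as printed in the benchmark output.
/// (Hypothetical type for illustration; not part of the iai_callgrind API.)
#[derive(Debug, Clone, Copy)]
struct CacheHits {
    l1: u64,
    l2: u64,
    ram: u64,
}

/// "Total read+write": the sum of the hits on all three levels.
fn total_read_write(hits: CacheHits) -> u64 {
    hits.l1 + hits.l2 + hits.ram
}

/// "Estimated Cycles", assuming a cost of 1 cycle per L1 hit,
/// 5 cycles per L2 hit and 35 cycles per RAM hit.
fn estimated_cycles(hits: CacheHits) -> u64 {
    hits.l1 + 5 * hits.l2 + 35 * hits.ram
}
```

For the short:10 run, the counters 2359, 0 and 3 give a total of 2362 and an estimate of 2359 + 0 + 105 = 2464 cycles, which agrees with the printed values.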

In addition, you'll find the callgrind output and the output of other valgrind tools in target/iai if you want to investigate further with a tool such as callgrind_annotate.

When you run the same benchmark again, the output reports the differences between the current and the previous run. Say you've made a change to the fibonacci function, then you may see something like this:

library_benchmark::bench_fibonacci_group::bench_fibonacci short:10
  Instructions:                2805|1734            (+61.7647%) [+1.61765x]
  L1 Hits:                     3815|2359            (+61.7211%) [+1.61721x]
  L2 Hits:                        0|0               (No change)
  RAM Hits:                       3|3               (No change)
  Total read+write:            3818|2362            (+61.6427%) [+1.61643x]
  Estimated Cycles:            3920|2464            (+59.0909%) [+1.59091x]
library_benchmark::bench_fibonacci_group::bench_fibonacci long:30
  Instructions:            16201597|26214734        (-38.1966%) [-1.61803x]
  L1 Hits:                 22025876|35638616        (-38.1966%) [-1.61803x]
  L2 Hits:                        2|2               (No change)
  RAM Hits:                       4|4               (No change)
  Total read+write:        22025882|35638622        (-38.1966%) [-1.61803x]
  Estimated Cycles:        22026026|35638766        (-38.1964%) [-1.61803x]
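
To help read the comparison columns, here is a small sketch of how the percentage and the bracketed factor appear to relate to the new and old values. The function names are hypothetical and the rules are inferred from the sample output above, so treat them as an assumption rather than the tool's definition. The case where the old value is 0 but the new one is not is not covered.

```rust
/// Percentage change from `old` to `new`.
/// Returns None when both values are equal ("No change" in the output).
fn percent_change(new: u64, old: u64) -> Option<f64> {
    if new == old {
        return None;
    }
    Some((new as f64 - old as f64) / old as f64 * 100.0)
}

/// Signed factor: new/old when the metric grew, -(old/new) when it shrank.
/// Returns None when both values are equal.
fn factor(new: u64, old: u64) -> Option<f64> {
    if new == old {
        None
    } else if new > old {
        Some(new as f64 / old as f64)
    } else {
        Some(-(old as f64 / new as f64))
    }
}
```

With the short:10 instruction counts, percent_change(2805, 1734) is roughly +61.7647 and factor(2805, 1734) roughly +1.61765. With the long:30 counts, the results are roughly -38.1966 and -1.61803, so a negative factor means the new run executed fewer instructions than the old one.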