27.8.1. BST vs heap vs hashmap

The following benchmark setup works both:

  • on host through timers + granule

  • gem5 with dumpstats, which can get more precise results with granule == 1

It has been used to answer:

To benchmark on the host, we do:

./build-userland-in-tree \
  --force-rebuild \
  --optimization-level 3 \
  ./userland/cpp/bst_vs_heap_vs_hashmap.cpp \
;
./userland/cpp/bst_vs_heap_vs_hashmap.out 10000000 10000 0 | tee bst_vs_heap_vs_hashmap.dat
gnuplot \
  -e 'input_noext="bst_vs_heap_vs_hashmap"' \
  -e 'heap_zoom_max=50' \
  -e 'hashmap_zoom_max=400' \
  ./bst-vs-heap-vs-hashmap.gnuplot \
;
xdg-open bst_vs_heap_vs_hashmap.tmp.png

The parameters heap_zoom_max and hashmap_zoom_max are chosen manually interactively to best showcase the regions of interest in those plots.

To benchmark on gem5, we first build the benchmark with m5ops instructions enabled, and then we run it and extract the stats:

./build-userland \
  --arch x86_64 \
  --ccflags='-DLKMC_M5OPS_ENABLE=1' \
  --force-rebuild userland/cpp/bst_vs_heap_vs_hashmap.cpp \
  --optimization-level 3 \
;
./run \
  --arch x86_64 \
  --emulator gem5 \
  --userland userland/cpp/bst_vs_heap_vs_hashmap.cpp \
  --cli-args='100000 1 0' \
  -- \
  --cpu-type=DerivO3CPU \
  --caches \
  --l2cache \
  --l1d_size=32kB \
  --l1i_size=32kB \
  --l2_size=256kB \
  --l3_size=20MB \
;
./bst-vs-heap-vs-hashmap-gem5-stats --arch x86_64 | tee bst_vs_heap_vs_hashmap_gem5.dat
gnuplot \
  -e 'input_noext="bst_vs_heap_vs_hashmap_gem5"' \
  -e 'heap_zoom_max=500' \
  -e 'hashmap_zoom_max=400' \
  ./bst-vs-heap-vs-hashmap.gnuplot \
;
xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

TODO: the gem5 simulation blows up on a tcmalloc allocation somewhere near 25k elements as of 3fdd83c2c58327d9714fa2347c724b78d7c05e2b + 1, likely linked to the extreme inefficiency of the stats collection?

The cache sizes were chosen to match the host 2017 Lenovo ThinkPad P51 to improve the comparison. Ideally we should also use the same standard library.

Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 24.10.3.2, “gem5 only dump selected stats”

Sources: