27.8. Algorithms

This is still work in progress and needs better automation, but is already a good sketch. Key missing features:

  • actually check that outputs are correct in ./test

  • create a mechanism to run all or some selected hand coded inputs

  • create a mechanism to run generated input

The idea was originally started at: https://github.com/cirosantilli/algorithm-cheat

The key idea is that input / output pairs are present in human readable files generated either:

  • manually for small test inputs

  • with a Python script for larger randomized tests

Test programs then:

  • read input from sdtin

  • produce output to stdout

so that we can compare the output to the expected one.

This way, tests can be reused across several implementations in different languages, emulating the many multi-language programming competition websites out there.

For example, for a native run we can can run a set / sorting test:

cd userland/algorithm/set
./build

# Run with a small hand written test.
./std_set.out < test_data/8.i > tmp.raw

# Extract the output from the sorted stdout, which also
# contained some timing information.
./parse_output output < tmp.raw > tmp.o

# Compare the output to the Expected one.
cmp tmp.o test_data/8.e

# Same but now with a large randomly generated input.
./generate_io
./std_set.out < tmp.i | ./parse_output output > tmp.o
cmp tmp.o tmp.e

It is also possible to the algorithm tests normally from emulators in User mode simulation by setting stdin as explained at syscall emulation mode program stdin, e.g.:

./run --arch aarch64 -u userland/algorithm/set/std_set.cpp --stdin-file userland/algorithm/set/test_data/8.i

Sources:

userland/algorithm/set/parse_output is needed because timing instrumentation measurements must be embedded in the program itself to allow:

  • discounting the input reading / output writing operations from the actual "read / write to / from memory algorithm" itself

  • measuring the evolution of the benchmark mid way, e.g. to see how the current container size affects insertion time: BST vs heap vs hashmap

The following are also interesting Buildroot libraries that we could benchmark:

  • Armadillo C++: linear algebra

  • fftw: Fourier transform

  • Flann

  • GSL: various

  • liblinear

  • libspacialindex

  • libtommath

  • qhull

These are good targets for performance analysis with gem5, and there is some overlap between this section and Benchmarks.