24.2. gem5 run benchmark
OK, this is why we used gem5 in the first place: performance measurements!
Let’s see how many cycles dhrystone, which Buildroot provides, takes for a few different input parameters.
We will do that in full system mode. First we boot quickly with the fast atomic CPU and take a checkpoint once the boot finishes. Then we restore that checkpoint in a more detailed CPU mode and run the benchmark:
./build-buildroot --config 'BR2_PACKAGE_DHRYSTONE=y'

# Boot fast, take checkpoint, and exit.
./run --arch aarch64 --emulator gem5 --eval-after './gem5.sh'

# Restore the checkpoint after boot, and benchmark with input 1000.
./run \
  --arch aarch64 \
  --emulator gem5 \
  --eval-after './gem5.sh' \
  --gem5-readfile 'm5 resetstats;dhrystone 1000;m5 dumpstats' \
  --gem5-restore 1 \
  -- \
  --cpu-type=HPI \
  --restore-with-cpu=HPI \
  --caches \
  --l2cache \
  --l1d_size=64kB \
  --l1i_size=64kB \
  --l2_size=256kB \
;
# Get the value for number of cycles.
# head because there are two lines: our dumpstats and the
# automatic dumpstats at the end which we don't care about.
./gem5-stat --arch aarch64 | head -n 1

# Now for input 10000.
./run \
  --arch aarch64 \
  --emulator gem5 \
  --eval-after './gem5.sh' \
  --gem5-readfile 'm5 resetstats;dhrystone 10000;m5 dumpstats' \
  --gem5-restore 1 \
  -- \
  --cpu-type=HPI \
  --restore-with-cpu=HPI \
  --caches \
  --l2cache \
  --l1d_size=64kB \
  --l1i_size=64kB \
  --l2_size=256kB \
;
./gem5-stat --arch aarch64 | head -n 1
If you ever need a shell to quickly inspect the system state after boot, you can just use:
./run \
  --arch aarch64 \
  --emulator gem5 \
  --eval-after './gem5.sh' \
  --gem5-readfile 'sh' \
  --gem5-restore 1
This procedure is further automated and DRYed up at:
./gem5-bench-dhrystone
cat out/gem5-bench-dhrystone.txt
Source: gem5-bench-dhrystone
Output at 2438410c25e200d9766c8c65773ee7469b599e4a + 1:
n       cycles
1000    13665219
10000   20559002
100000  85977065
So, as expected, the Dhrystone run with the largest input parameter, 100000, took more cycles than the runs with smaller input parameters.
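Every measurement also includes some work that does not depend on the input, such as starting the dhrystone process. One rough way to compare rows is therefore the marginal cost per iteration between consecutive rows, which cancels any such constant part. The sketch below computes this with awk from the table above. It assumes that the argument passed to dhrystone is its number of loop iterations and that the fixed overhead is the same in every run. Neither assumption is stated by the benchmark output itself, and the helper name marginal_cycles is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: marginal cycles per Dhrystone iteration between
# consecutive rows of an "n cycles" table read from stdin.
# Assumptions: n is the dhrystone iteration count, and the input-independent
# overhead is identical in every run, so it cancels in the differences.
marginal_cycles() {
  awk '
    NR == 1 { next }  # skip the "n cycles" header line
    NR > 2 {
      printf "%d -> %d: %.2f cycles/iteration\n", prev_n, $1, ($2 - prev_c) / ($1 - prev_n)
    }
    { prev_n = $1; prev_c = $2 }  # remember this row for the next difference
  '
}

printf 'n cycles\n1000 13665219\n10000 20559002\n100000 85977065\n' | marginal_cycles
```

On the numbers above, this gives about 766 cycles per iteration between 1000 and 10000, and about 727 between 10000 and 100000. The two values are close but not identical, so the per-iteration cost is not perfectly constant in this setup.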
The ./gem5-stat commands output the approximate number of CPU cycles it took Dhrystone to run.
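The `| head -n 1` trick works because m5out/stats.txt contains one block per statistics dump, in order, so the first match belongs to our own m5 dumpstats. The following is a minimal sketch of that idea with awk. It is not the implementation of ./gem5-stat. The helper name first_stat is hypothetical, and so is the default stat name system.cpu.numCycles: stat names vary with the CPU model and gem5 version, so check them in your own stats.txt.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one stat from the FIRST statistics
# dump in a gem5 stats.txt file, similar in spirit to `./gem5-stat | head -n 1`.
# Usage: first_stat STATS_FILE [STAT_NAME]
# Assumption: the cycle counter is named system.cpu.numCycles (may differ).
first_stat() {
  stats_file=$1
  stat_name=${2:-system.cpu.numCycles}
  awk -v name="$stat_name" '
    # stats.txt lines look like: <name> <value> # <description>
    $1 == name { print $2; found = 1; exit }
    # Fail with status 1 if the stat never appears.
    END { if (!found) exit 1 }
  ' "$stats_file"
}
```

For example, `first_stat m5out/stats.txt` prints the first cycle count, assuming the stat name above matches. Where the stats file lives depends on your setup.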
A more naive, simpler-to-understand approach would be to run directly:
./run --arch aarch64 --emulator gem5 --eval 'm5 checkpoint;m5 resetstats;dhrystone 10000;m5 exit'
but the problem with this method is that it does not let you easily run a different script without booting again. The ./gem5.sh script works around that by using m5 readfile, as explained further at: Section 24.6.3, “gem5 checkpoint restore and run a different script”.
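The checkpoint plus m5 readfile idea can be sketched as the guest-side function below. This is a hypothetical illustration of the pattern, not the actual contents of ./gem5.sh, and the function name and scratch path are made up. The key point is that m5 checkpoint runs before m5 readfile. Every restore therefore resumes just before the read and picks up whichever script the host supplied for that run, for example via --gem5-readfile.

```shell
#!/bin/sh
# Hypothetical sketch of the checkpoint + readfile pattern; NOT the real ./gem5.sh.
run_after_checkpoint() {
  script=/tmp/readfile-script   # hypothetical scratch path in the guest
  # The checkpoint is taken here. On every restore, execution resumes right
  # after this command, so the steps below run again on each restore.
  m5 checkpoint
  # m5 readfile prints the contents of the file the host passed to the
  # simulator, which can be different for every restore.
  m5 readfile > "$script"
  # Run it only if the host actually provided something (first boot: empty).
  if [ -s "$script" ]; then
    sh "$script"
  fi
  # Stop the simulation once the script is done.
  m5 exit
}
```

With this ordering, the first boot takes the checkpoint, finds an empty file and exits. Each later restore runs a different benchmark command without booting again.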
Now you can play a fun little game with your friends:
- pick a computational problem
- write a program that solves the computational problem and prints its output to stdout
- optimize the code so that it produces the correct output in the smallest number of cycles possible
Interesting algorithms and benchmarks for this game are being collected at:
To find out why your program is slow, a good first step is to have a look at the gem5 m5out/stats.txt file.