24.3.2. gem5 cache size
A quick ./run --emulator gem5 -- -h
leads us to the options:
--caches --l1d_size=1024 --l1i_size=1024 --l2cache --l2_size=1024 --l3_size=1024
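For example, a full aarch64 HPI invocation that sets all the cache sizes explicitly (this is just the benchmark command shown further below, with the checkpoint restore options dropped):

./run --emulator gem5 --arch aarch64 -- --caches --l2cache --l1d_size=1024kB --l1i_size=1024kB --l2_size=1024kB --l3_size=1024kB --cpu-type=HPI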
But keep in mind that these options only affect benchmark performance for the most detailed CPU types, as shown at: Table 2, “gem5 cache support in function of CPU type”.
arch | CPU type | caches used
---|---|---
X86 | AtomicSimpleCPU | no
X86 | DerivO3CPU | ?*
ARM | AtomicSimpleCPU | no
ARM | HPI | yes
*: couldn’t test because of:
Cache sizes can in theory be checked with the methods described at: https://superuser.com/questions/55776/finding-l2-cache-size-in-linux:
lscpu
cat /sys/devices/system/cpu/cpu0/cache/index2/size
and on an Ubuntu 20.04 host, but not on Buildroot 1.31.1:
getconf -a | grep CACHE
and we also have an easy-to-use userland executable that uses sysconf at userland/linux/sysconf.c:
./run --emulator gem5 --userland userland/linux/sysconf.c
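For comparison with the QEMU behaviour described below, the same program can presumably also be run under QEMU (assuming --emulator qemu is accepted analogously to --emulator gem5):

./run --emulator qemu --userland userland/linux/sysconf.c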
but for some reason the Linux kernel is not seeing the cache sizes:
Behaviour breakdown:
- arm QEMU and gem5 (both AtomicSimpleCPU or HPI), x86 gem5: the /sys files don't exist (see the listing check below), and the getconf and lscpu values are empty
- x86 QEMU: the /sys files exist, but the getconf and lscpu values are still empty
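For the first case, the absence of the /sys entries can be double checked by simply listing the standard sysfs cache directory on the target:

ls /sys/devices/system/cpu/cpu0/cache/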
The only precise option is therefore to look at gem5 config.ini as done at: gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches.
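As a shortcut to reading the whole file, and assuming the classic system.cpu.dcache / system.cpu.icache naming and that config.ini ends up in an m5out directory under the run directory (the exact path here is an assumption), the section headers and their size parameters can be filtered out with something like:

grep -E '^\[|^size=' "$(./getvar --arch aarch64 run_dir)/m5out/config.ini"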
Or for a quick and dirty performance measurement approach instead:
./gem5-bench-cache -- --arch aarch64
cat "$(./getvar --arch aarch64 run_dir)/bench-cache.txt"
which gives:
cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 1000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024 --l1i_size=1024 --l2_size=1024 --l3_size=1024 --cpu-type=HPI --restore-with-cpu=HPI
time 23.82
exit_status 0
cycles 93284622
instructions 4393457

cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 1000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024kB --l1i_size=1024kB --l2_size=1024kB --l3_size=1024kB --cpu-type=HPI --restore-with-cpu=HPI
time 14.91
exit_status 0
cycles 10128985
instructions 4211458

cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 10000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024 --l1i_size=1024 --l2_size=1024 --l3_size=1024 --cpu-type=HPI --restore-with-cpu=HPI
time 51.87
exit_status 0
cycles 188803630
instructions 12401336

cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 10000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024kB --l1i_size=1024kB --l2_size=1024kB --l3_size=1024kB --cpu-type=HPI --restore-with-cpu=HPI
time 35.35
exit_status 0
cycles 20715757
instructions 12192527

cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 100000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024 --l1i_size=1024 --l2_size=1024 --l3_size=1024 --cpu-type=HPI --restore-with-cpu=HPI
time 339.07
exit_status 0
cycles 1176559936
instructions 94222791

cmd ./run --emulator gem5 --arch aarch64 --gem5-readfile "dhrystone 100000" --gem5-restore 1 -- --caches --l2cache --l1d_size=1024kB --l1i_size=1024kB --l2_size=1024kB --l3_size=1024kB --cpu-type=HPI --restore-with-cpu=HPI
time 240.37
exit_status 0
cycles 125666679
instructions 91738770
We make the following conclusions:
- the number of instructions almost does not change: the CPU spends all of the extra time waiting for memory. TODO: why does it change at all?
- the wall clock execution time is not directly proportional to the number of cycles: here we had roughly a 10x cycle increase, but only about a 1.5x wall clock time increase. This suggests that cycles in which the CPU is just waiting for memory to come back are faster to simulate.