27.3.4.1.1. Detailed gem5 analysis of how data races happen
The smallest data race we managed to come up as of LKMC 7c01b29f1ee7da878c7cc9cb4565f3f3cf516a92 and gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 was with userland/c/atomic.c (see also C multithreading):
./run \
  --arch aarch64 \
  --cli-args '2 10' \
  --cpus 3 \
  --emulator gem5 \
  --userland userland/c/atomic.c \
;
which outputs:
atomic 20 non-atomic 19
Note that the system is very minimal, and doesn’t even have caches, so I’m curious as to how this can happen at all.
So first we do a run with --trace Exec and look at the my_thread_main entries.
From there we see that first CPU1 enters the function, since it was spawned first.
Then for some time, both CPU1 and CPU2 are running at the same time.
Finally, CPU1 exits, then CPU2 runs alone for a while to finish its loop iterations, and then CPU2 exits.
By grepping for the LDR data reads in the log, we can easily spot the moment where things started to go wrong based on the D= data:
grep -E 'my_thread_main\+36' trace.txt > trace-ldr.txt
The grep
output contains
94024500: system.cpu1: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000006 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94036500: system.cpu1: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000007 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94048500: system.cpu1: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000008 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94058500: system.cpu2: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000009 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94060500: system.cpu1: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000009 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94070500: system.cpu2: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x000000000000000a A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94082500: system.cpu2: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x000000000000000b A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
and so we see that it is at 94058500 that things started going bad, since two consecutive loads from different CPUs read the same value D=9! Actually, things were not too bad afterwards, because this happened by coincidence to be the last CPU1 read: only a single increment was lost, which is why the final non-atomic count is 19 instead of the expected 20. We would have missed many more increments if the number of iterations had been larger.
Now that we have the first bad time, let’s look at the fuller disassembly to better understand what happens around that point.
94058500: system.cpu2: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000009 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94059000: system.cpu2: A0 T0 : @my_thread_main+40 : add w1, w0, #1 : IntAlu : D=0x000000000000000a flags=(IsInteger)
94059000: system.cpu1: A0 T0 : @my_thread_main+120 : b.cc <my_thread_main+28> : IntAlu : flags=(IsControl|IsDirectControl|IsCondControl)
94059500: system.cpu1: A0 T0 : @my_thread_main+28 : adrp x0, #69632 : IntAlu : D=0x0000000000411000 flags=(IsInteger)
94059500: system.cpu2: A0 T0 : @my_thread_main+44 : adrp x0, #69632 : IntAlu : D=0x0000000000411000 flags=(IsInteger)
94060000: system.cpu2: A0 T0 : @my_thread_main+48 : add x0, x0, #96 : IntAlu : D=0x0000000000411060 flags=(IsInteger)
94060000: system.cpu1: A0 T0 : @my_thread_main+32 : add x0, x0, #96 : IntAlu : D=0x0000000000411060 flags=(IsInteger)
94060500: system.cpu1: A0 T0 : @my_thread_main+36 : ldr x0, [x0] : MemRead : D=0x0000000000000009 A=0x411060 flags=(IsInteger|IsMemRef|IsLoad)
94060500: system.cpu2: A0 T0 : @my_thread_main+52 : str x1, [x0] : MemWrite : D=0x000000000000000a A=0x411060 flags=(IsInteger|IsMemRef|IsStore)
and from this, all becomes crystal clear:
- 94058500: CPU2 loads
- 94060500: CPU1 loads
- 94060500: CPU2 stores

so we see that CPU2 just happened to store after CPU1's load: both CPUs read 9 and therefore both write 10, and one increment is lost.
We also understand why LDADD solves the race problem in AtomicSimpleCPU: it does the load and store in one single go!