11.7.4. gem5 syscall emulation multithreading

gem5 user mode multithreading has been particularly flaky compared to QEMU’s, but work is being put into improving it.

In gem5 syscall emulation, the fork syscall checks whether there is a free CPU, and if there is one, the new thread runs on that CPU.

Otherwise, the fork call fails, and so do higher-level interfaces built on it such as pthread_create, which return a failure status in the guest.
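As a rough illustration of how a guest program observes that failure, here is a minimal sketch in C. It is not the code of userland/posix/pthread_self.c: the names try_spawn_one and exit_immediately are hypothetical and chosen only for this example. pthread_create reports failure through its return value, and with glibc strerror(EAGAIN) yields "Resource temporarily unavailable", which is the text of the error message quoted in this section.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Thread body for the sketch: returns right away. */
static void *exit_immediately(void *arg) {
    return arg;
}

/* Hypothetical helper: try to spawn one thread and wait for it.
 * Returns 0 on success, or the error number on failure. */
int try_spawn_one(void) {
    pthread_t thread;
    /* pthread_create returns the error number directly rather than
     * through errno, so we pass the return value to strerror. */
    int err = pthread_create(&thread, NULL, exit_immediately, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return err;
    }
    return pthread_join(thread, NULL);
}
```

On a host with spare resources, try_spawn_one() is expected to return 0. Under the one-CPU gem5 setup described here, the assumption is that it would take the error branch instead.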

For example, if we use just one CPU for userland/posix/pthread_self.c which spawns one thread besides main:

./run --cpus 1 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1

fails with this error message coming from the guest stderr:

pthread_create: Resource temporarily unavailable

It works, however, if we add one extra CPU:

./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1

Once threads exit, their CPU is freed and becomes available for new fork calls. For example, the following run spawns a thread, joins it, and then spawns again, and 2 CPUs are enough:

./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args '1 2'

because at each point in time, only up to two threads are running.
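The spawn, join, spawn again pattern can be sketched in C as follows. This is an assumption about the general shape of such a program, not the actual source of pthread_self.c, and spawn_join_rounds and noop are hypothetical names. Because each thread is joined before the next one is created, at most two threads (main plus one child) exist at any time.

```c
#include <pthread.h>

/* Thread body for the sketch: returns right away. */
static void *noop(void *arg) {
    return arg;
}

/* Hypothetical helper: spawn one thread, wait for it to exit, repeat.
 * Returns the number of rounds that completed successfully. */
int spawn_join_rounds(int rounds) {
    int done = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t thread;
        /* If no CPU were free, creation would fail and we stop here. */
        if (pthread_create(&thread, NULL, noop, NULL) != 0)
            break;
        /* Joining lets the child finish before the next spawn. */
        if (pthread_join(thread, NULL) != 0)
            break;
        done++;
    }
    return done;
}
```

Calling spawn_join_rounds(2) corresponds to the two sequential spawns described above. A variant that creates both threads before joining either would need a third CPU under the behavior this section describes.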

gem5 syscall emulation does show the expected number of cores when queried, e.g.:

./run --cpus 1 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5
./run --cpus 2 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5

outputs 1 and 2 respectively.
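The program above uses the C++ standard library. For comparison, a C program can ask a similar question with sysconf. The sketch below is only an analogue: _SC_NPROCESSORS_ONLN is a widely available extension rather than part of POSIX, online_cpus is a hypothetical helper name, and whether gem5 syscall emulation reports the same value through this path is an assumption that has not been checked here.

```c
#include <unistd.h>

/* Hypothetical helper: number of processors currently online,
 * as reported by the C library, or -1 on error. */
long online_cpus(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```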

This can also be seen clearly by running sched_getcpu:

./run \
  --arch aarch64 \
  --cli-args 4 \
  --cpus 8 \
  --emulator gem5 \
  --userland userland/linux/sched_getcpu.c \
;

which necessarily produces an output containing the CPU numbers from 1 to 4 and no higher:

1
3
4
2
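A minimal sketch of what such a test might look like is given below. It is not the source of userland/linux/sched_getcpu.c: print_thread_cpus, record_cpu, cpu_ids and MAX_THREADS are hypothetical names for this example. Each thread prints the CPU it is running on, so the order of the lines depends on when each thread gets to run, not on creation order.

```c
#define _GNU_SOURCE /* needed for sched_getcpu with glibc */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MAX_THREADS 64

/* CPU number seen by each thread, indexed by creation order. */
static int cpu_ids[MAX_THREADS];

/* Each thread records and prints the CPU it is running on right now. */
static void *record_cpu(void *arg) {
    int *slot = arg;
    *slot = sched_getcpu();
    printf("%d\n", *slot);
    return NULL;
}

/* Hypothetical helper: spawn up to n threads (capped at MAX_THREADS),
 * then join them. Returns how many threads were actually created. */
int print_thread_cpus(int n) {
    pthread_t threads[MAX_THREADS];
    int created = 0;
    if (n > MAX_THREADS)
        n = MAX_THREADS;
    /* Stop at the first creation failure, e.g. when no CPU is free. */
    while (created < n &&
           pthread_create(&threads[created], NULL, record_cpu,
                          &cpu_ids[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created;
}
```

With print_thread_cpus(4) and the main thread occupying one CPU, the description above suggests the four children would land on four other CPUs under gem5. On a real Linux host the numbers can repeat, since the kernel scheduler is free to place several threads on the same CPU.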

TODO why does the 2 come at the end here? Would be good to do a detailed assembly run analysis.