24.6.4. gem5 restore checkpoint with a different CPU

gem5 can switch to a different CPU model when restoring a checkpoint.

A common combo is to boot Linux with a fast CPU, take a checkpoint, and then replay the benchmark of interest with a slower, more detailed CPU.

This can be observed interactively in full system with:

./run --arch aarch64 --emulator gem5

Then in the guest terminal after boot ends:

sh -c 'm5 checkpoint;sh'
m5 exit

And then restore the checkpoint with a different, slower CPU:

./run --arch aarch64 --emulator gem5 --gem5-restore 1 -- --caches --cpu-type=DerivO3CPU

And now you will notice that everything happens much more slowly in the guest terminal!

An even more direct and minimal way to observe this is with userland/freestanding/gem5_checkpoint.S, which was mentioned at gem5 checkpoint userland minimal example, plus some logging:

./run \
  --arch aarch64 \
  --emulator gem5 \
  --static \
  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
  --userland userland/freestanding/gem5_checkpoint.S \
;
cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
./run \
  --arch aarch64 \
  --emulator gem5 \
  --gem5-restore 1 \
  --static \
  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
  --userland userland/freestanding/gem5_checkpoint.S \
  -- \
  --caches \
  --cpu-type DerivO3CPU \
  --restore-with-cpu DerivO3CPU \
;
cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"

At gem5 2235168b72537535d74c645a70a85479801e0651, the first run does everything in AtomicSimpleCPU:

...
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1f92 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
      0: SimpleCPU: system.cpu: Tick
      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    500: SimpleCPU: system.cpu: Tick
    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
   1000: SimpleCPU: system.cpu: Tick
   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
   1000: SimpleCPU: system.cpu: Resume
   1500: SimpleCPU: system.cpu: Tick
   1500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
   2000: SimpleCPU: system.cpu: Tick
   2000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)

and after restore we see, as expected, a single ExecEnable instruction executed amidst the O3CPU noise:

FullO3CPU: Ticking main, FullO3CPU.
  79000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  FetchSeq=1  CPSeq=1  flags=(IsInteger)
  82500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400084=>0x400088).(0=>1) [sn:1]
  82500: O3CPU: system.cpu: Removing instruction, [tid:0] [sn:1] PC (0x400084=>0x400088).(0=>1)
  82500: O3CPU: system.cpu: Scheduling next tick!
  83000: O3CPU: system.cpu:

which is the movz after the checkpoint. The final m5exit does not appear due to DerivO3CPU logging insanity.
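
Since the O3CPU flag is so verbose, one way to find the executed instructions is to filter the trace down to its ExecEnable lines. The following is just a minimal sketch: exec_lines is a hypothetical helper name that does not exist in this repository, and it assumes that every executed instruction is logged on a line containing "ExecEnable: ", as in the excerpts above:

```shell
# Hypothetical helper, not part of this repository.
# Print only the executed-instruction lines of a gem5 trace file.
# $1: path to the trace file, e.g. the trace_txt_file shown by ./getvar.
exec_lines() {
  # -F: match the string literally rather than as a regular expression.
  grep -F 'ExecEnable: ' "$1"
}
```

It could then be used in place of the cat commands above, e.g.:

exec_lines "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"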

Bibliography: