24.6.4.1. gem5 fast forward

Besides switching CPUs after a checkpoint restore, fs.py also has the --fast-forward option to automatically run the script from the start on a less detailed CPU, and switch to a more detailed CPU at a given tick.

This is generally useless compared to checkpoint restoring because:

  • checkpoint restore lets you run multiple different contents after the restore, and restore to multiple different system states, which you almost always want to do

  • we generally don’t know the exact tick at which the region of interest will start, especially as the binaries change. It is much easier to just instrument the content with a checkpoint m5op

But let’s give it a try anyway with userland/freestanding/gem5_checkpoint.S, which was mentioned at gem5 checkpoint userland minimal example:

./run \
  --arch aarch64 \
  --emulator gem5 \
  --static \
  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
  --userland userland/freestanding/gem5_checkpoint.S \
  -- \
  --caches \
  --cpu-type DerivO3CPU \
  --fast-forward 1000 \
;
cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"

At gem5 2235168b72537535d74c645a70a85479801e0651 we see something like:

      0: O3CPU: system.switch_cpus: Creating O3CPU object.
      0: O3CPU: system.switch_cpus: Workload[0] process is 0      0: SimpleCPU: system.cpu: ActivateContext 0
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x40 WriteReq
...

      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1f92 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
      0: SimpleCPU: system.cpu: Tick
      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    500: SimpleCPU: system.cpu: Tick
    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
   1000: SimpleCPU: system.cpu: Tick
   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
   1000: O3CPU: system.switch_cpus: [tid:0] Calling activate thread.
   1000: O3CPU: system.switch_cpus: [tid:0] Adding to active threads list
   1500: O3CPU: system.switch_cpus:

FullO3CPU: Ticking main, FullO3CPU.
   1500: O3CPU: system.switch_cpus: Scheduling next tick!
   2000: O3CPU: system.switch_cpus:

FullO3CPU: Ticking main, FullO3CPU.
   2000: O3CPU: system.switch_cpus: Scheduling next tick!
   2500: O3CPU: system.switch_cpus:

...

FullO3CPU: Ticking main, FullO3CPU.
  44500: ExecEnable: system.switch_cpus: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x00000000000
  48000: O3CPU: system.switch_cpus: Removing committed instruction [tid:0] PC (0x400084=>0x400088).(0=>1) [sn:1]
  48000: O3CPU: system.switch_cpus: Removing instruction, [tid:0] [sn:1] PC (0x400084=>0x400088).(0=>1)
  48000: O3CPU: system.switch_cpus: Scheduling next tick!
  48500: O3CPU: system.switch_cpus:

...

We can also compare that to the same log but without --fast-forward and other CPU switch options:

      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
      0: SimpleCPU: system.cpu: Tick
      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    500: SimpleCPU: system.cpu: Tick
    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
   1000: SimpleCPU: system.cpu: Tick
   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
   1000: SimpleCPU: system.cpu: Resume
   1500: SimpleCPU: system.cpu: Tick
   1500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
   2000: SimpleCPU: system.cpu: Tick
   2000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)

Therefore, it is clear that what we wanted happened:

  • up until the tick 1000, SimpleCPU was ticking

  • after tick 1000, O3CPU started ticking
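Rather than reading the whole trace by eye, the switch can also be checked mechanically. The following is only a minimal sketch: the function name summarize_exec_ticks and the trace_sample variable are made up for illustration, and the sample holds a few ExecEnable lines shortened from the log above rather than a full trace. It prints, for each CPU object, the first and last tick at which an instruction was executed:

```shell
# Hypothetical sample: a few ExecEnable lines shortened from the trace above.
trace_sample='      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0
    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0
   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint
  44500: ExecEnable: system.switch_cpus: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0'

# Read a trace on stdin and print one line per CPU object with the first and
# last tick at which that object logged an ExecEnable (executed instruction).
summarize_exec_ticks() {
  awk '
    $2 == "ExecEnable:" {
      tick = $1; sub(/:$/, "", tick)   # "1000:" -> "1000"
      cpu = $3;  sub(/:$/, "", cpu)    # "system.cpu:" -> "system.cpu"
      if (!(cpu in first)) { first[cpu] = tick; order[n++] = cpu }
      last[cpu] = tick
    }
    END {
      for (i = 0; i < n; i++)
        printf "%s first=%s last=%s\n", order[i], first[order[i]], last[order[i]]
    }
  '
}

summary="$(printf '%s\n' "$trace_sample" | summarize_exec_ticks)"
printf '%s\n' "$summary"
```

On a real run, the same function could be fed the trace file directly, e.g. summarize_exec_ticks < "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)". If the switch worked, system.cpu should stop executing at about the tick where system.switch_cpus takes over.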

Bibliography: