24.22.4.7. gem5 event queue MinorCPU syscall emulation freestanding example analysis

The events for the Atomic CPU were pretty simple: basically just ticks.

But as we venture into more complex CPU models such as MinorCPU, the events get much more complex and interesting.

The memory system part must be similar to that of TimingSimpleCPU, which we previously studied at gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis: the main thing we want to see is how the CPU pipeline speeds up execution by avoiding some memory stalls.

The config.dot.svg also indicates that: everything is exactly as in gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches, except that the CPU is a MinorCPU instead of TimingSimpleCPU, and --caches is now mandatory:

./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace FmtFlag,Cache,Event,ExecAll,Minor \
  --trace-stdout \
  -- \
  --cpu-type MinorCPU \
  --caches \
;

and here’s a handy link to the source: userland/arch/aarch64/freestanding/linux/hello.S.

On LKMC ce3ea9faea95daf46dea80d4236a30a0891c3ca5 gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 we see the following.

First there is an instruction fetch miss for the initial entry address, which we know from gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches is the virtual address 0x400078, which maps to physical 0x78:

    500: Cache: system.cpu.icache: access for ReadReq [40:7f] IF miss

The memory request comes back later on at:

  77000: Cache: system.cpu.icache: recvTimingResp: Handling response ReadResp [40:7f] IF

and soon after the CPU also ifetches across the cache line boundary, into the next line:

  79000: Cache: system.cpu.icache: access for ReadReq [80:bf] IF miss

TODO why? We have 0x78 and 0x7c, and those should be it since we are dual issue, right? Is this prefetching at work?
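Part of the answer may just be cache line arithmetic. Here is a minimal sketch, assuming 64-byte cache lines (inferred from the 0x40-wide ranges such as [40:7f] in the Cache logs) and assuming that the following instructions map to physical addresses right after 0x7c. It shows that only 0x78 and 0x7c fit in the first line, so the next instruction already lives in [80:bf]. It does not settle whether Minor requests that line eagerly or on demand.

```python
LINE_SIZE = 64  # assumption: inferred from ranges like [40:7f] in the Cache logs


def cache_line_range(paddr, line_size=LINE_SIZE):
    """Return (first, last) byte address of the cache line containing paddr."""
    base = paddr & ~(line_size - 1)
    return base, base + line_size - 1


def fmt(line_range):
    """Format a range the same way the gem5 Cache logs print it."""
    return "[{:x}:{:x}]".format(*line_range)


# Physical addresses of the first four instructions (0x80 and 0x84 are assumed
# to follow the same virtual to physical offset as 0x78).
for paddr in (0x78, 0x7c, 0x80, 0x84):
    print(hex(paddr), fmt(cache_line_range(paddr)))
# 0x78 [40:7f]
# 0x7c [40:7f]
# 0x80 [80:bf]
# 0x84 [80:bf]
```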

Later on we see that the first instruction, our MOVZ, has made it through decode, and the execute stage tries to issue it:

  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/1.1 pc: 0x400078 (movz) to FU: 0

and that issue succeeds, because the functional unit 0 (FU 0) is an IntAlu as shown at gem5 functional units:

  80000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/1/1.1 pc: 0x400078 (movz) into FU 0

At the very same tick, the execute stage also tries to issue the second instruction, our ADR:

  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/2.2 pc: 0x40007c (adr) to FU: 0
  80000: MinorExecute: system.cpu.execute: Can't issue as FU: 0 is already busy
  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/2.2 pc: 0x40007c (adr) to FU: 1
  80000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/1/2.2 pc: 0x40007c (adr) into FU 1

This is also an IntAlu instruction, and it can’t run on FU 0 because the first instruction is already running there. Luckily, FU 1 is also an IntAlu unit, so it runs there.

Crap, those Minor logs should say what OpClass each instruction is, that would make things clearer.

TODO what is that 0/1.1/1/1.1 notation that shows up everywhere? Must be important, let’s look at the source.
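Until the source is checked properly, here is a tentative decoding. As far as I recall, Minor prints its instruction IDs as thread/stream.prediction/line/fetch.exec, with the fetch and exec parts omitted when they are not set. Treat the field names below as an assumption to verify against the gem5 tree. They do at least agree with the logs in this section: the line field goes from 1 to 2 exactly when fetch moves from [40:7f] to [80:bf], and the fetch and exec fields grow by one per instruction.

```python
from collections import namedtuple

# Assumption: field meanings are recalled from gem5's Minor InstId, not confirmed here.
InstId = namedtuple(
    "InstId", "thread_id stream_seq prediction_seq line_seq fetch_seq exec_seq"
)


def parse_inst_id(text):
    """Parse an ID such as '0/1.1/2/3.3' into its numeric fields."""
    thread, stream_pred, line, *rest = text.split("/")
    stream, prediction = stream_pred.split(".")
    fetch_seq = exec_seq = None
    if rest:  # the trailing fetch.exec part is optional
        fetch_part, _, exec_part = rest[0].partition(".")
        fetch_seq = int(fetch_part)
        exec_seq = int(exec_part) if exec_part else None
    return InstId(
        int(thread), int(stream), int(prediction), int(line), fetch_seq, exec_seq
    )


for text in ("0/1.1/1/1.1", "0/1.1/1/2.2", "0/1.1/2/3.3"):
    print(text, parse_inst_id(text))
```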

Soon after (1500 ticks later, which is 3 cycles if the clock period is 500 ticks, so guessing due to opLat=3?), the execution appears to be over already, since we see the ExecAll line come through, which generally happens at the very end:

  81500: MinorExecute: system.cpu.execute: Attempting to commit [tid:0]
  81500: MinorExecute: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
  81500: MinorExecute: system.cpu.execute: Trying to commit canCommitInsts: 1
  81500: MinorExecute: system.cpu.execute: Trying to commit from FUs
  81500: MinorExecute: global: ExecContext setting PC: (0x400078=>0x40007c).(0=>1)
  81500: MinorExecute: system.cpu.execute: Committing inst: 0/1.1/1/1.1 pc: 0x400078 (movz)
  81500: MinorExecute: system.cpu.execute: Unstalling 0 for inst 0/1.1/1/1.1
  81500: MinorExecute: system.cpu.execute: Completed inst: 0/1.1/1/1.1 pc: 0x400078 (movz)
  81500: MinorScoreboard: system.cpu.execute.scoreboard0: Clearing inst: 0/1.1/1/1.1 pc: 0x400078 (movz) regIndex: 0 final numResults: 0
  81500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #1, #0        : IntAlu :  D=0x0000000000000001  FetchSeq=1  CPSeq=1  flags=(IsInteger)
  81500: MinorExecute: system.cpu.execute: Trying to commit canCommitInsts: 1
  81500: MinorExecute: system.cpu.execute: Trying to commit from FUs
  81500: MinorExecute: global: ExecContext setting PC: (0x40007c=>0x400080).(0=>1)
  81500: MinorExecute: system.cpu.execute: Committing inst: 0/1.1/1/2.2 pc: 0x40007c (adr)
  81500: MinorExecute: system.cpu.execute: Unstalling 1 for inst 0/1.1/1/2.2
  81500: MinorExecute: system.cpu.execute: Completed inst: 0/1.1/1/2.2 pc: 0x40007c (adr)
  81500: MinorScoreboard: system.cpu.execute.scoreboard0: Clearing inst: 0/1.1/1/2.2 pc: 0x40007c (adr) regIndex: 1 final numResults: 0
  81500: MinorExecute: system.cpu.execute: Reached inst commit limit
  81500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   adr   x1, #28            : IntAlu :  D=0x0000000000400098  FetchSeq=2  CPSeq=2  flags=(IsInteger)
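The timing can be sanity checked with a bit of arithmetic. This is a minimal sketch, assuming a clock period of 500 ticks (an inference from every timestamp in these logs being a multiple of 500, not something read from the config). It converts the tick gaps seen in this section into cycles.

```python
CLOCK_PERIOD_TICKS = 500  # assumption: inferred from the log timestamps


def cycles_between(start_tick, end_tick, period=CLOCK_PERIOD_TICKS):
    """Return the number of whole clock cycles between two tick timestamps."""
    delta = end_tick - start_tick
    if delta % period:
        raise ValueError("tick delta %d is not a multiple of %d" % (delta, period))
    return delta // period


print(cycles_between(80000, 81500))   # movz issue to movz commit: 3
print(cycles_between(500, 77000))     # first icache miss to its response: 153
print(cycles_between(79000, 129000))  # second icache miss to its response: 100
```

The 3 cycle gap between issue and commit is what motivates the opLat=3 guess, but the sketch only does the unit conversion and does not confirm where the latency comes from.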

The ifetch for the third instruction returns at:

 129000: Cache: system.cpu.icache: recvTimingResp: Handling response ReadResp [80:bf] IF

so now we are ready to run the third and fourth instructions of the program:

    ldr x2, =len
    mov x8, 64

The LDR goes all the way down to FU 6, which is the memory one, since FUs 0 to 5 are not capable of running it:

 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 0
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 0 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 1
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 1 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 2
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 2 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 3
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 3 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 4
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 4 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 5
 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 5 isn't capable
 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 6
 132000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) into FU 6

and then the MOV issue follows 500 ticks later (TODO why not at the same tick like for the previous pair?):

 132500: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/4.4 pc: 0x400084 (movz) to FU: 0
 132500: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/2/4.4 pc: 0x400084 (movz) into FU 0
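The FU selection seen in the issue logs above can be modelled as a simple linear scan. This is a toy sketch, not gem5's actual code: the capability table is hypothetical except for what the text states (FU 0 and FU 1 are IntAlu, FU 6 is the memory one), the "Mem" and "Other" class names are made up, and the order of the capable and busy checks is an assumption. It ignores timing entirely, so it says nothing about why the MOV issues one cycle after the LDR.

```python
# Hypothetical table: only FU 0, FU 1 and FU 6 are known from the text above.
FU_CAPABILITIES = {
    0: {"IntAlu"},
    1: {"IntAlu"},
    2: {"Other"},  # placeholder, real capabilities not modelled
    3: {"Other"},
    4: {"Other"},
    5: {"Other"},
    6: {"Mem"},    # made-up class name for the memory FU
}


def try_issue(op_class, busy):
    """Scan FUs in index order; return (chosen FU or None, list of attempts)."""
    attempts = []
    for fu in sorted(FU_CAPABILITIES):
        if op_class not in FU_CAPABILITIES[fu]:
            attempts.append((fu, "isn't capable"))
        elif fu in busy:
            attempts.append((fu, "is already busy"))
        else:
            busy.add(fu)  # the FU is taken for the rest of this cycle
            attempts.append((fu, "issued"))
            return fu, attempts
    return None, attempts


busy = set()  # FUs already taken in the current cycle
print(try_issue("IntAlu", busy))  # movz: takes FU 0
print(try_issue("IntAlu", busy))  # adr: FU 0 busy, takes FU 1
busy = set()                      # a later cycle
print(try_issue("Mem", busy))     # ldr: FUs 0 to 5 not capable, takes FU 6
```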