24.22.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis

Now, let’s move on to TimingSimpleCPU, which is just like AtomicSimpleCPU internally, but now the memory requests don’t actually finish immediately: gem5 CPU types!

This means that simulation will be much more accurate, and the DRAM memory will be modelled.

TODO: analyze better what each of the memory event mean. For now, we have just collected a bunch of data there, but needs interpreting. The CPU specifics in this section are already insightful however.

TimingSimpleCPU should be the second simplest CPU to analyze, so let’s give it a try:

./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace Event,ExecAll,FmtFlag \
  --trace-stdout \
  -- \
  --cpu-type TimingSimpleCPU \
;

As of LKMC 78ce2dabe18ef1d87dc435e5bc9369ce82e8d6d2 gem5 12c917de54145d2d50260035ba7fa614e25317a3 the log is now much more complex.

Here is an abridged version with:

  • the beginning up to the second instruction

  • end ending

because all that happens in between is exactly the same as the first two instructions and therefore boring.

We have also manually added:

  • double newlines before each event execution

  • line IDs to be able to refer to specific events more easily (#0, #1, etc.)

#0       0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 scheduled @ 0
**** REAL SIMULATION ****
#1       0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 14 scheduled @ 7786250
#2       0: Event: system.mem_ctrls_1.wrapped_function_event: EventFunctionWrapped 20 scheduled @ 7786250
#3       0: Event: Event_74: generic 74 scheduled @ 0
info: Entering event queue @ 0.  Starting simulation...
#4       0: Event: Event_74: generic 74 rescheduled @ 18446744073709551615

#5       0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 executed @ 0
#6       0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 0
#7       0: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 1000

#8       0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 0
#9       0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 scheduled @ 0
#10      0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 46250
#11      0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 5000

#12      0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 executed @ 0
#13      0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 scheduled @ 0

#14      0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 executed @ 0

#15   1000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 1000

#16   5000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 5000

#17  46250: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 46250
#18  46250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 74250

#19  74250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 74250
#20  74250: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 77000
#21  74250: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 77000

#22  77000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 77000

#23  77000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 77000
#24  77000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 77000

#25  77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000
     77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #1, #0        : IntAlu :  D=0x0000000000000001  flags=(IsInteger)
#26  77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000
#27  77000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 78000

#28  77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000
#29  77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 95750
#30  77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000

#31  77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000

#32  78000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 78000

#33  95750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 95750
#34  95750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 123750

#35 123750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 123750
#36 123750: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 126000
#37 123750: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 126000

#38 126000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 126000

#39 126000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 126000
#40 126000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 126000

#41 126000: Event: Event_40: Timing CPU icache tick 40 executed @ 126000
    126000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   adr   x1, #28            : IntAlu :  D=0x0000000000400098  flags=(IsInteger)
#42 126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000
#43 126000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 127000

    [...]

    469000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28    :   svc   #0x0               : IntAlu :   flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
    469000: Event: Event_75: generic 75 scheduled @ 469000
    469000: Event: Event_75: generic 75 executed @ 469000

Looking into the generated config.dot.svg can give a better intuition on the shape of the memory system: Figure 3, “config.dot.svg for a TimingSimpleCPU without caches.”, so it is good to keep that in mind.

gem5 config TimingSimpleCPU 12c917de54145d2d50260035ba7fa614e25317a3
Figure 3. config.dot.svg for a TimingSimpleCPU without caches.

It is also helpful to see this as a tree of events where one execute event schedules other events:

    | | | | |
    0 1 2 3 4   0 TimingSimpleCPU::fetch
    5
    |
    +---+
    |   |
    6   7       6 DRAMCtrl::processNextReqEvent (0)
    8   15      7 BaseXBar::Layer::releaseLayer
    |
+---+---+
|   |   |
9   10  11      9 DRAMCtrl::Rank::processActivateEvent
12  17  16     10 DRAMCtrl::processRespondEvent (46.25)
|   |          11 DRAMCtrl::processNextReqEvent (5)
|   |
13  18         13 DRAMCtrl::Rank::processPowerEvent
14  19         18 PacketQueue::processSendEvent (28)
    |
    +---+
    |   |
    20  21     20 PacketQueue::processSendEvent  (2.75)
    23  22     21 BaseXBar::Layer<SrcType, DstType>::releaseLayer
    |
    24         24 TimingSimpleCPU::IcachePort::ITickEvent::process (0)
    25
    |
    +---+
    |   |
    26  27     26 DRAMCtrl::processNextReqEvent
    28  32     27 BaseXBar::Layer<SrcType, DstType>::releaseLayer
    |
    +---+
    |   |
    29  30     29 DRAMCtrl::processRespondEvent
    33  31     30 DRAMCtrl::processNextReqEvent
    |
    34         34 PacketQueue::processSendEvent
    35
    |
    +---+
    |   |
    36  37     36 PacketQueue::processSendEvent
    39  38     37 BaseXBar::Layer<SrcType, DstType>::releaseLayer
    |
    40         40 TimingSimpleCPU::IcachePort::ITickEvent::process
    41
    |
    +---+
    |   |
    42  43     42 DRAMCtrl::processNextReqEvent
               43 BaseXBar::Layer<SrcType, DstType>::releaseLayer

Note that every schedule is followed by an execution, so we put them together, for example:

    |   |
    6   7    6 DRAMCtrl::processNextReqEvent (0)
    8   15   7 BaseXBar::Layer::releaseLayer (0)
    |

means:

  • 6: schedule DRAMCtrl::processNextReqEvent to run in 0 ns after the execution that scheduled it

  • 8: execute DRAMCtrl::processNextReqEvent

  • 7: schedule BaseXBar::Layer::releaseLayer to run in 0 ns after the execution that scheduled it

  • 15: execute BaseXBar::Layer::releaseLayer

With this, we can focus on going up the event tree from an event of interest until we see what originally caused it!

Notably, the above tree contains the execution of the first two instructions.

Observe how the events leading up to the second instruction are basically a copy of those of the first one, this is the basic TimingSimpleCPU event loop in action.

One line summary of events:

  • #5: adds the request to the DRAM queue, and schedules a DRAMCtrl::processNextReqEvent which later sees that request immediately

  • #8: picks up the only request from the DRAM read queue (readQueue) and services that.

    If there were multiple requests, priority arbitration under DRAMCtrl::chooseNext could chose a different one than the first based on packet priorities

    This puts the request on the response queue respQueue and schedules another DRAMCtrl::processNextReqEvent but the request queue is empty, and that does nos schedule further events

  • #17: picks up the only request from the DRAM response queue and services that by placing it in yet another queue, and scheduling the PacketQueue::processSendEvent which will later pick up that packet

  • #19: picks up the request from the previous queue, and forwards it to another queue, and schedules yet another PacketQueue::processSendEvent

    The current one is the DRAM passing the message to the XBar, and the next processSendEvent is the XBar finally sending it back to the CPU

  • #23: the XBar port is actually sending the reply back.

    If knows to which CPU core to send the request to because ports keep a map of request to source:

    const auto route_lookup = routeTo.find(pkt->req);