24.22.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis
Now, let’s move on to TimingSimpleCPU
, which is just like AtomicSimpleCPU
internally, but now the memory requests don’t actually finish immediately: gem5 CPU types!
This means that simulation will be much more accurate, and the DRAM memory will be modelled.
TODO: analyze better what each of the memory event mean. For now, we have just collected a bunch of data there, but needs interpreting. The CPU specifics in this section are already insightful however.
TimingSimpleCPU should be the second simplest CPU to analyze, so let’s give it a try:
./run \ --arch aarch64 \ --emulator gem5 \ --userland userland/arch/aarch64/freestanding/linux/hello.S \ --trace Event,ExecAll,FmtFlag \ --trace-stdout \ -- \ --cpu-type TimingSimpleCPU \ ;
As of LKMC 78ce2dabe18ef1d87dc435e5bc9369ce82e8d6d2 gem5 12c917de54145d2d50260035ba7fa614e25317a3 the log is now much more complex.
Here is an abridged version with:
-
the beginning up to the second instruction
-
end ending
because all that happens in between is exactly the same as the first two instructions and therefore boring.
We have also manually added:
-
double newlines before each event execution
-
line IDs to be able to refer to specific events more easily (
#0
,#1
, etc.)
#0 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 scheduled @ 0 **** REAL SIMULATION **** #1 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 14 scheduled @ 7786250 #2 0: Event: system.mem_ctrls_1.wrapped_function_event: EventFunctionWrapped 20 scheduled @ 7786250 #3 0: Event: Event_74: generic 74 scheduled @ 0 info: Entering event queue @ 0. Starting simulation... #4 0: Event: Event_74: generic 74 rescheduled @ 18446744073709551615 #5 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 executed @ 0 #6 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 0 #7 0: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 1000 #8 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 0 #9 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 scheduled @ 0 #10 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 46250 #11 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 5000 #12 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 executed @ 0 #13 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 scheduled @ 0 #14 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 executed @ 0 #15 1000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 1000 #16 5000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 5000 #17 46250: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 46250 #18 46250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 74250 #19 74250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 74250 #20 74250: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 77000 #21 74250: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 77000 #22 77000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 77000 #23 77000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 77000 #24 77000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 77000 #25 77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000 77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger) #26 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000 #27 77000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 78000 #28 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000 #29 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 95750 #30 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000 #31 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000 #32 78000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 78000 #33 95750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 95750 #34 95750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 123750 #35 123750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 123750 #36 123750: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 126000 #37 123750: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 126000 #38 126000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 126000 #39 126000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 126000 #40 126000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 126000 #41 126000: Event: Event_40: Timing CPU icache tick 40 executed @ 126000 126000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger) #42 126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000 #43 126000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 127000 [...] 469000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall) 469000: Event: Event_75: generic 75 scheduled @ 469000 469000: Event: Event_75: generic 75 executed @ 469000
Looking into the generated config.dot.svg
can give a better intuition on the shape of the memory system: Figure 3, “config.dot.svg
for a TimingSimpleCPU without caches.”, so it is good to keep that in mind.
config.dot.svg
for a TimingSimpleCPU without caches.It is also helpful to see this as a tree of events where one execute event schedules other events:
| | | | | 0 1 2 3 4 0 TimingSimpleCPU::fetch 5 | +---+ | | 6 7 6 DRAMCtrl::processNextReqEvent (0) 8 15 7 BaseXBar::Layer::releaseLayer | +---+---+ | | | 9 10 11 9 DRAMCtrl::Rank::processActivateEvent 12 17 16 10 DRAMCtrl::processRespondEvent (46.25) | | 11 DRAMCtrl::processNextReqEvent (5) | | 13 18 13 DRAMCtrl::Rank::processPowerEvent 14 19 18 PacketQueue::processSendEvent (28) | +---+ | | 20 21 20 PacketQueue::processSendEvent (2.75) 23 22 21 BaseXBar::Layer<SrcType, DstType>::releaseLayer | 24 24 TimingSimpleCPU::IcachePort::ITickEvent::process (0) 25 | +---+ | | 26 27 26 DRAMCtrl::processNextReqEvent 28 32 27 BaseXBar::Layer<SrcType, DstType>::releaseLayer | +---+ | | 29 30 29 DRAMCtrl::processRespondEvent 33 31 30 DRAMCtrl::processNextReqEvent | 34 34 PacketQueue::processSendEvent 35 | +---+ | | 36 37 36 PacketQueue::processSendEvent 39 38 37 BaseXBar::Layer<SrcType, DstType>::releaseLayer | 40 40 TimingSimpleCPU::IcachePort::ITickEvent::process 41 | +---+ | | 42 43 42 DRAMCtrl::processNextReqEvent 43 BaseXBar::Layer<SrcType, DstType>::releaseLayer
Note that every schedule is followed by an execution, so we put them together, for example:
| | 6 7 6 DRAMCtrl::processNextReqEvent (0) 8 15 7 BaseXBar::Layer::releaseLayer (0) |
means:
-
6
: scheduleDRAMCtrl::processNextReqEvent
to run in0
ns after the execution that scheduled it -
8
: executeDRAMCtrl::processNextReqEvent
-
7
: scheduleBaseXBar::Layer::releaseLayer
to run in0
ns after the execution that scheduled it -
15
: executeBaseXBar::Layer::releaseLayer
With this, we can focus on going up the event tree from an event of interest until we see what originally caused it!
Notably, the above tree contains the execution of the first two instructions.
Observe how the events leading up to the second instruction are basically a copy of those of the first one, this is the basic TimingSimpleCPU
event loop in action.
One line summary of events:
-
#5: adds the request to the DRAM queue, and schedules a
DRAMCtrl::processNextReqEvent
which later sees that request immediately -
#8: picks up the only request from the DRAM read queue (
readQueue
) and services that.If there were multiple requests, priority arbitration under
DRAMCtrl::chooseNext
could chose a different one than the first based on packet prioritiesThis puts the request on the response queue
respQueue
and schedules anotherDRAMCtrl::processNextReqEvent
but the request queue is empty, and that does nos schedule further events -
#17: picks up the only request from the DRAM response queue and services that by placing it in yet another queue, and scheduling the
PacketQueue::processSendEvent
which will later pick up that packet -
#19: picks up the request from the previous queue, and forwards it to another queue, and schedules yet another
PacketQueue::processSendEvent
The current one is the DRAM passing the message to the XBar, and the next
processSendEvent
is the XBar finally sending it back to the CPU -
#23: the XBar port is actually sending the reply back.
If knows to which CPU core to send the request to because ports keep a map of request to source:
const auto route_lookup = routeTo.find(pkt->req);