24.22.4.2.30. TimingSimpleCPU analysis: LDR stall
One important thing we want to check now, is how the memory reads are going to make the processor stall in the middle of an instruction.
This is also discussed at: gem5 execute
vs initiateAcc
vs completeAcc
.
Since we were using a simple CPU without a pipeline, the data memory access stall everything: there is no further progress until memory comes back.
For that, we can GDB to the TimingSimpleCPU::completeIfetch
of the first LDR done in our test program.
By doing that, we see that this time at:
if (curStaticInst && curStaticInst->isMemRef()) { // load or store: just send to dcache Fault fault = curStaticInst->initiateAcc(&t_info, traceData); if (_status == BaseSimpleCPU::Running) { } } else if (curStaticInst) { // non-memory instruction: execute completely now Fault fault = curStaticInst->execute(&t_info, traceData);
-
curStaticInst->isMemRef()
is true, and there is no instructionexecute
call in that part of the branch, only for instructions that don’t touch memory -
_status
isBaseSimpleCPU::Status::DcacheWaitResponse
andadvanceInst
is not yet called
We can verify that execute
never happens by putting a breakpoint on ArmISAInst::LDRXL64_LIT::execute
which never gets called.
Therefore, we conclude that initiateAcc
is what actually starts the memory request.
Later on, when the memory access completes the event calls TimingSimpleCPU::completeDataAccess
which calls ArmISAInst::LDRXL64_LIT::completeAcc
, which sets the register value to what was read from memory.
More memory event details can be seen at: gem5 functional vs atomic vs timing memory requests.
The following is the region of interest of the event log:
175000: Event: Event_40: Timing CPU icache tick 40 executed @ 175000 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000 175000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 176000 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 193750 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000 176000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 176000 193750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 193750 193750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 221750 221750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 221750 221750: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 scheduled @ 224000 221750: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 scheduled @ 224000 224000: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 executed @ 224000 224000: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 executed @ 224000 224000: Event: Event_42: Timing CPU dcache tick 42 scheduled @ 224000 224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000 175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
We first find it by looking for the ExecEnable
of LDR.
Then, we go up to the previous Timing CPU icache tick
event, which from the analysis of previous instruction traces, we know is where the instruction execution starts, the LDR instruction fetch is done by then!
Next, several events happen as the data request must be percolating through the memory system, it must be very similar to the instruction fetches. TODO analyze event function names.
Finally, at last we reach
224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000 175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
from which we guess:
-
224000
: this is the time that the data request finally returned, and at which execute gets called -
175000
: the log finally prints at the end of execution, but it does not show the actual time that things finished, but rather the time that the ifetch finished, which happened in the past