24.22.4.2.30. TimingSimpleCPU analysis: LDR stall

One important thing we want to check now, is how the memory reads are going to make the processor stall in the middle of an instruction.

This is also discussed at: gem5 execute vs initiateAcc vs completeAcc.

Since we were using a simple CPU without a pipeline, the data memory access stall everything: there is no further progress until memory comes back.

For that, we can GDB to the TimingSimpleCPU::completeIfetch of the first LDR done in our test program.

By doing that, we see that this time at:

if (curStaticInst && curStaticInst->isMemRef()) {
    // load or store: just send to dcache
    Fault fault = curStaticInst->initiateAcc(&t_info, traceData);

    if (_status == BaseSimpleCPU::Running) {
    }
} else if (curStaticInst) {
    // non-memory instruction: execute completely now
    Fault fault = curStaticInst->execute(&t_info, traceData);
  • curStaticInst->isMemRef() is true, and there is no instruction execute call in that part of the branch, only for instructions that don’t touch memory

  • _status is BaseSimpleCPU::Status::DcacheWaitResponse and advanceInst is not yet called

We can verify that execute never happens by putting a breakpoint on ArmISAInst::LDRXL64_LIT::execute which never gets called.

Therefore, we conclude that initiateAcc is what actually starts the memory request.

Later on, when the memory access completes the event calls TimingSimpleCPU::completeDataAccess which calls ArmISAInst::LDRXL64_LIT::completeAcc, which sets the register value to what was read from memory.

More memory event details can be seen at: gem5 functional vs atomic vs timing memory requests.

The following is the region of interest of the event log:

 175000: Event: Event_40: Timing CPU icache tick 40 executed @ 175000
 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000
 175000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 176000

 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000
 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 193750
 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000

 175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000

 176000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 176000

 193750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 193750
 193750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 221750

 221750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 221750
 221750: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 scheduled @ 224000
 221750: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 scheduled @ 224000

 224000: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 executed @ 224000

 224000: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 executed @ 224000
 224000: Event: Event_42: Timing CPU dcache tick 42 scheduled @ 224000

 224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
 175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   ldr   w2, #4194464       : MemRead :  D=0x0000000000000006 A=0x4000a0  flags=(IsInteger|IsMemRef|IsLoad)

We first find it by looking for the ExecEnable of LDR.

Then, we go up to the previous Timing CPU icache tick event, which from the analysis of previous instruction traces, we know is where the instruction execution starts, the LDR instruction fetch is done by then!

Next, several events happen as the data request must be percolating through the memory system, it must be very similar to the instruction fetches. TODO analyze event function names.

Finally, at last we reach

 224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
 175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   ldr   w2, #4194464       : MemRead :  D=0x0000000000000006 A=0x4000a0  flags=(IsInteger|IsMemRef|IsLoad)

from which we guess:

  • 224000: this is the time that the data request finally returned, and at which execute gets called

  • 175000: the log finally prints at the end of execution, but it does not show the actual time that things finished, but rather the time that the ifetch finished, which happened in the past