24.22.7.1. gem5 MinorCPU default functional units
Which functional units are available can be seen, for example, in the config.ini of a gem5 MinorCPU run. Functional units are not present in simple CPUs like gem5 TimingSimpleCPU.
For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the config.ini of a MinorCPU run:
./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace-insts-stdout \
  -- \
  --cpu-type MinorCPU \
  --caches
contains:
[system.cpu]
type=MinorCPU
children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
executeInputWidth=2
executeIssueLimit=2
Note also executeInputWidth=2 and executeIssueLimit=2, which suggest that this is a dual-issue superscalar processor.
The executeFuncUnits child of system.cpu points to:
[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1 funcUnits2 funcUnits3 funcUnits4 funcUnits5 funcUnits6 funcUnits7
and the first two units are, in full:
[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opClasses=system.cpu.executeFuncUnits.funcUnits0.opClasses
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
and:
[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses]
type=MinorOpClassSet
children=opClasses
opClasses=system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
So we understand that:
- the first and second functional units are IntAlu, so they do integer arithmetic operations
- both have a latency of 3
- each functional unit can have a set of opClass with more than one type. Those first two units just happen to have a single type.
The full list is:
- 0, 1: IntAlu, opLat=3
- 2: IntMult, opLat=3
- 3: IntDiv, opLat=9. So we see that a more complex operation such as division has higher latency.
- 4: FloatAdd, FloatCmp, and a gazillion other floating point related things. opLat=6.
- 5: SimdPredAlu: TODO SVE-related? opLat=3
- 6: MemRead, MemWrite, FloatMemRead, FloatMemWrite. opLat=1
- 7: IprAccess (TODO), InstPrefetch
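The list above was read off config.ini by hand. As a minimal sketch, the same information could be extracted with Python's standard configparser module. The sketch assumes that every child section is named `<parent>.<child>`, as in the excerpts above, and the helper names `op_classes` and `list_functional_units` are made up for this illustration. The embedded sample only contains the first two units:

```python
import configparser

# Excerpt of the config.ini shown above, reduced to the keys used here.
SAMPLE = """
[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1

[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu

[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
"""


def op_classes(cfg, section):
    """Collect opClass values from every MinorOpClass section below `section`."""
    if section not in cfg:
        # E.g. the "timings" child, which is not part of the sample.
        return []
    found = []
    if cfg[section].get("type") == "MinorOpClass":
        found.append(cfg[section]["opClass"])
    # Assumption: child sections are named "<parent>.<child>".
    for child in cfg[section].get("children", "").split():
        found.extend(op_classes(cfg, f"{section}.{child}"))
    return found


def list_functional_units(cfg, pool="system.cpu.executeFuncUnits"):
    """Return (unit name, opLat, [opClass, ...]) for each unit in the pool."""
    units = []
    for child in cfg[pool]["children"].split():
        section = f"{pool}.{child}"
        units.append((child, int(cfg[section]["opLat"]), op_classes(cfg, section)))
    return units


cfg = configparser.ConfigParser(interpolation=None)
cfg.optionxform = str  # keep key case, e.g. opLat instead of oplat
cfg.read_string(SAMPLE)

for name, lat, classes in list_functional_units(cfg):
    print(name, "opLat=%d" % lat, ",".join(classes))
```

For a real run, `cfg.read_string(SAMPLE)` would be replaced by `cfg.read(...)` on the config.ini in the gem5 output directory of that run.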
The default functional units are of course all specified in the Python at src/cpu/minor/MinorCPU.py:
class MinorDefaultFUPool(MinorFUPool):
    funcUnits = [MinorDefaultIntFU(), MinorDefaultIntFU(),
        MinorDefaultIntMulFU(), MinorDefaultIntDivFU(),
        MinorDefaultFloatSimdFU(), MinorDefaultPredFU(),
        MinorDefaultMemFU(), MinorDefaultMiscFU()]
We then expect that each instruction has a certain opClass that determines on which unit it can run.
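To make that expectation concrete, here is a toy model of the lookup in plain Python. It is only an illustration built from the unit list above, not gem5 code: the table `DEFAULT_UNITS` and the function `units_for` are hypothetical names, unit 4 only lists the two floating point classes named above, and unit 7 has no opLat because none was given.

```python
# Toy model of the default MinorCPU functional unit table listed above.
# Illustration only: this is not how gem5 implements the lookup.
DEFAULT_UNITS = [
    {"opLat": 3, "opClasses": {"IntAlu"}},
    {"opLat": 3, "opClasses": {"IntAlu"}},
    {"opLat": 3, "opClasses": {"IntMult"}},
    {"opLat": 9, "opClasses": {"IntDiv"}},
    # Many more floating point classes exist on this unit; omitted here.
    {"opLat": 6, "opClasses": {"FloatAdd", "FloatCmp"}},
    {"opLat": 3, "opClasses": {"SimdPredAlu"}},
    {"opLat": 1, "opClasses": {"MemRead", "MemWrite",
                               "FloatMemRead", "FloatMemWrite"}},
    # opLat for this unit is not stated in the list above.
    {"opLat": None, "opClasses": {"IprAccess", "InstPrefetch"}},
]


def units_for(op_class, units=DEFAULT_UNITS):
    """Return the indices of the units whose opClass set contains op_class."""
    return [i for i, unit in enumerate(units) if op_class in unit["opClasses"]]


print(units_for("IntAlu"))   # [0, 1]
print(units_for("IntDiv"))   # [3]
print(units_for("MemRead"))  # [6]
```

In this model an integer ALU instruction has two candidate units, which is consistent with the dual-issue observation above, while a division can only go to unit 3.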
For example, class AddImm, which is what we get for a simple add x1, x2, 0, sets itself as an IntAluOp in its constructor, as expected:
AddImm::AddImm(ExtMachInst machInst,
               IntRegIndex _dest,
               IntRegIndex _op1,
               uint32_t _imm,
               bool _rotC)
    : DataImmOp("add", machInst, IntAluOp,
                _dest, _op1, _imm, _rotC)