24.22.7.1. gem5 MinorCPU default functional units
Which units are available is visible, for example, in the config.ini of a gem5 MinorCPU run. Functional units are not present in simple CPUs like gem5 TimingSimpleCPU.
For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the config.ini of a MinorCPU run:
./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace-insts-stdout \
  -- \
  --cpu-type MinorCPU \
  --caches
contains:
[system.cpu]
type=MinorCPU
children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
executeInputWidth=2
executeIssueLimit=2
Also note executeInputWidth=2 and executeIssueLimit=2 here, which suggest that this is a dual-issue superscalar processor.
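As a rough sketch, the two width parameters can be read from config.ini with Python's standard configparser module. The SAMPLE text is an abridged copy of the [system.cpu] excerpt above, and read_execute_widths is a hypothetical helper name, not part of gem5. It assumes that the file parses as a regular INI file, which holds for the excerpt shown here.

```python
import configparser

# Abridged copy of the [system.cpu] section quoted above.
SAMPLE = """\
[system.cpu]
type=MinorCPU
executeInputWidth=2
executeIssueLimit=2
"""


def read_execute_widths(ini_text, section="system.cpu"):
    """Return (executeInputWidth, executeIssueLimit) as integers.

    Hypothetical helper: assumes both keys are present in `section`.
    """
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate duplicate sections or keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the camelCase key names as written
    parser.read_string(ini_text)
    cpu = parser[section]
    return int(cpu["executeInputWidth"]), int(cpu["executeIssueLimit"])


print(read_execute_widths(SAMPLE))  # prints (2, 2)
```

For a real run, the text would come from reading the config.ini file in the gem5 output directory instead of SAMPLE.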
The executeFuncUnits child of system.cpu points to:
[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1 funcUnits2 funcUnits3 funcUnits4 funcUnits5 funcUnits6 funcUnits7
and the first two units are, in full:
[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opClasses=system.cpu.executeFuncUnits.funcUnits0.opClasses
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
and:
[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses]
type=MinorOpClassSet
children=opClasses
opClasses=system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
So we understand that:
- the first and second functional units are IntAlu, so they do integer arithmetic operations
- both have a latency of 3
- each functional unit can have a set of opClass with more than one type. Those first two units just happen to have a single type.
The full list is:
- 0, 1: IntAlu, opLat=3
- 2: IntMult, opLat=3
- 3: IntDiv, opLat=9. So we see that a more complex operation such as division has higher latency.
- 4: FloatAdd, FloatCmp, and a gazillion other floating point related things. opLat=6.
- 5: SimdPredAlu: TODO SVE-related? opLat=3
- 6: MemRead, MemWrite, FloatMemRead, FloatMemWrite. opLat=1
- 7: IprAccess (TODO), InstPrefetch
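A list like the one above could also be derived mechanically from config.ini rather than by reading it by hand. The following is a minimal sketch using Python's standard configparser module. It assumes the section layout seen in the excerpts above: the pool lists its units under children, each unit has an opLat key, and each unit's opClasses section lists its MinorOpClass children. SAMPLE is an abridged two-unit copy of those excerpts, and list_functional_units is a hypothetical helper name, not something gem5 provides.

```python
import configparser

# Abridged two-unit copy of the config.ini excerpts quoted above.
SAMPLE = """\
[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1

[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu

[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
"""


def list_functional_units(ini_text, pool="system.cpu.executeFuncUnits"):
    """Return a list of (unit name, opLat, [opClass, ...]) tuples.

    Hypothetical helper: assumes the layout described in the text above.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the camelCase key names as written
    parser.read_string(ini_text)
    units = []
    for fu_name in parser[pool]["children"].split():
        fu_section = f"{pool}.{fu_name}"
        set_section = f"{fu_section}.opClasses"
        # Follow each child of the MinorOpClassSet to its opClass value.
        op_classes = [
            parser[f"{set_section}.{child}"]["opClass"]
            for child in parser[set_section]["children"].split()
        ]
        units.append((fu_name, int(parser[fu_section]["opLat"]), op_classes))
    return units


for name, op_lat, op_classes in list_functional_units(SAMPLE):
    print(name, op_lat, op_classes)
```

Because the walk follows the children keys rather than hard-coded names, the same function should handle units with several op classes, provided the real file keeps the layout assumed here.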
These are of course all specified in the Python at src/cpu/minor/MinorCPU.py:
class MinorDefaultFUPool(MinorFUPool):
    funcUnits = [MinorDefaultIntFU(), MinorDefaultIntFU(),
        MinorDefaultIntMulFU(), MinorDefaultIntDivFU(),
        MinorDefaultFloatSimdFU(), MinorDefaultPredFU(),
        MinorDefaultMemFU(), MinorDefaultMiscFU()]
We then expect that each instruction has a certain opClass that determines on which unit it can run.
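To make that expectation concrete, here is a toy lookup from an opClass to the default units that accept it. This is not gem5's issue logic, only an illustration built from the unit list given earlier in this section. The DEFAULT_UNITS table and the units_for function are hypothetical names. Unit 4 is abridged to the two classes named in the text, and unit 7 has opLat set to None because the text does not give its latency.

```python
# Op classes per default MinorCPU functional unit, transcribed from the
# list earlier in this section. Unit 4 accepts many more floating point
# classes than the two named there, so this table is abridged.
DEFAULT_UNITS = {
    0: {"opLat": 3, "opClasses": {"IntAlu"}},
    1: {"opLat": 3, "opClasses": {"IntAlu"}},
    2: {"opLat": 3, "opClasses": {"IntMult"}},
    3: {"opLat": 9, "opClasses": {"IntDiv"}},
    4: {"opLat": 6, "opClasses": {"FloatAdd", "FloatCmp"}},
    5: {"opLat": 3, "opClasses": {"SimdPredAlu"}},
    6: {"opLat": 1,
        "opClasses": {"MemRead", "MemWrite", "FloatMemRead", "FloatMemWrite"}},
    # opLat is not stated in the text for unit 7.
    7: {"opLat": None, "opClasses": {"IprAccess", "InstPrefetch"}},
}


def units_for(op_class, units=DEFAULT_UNITS):
    """Return the sorted indices of the units that accept `op_class`."""
    return sorted(
        index for index, unit in units.items()
        if op_class in unit["opClasses"]
    )


print(units_for("IntAlu"))  # prints [0, 1]
print(units_for("IntDiv"))  # prints [3]
```

Under this table, an IntAlu instruction has two candidate units while an IntDiv instruction has only one.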
For example, class AddImm, which is what we get for a simple add x1, x2, 0, sets itself as an IntAluOp in the constructor, as expected:
AddImm::AddImm(ExtMachInst machInst, IntRegIndex _dest,
               IntRegIndex _op1, uint32_t _imm, bool _rotC)
    : DataImmOp("add", machInst, IntAluOp, _dest, _op1, _imm, _rotC)