24.22.7.1. gem5 MinorCPU default functional units

Which units are available can be seen, for example, in the config.ini of a gem5 MinorCPU run. Simple CPUs such as gem5 TimingSimpleCPU do not have functional units.

For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the config.ini of a MinorCPU run:

./run   \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace-insts-stdout \
  -- \
  --cpu-type MinorCPU \
  --caches

contains:

[system.cpu]
type=MinorCPU
children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
executeInputWidth=2
executeIssueLimit=2

Note also executeInputWidth=2 and executeIssueLimit=2, which suggest that this is a dual-issue superscalar processor.

The executeFuncUnits child of system.cpu is described by:

[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1 funcUnits2 funcUnits3 funcUnits4 funcUnits5 funcUnits6 funcUnits7

and the first two units are, in full:

[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opClasses=system.cpu.executeFuncUnits.funcUnits0.opClasses
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses]
type=MinorOpClassSet
children=opClasses

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu

and:

[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses]
type=MinorOpClassSet
children=opClasses
opClasses=system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu

So we understand that:

  • the first and second functional units are IntAlu, so they do integer arithmetic operations

  • both have a latency of 3

  • each functional unit has a set of opClass entries, which may contain more than one type. These first two units just happen to have a single type each.
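
Reading these sections by hand gets tedious with eight units, so here is a minimal sketch of how the same information could be extracted with Python's standard configparser. The function name list_func_units and the SAMPLE_INI excerpt are made up for illustration, and the sketch assumes that every MinorOpClass section lives somewhere under its unit's section name, as in the excerpts above:

```python
import configparser

# Excerpt of the config.ini sections quoted above, trimmed to the keys used here.
SAMPLE_INI = """\
[system.cpu.executeFuncUnits]
type=MinorFUPool
children=funcUnits0 funcUnits1

[system.cpu.executeFuncUnits.funcUnits0]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu

[system.cpu.executeFuncUnits.funcUnits1]
type=MinorFU
children=opClasses timings
opLat=3

[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
type=MinorOpClass
opClass=IntAlu
"""


def list_func_units(ini_text, pool="system.cpu.executeFuncUnits"):
    """Return {unit name: (opLat, [opClass, ...])} for every unit in the pool."""
    # Hypothetical helper: not part of gem5, only standard library calls.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. opLat
    parser.read_string(ini_text)
    units = {}
    for name in parser[pool]["children"].split():
        section = f"{pool}.{name}"
        # Collect the opClass of every MinorOpClass section below this unit.
        op_classes = [
            parser[s]["opClass"]
            for s in parser.sections()
            if s.startswith(section + ".")
            and parser[s].get("type") == "MinorOpClass"
        ]
        units[name] = (int(parser[section]["opLat"]), op_classes)
    return units


if __name__ == "__main__":
    for name, (lat, classes) in list_func_units(SAMPLE_INI).items():
        print(name, lat, classes)
```

To try it on a real run, read the config.ini from the gem5 output directory and pass its contents instead of SAMPLE_INI.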

The full list is:

  • 0, 1: IntAlu, opLat=3

  • 2: IntMult, opLat=3

  • 3: IntDiv, opLat=9. So we see that a more complex operation such as division has higher latency.

  • 4: FloatAdd, FloatCmp, and a gazillion other floating point related things. opLat=6.

  • 5: SimdPredAlu: TODO SVE-related? opLat=3

  • 6: MemRead, MemWrite, FloatMemRead, FloatMemWrite. opLat=1

  • 7: IprAccess (TODO), InstPrefetch

These are of course all specified in the Python at src/cpu/minor/MinorCPU.py:

class MinorDefaultFUPool(MinorFUPool):
    funcUnits = [MinorDefaultIntFU(), MinorDefaultIntFU(),
        MinorDefaultIntMulFU(), MinorDefaultIntDivFU(),
        MinorDefaultFloatSimdFU(), MinorDefaultPredFU(),
        MinorDefaultMemFU(), MinorDefaultMiscFU()]

We then expect that each instruction has a certain opClass that determines on which unit it can run.
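
That expectation can be sketched as a small lookup table. The data below is transcribed from the unit list above, the names DEFAULT_UNITS and units_for are invented for this illustration, and the model only answers "which units accept this opClass", not how gem5 actually schedules instructions:

```python
# Default MinorCPU functional units, transcribed from the list above.
# Unit 4 supports more floating point classes than the two named here,
# and no opLat is given above for unit 7, hence None.
DEFAULT_UNITS = {
    0: (["IntAlu"], 3),
    1: (["IntAlu"], 3),
    2: (["IntMult"], 3),
    3: (["IntDiv"], 9),
    4: (["FloatAdd", "FloatCmp"], 6),
    5: (["SimdPredAlu"], 3),
    6: (["MemRead", "MemWrite", "FloatMemRead", "FloatMemWrite"], 1),
    7: (["IprAccess", "InstPrefetch"], None),
}


def units_for(op_class, units=DEFAULT_UNITS):
    """Return the indices of the units whose opClass set contains op_class."""
    return [i for i, (classes, _lat) in units.items() if op_class in classes]


if __name__ == "__main__":
    print(units_for("IntAlu"))
    print(units_for("IntDiv"))
```

Under this model an IntAlu instruction has two candidate units, which fits the dual-issue observation above, while an IntDiv instruction has only one.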

For example, class AddImm, which is what we get for a simple add x1, x2, 0, sets itself as an IntAluOp in its constructor, as expected:

    AddImm::AddImm(ExtMachInst machInst,
                                          IntRegIndex _dest,
                                          IntRegIndex _op1,
                                          uint32_t _imm,
                                          bool _rotC)
        : DataImmOp("add", machInst, IntAluOp,
                         _dest, _op1, _imm, _rotC)