37.2. Superscalar processor
http://www.lighterra.com/papers/modernmicroprocessors/ explains it well.
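You basically fetch and decode multiple instructions per cycle, and issue them at the same time if each one can go to a free functional unit and they don't conflict with one another, e.g. one instruction reading a register that another instruction in the same group writes. Genius!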
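To make the grouping rule concrete, here is a toy model of in-order dual issue. Everything in it is a hypothetical assumption for illustration: the `Instr` record, the register names, the unit counts (two ALUs, one load/store unit), a 2-wide issue width, and single-cycle latency with full forwarding, so an instruction only has to wait for producers in its own cycle group. Real cores also deal with multi-cycle latencies, register renaming and out-of-order issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Instr:
    text: str                 # assembly text, for display only
    unit: str                 # functional unit it needs: "alu" or "mem" (hypothetical)
    dst: Optional[str]        # destination register, None if it writes none
    srcs: Tuple[str, ...]     # source registers

# Hypothetical machine: 2 ALUs and 1 load/store unit, 2-wide issue.
DEFAULT_UNITS: Dict[str, int] = {"alu": 2, "mem": 1}

def conflicts(instr: Instr, group: List[Instr]) -> bool:
    """True if instr depends on an instruction already in this cycle's group."""
    for prev in group:
        if prev.dst is not None and (prev.dst in instr.srcs or prev.dst == instr.dst):
            return True  # read-after-write or write-after-write in the same cycle
    return False

def issue(program: List[Instr], width: int = 2,
          units: Optional[Dict[str, int]] = None) -> List[List[Instr]]:
    """Greedy in-order issue: pack consecutive instructions into cycles."""
    units = units or DEFAULT_UNITS
    cycles: List[List[Instr]] = []
    group: List[Instr] = []
    used: Dict[str, int] = {}
    for instr in program:
        full = len(group) == width
        no_unit = used.get(instr.unit, 0) >= units[instr.unit]
        if group and (full or no_unit or conflicts(instr, group)):
            cycles.append(group)   # close the current cycle
            group, used = [], {}
        group.append(instr)
        used[instr.unit] = used.get(instr.unit, 0) + 1
    if group:
        cycles.append(group)
    return cycles

PROGRAM = [
    Instr("add r1, r2, r3", "alu", "r1", ("r2", "r3")),
    Instr("add r4, r1, r5", "alu", "r4", ("r1", "r5")),  # needs r1: next cycle
    Instr("ldr r6, [r7]", "mem", "r6", ("r7",)),         # independent: pairs up
    Instr("sub r8, r9, r10", "alu", "r8", ("r9", "r10")),
]

for n, group in enumerate(issue(PROGRAM)):
    print(n, " | ".join(i.text for i in group))
# 0 add r1, r2, r3
# 1 add r4, r1, r5 | ldr r6, [r7]
# 2 sub r8, r9, r10
```

A scalar pipeline would need four cycles for these four instructions. Here the dependent `add` cannot pair with the first one, but the independent `ldr` fills the free slot next to it, saving a cycle.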
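Running several instructions from a single instruction stream at the same time is why superscalar execution is a type of instruction-level parallelism.
It also makes branch prediction even more important: when a conditional branch is reached, the processor has to guess which side to fetch and execute before knowing for sure, otherwise it would stall with all of its issue slots empty until the condition is resolved.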
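The branch prediction mentioned above can be illustrated with the classic textbook scheme: a table of 2-bit saturating counters indexed by the branch address. This is a minimal sketch, not what any particular core does (modern predictors are far more elaborate); the table size, the `pc >> 2` indexing, the initial "weakly not taken" state and the `0x400` branch address are all hypothetical choices.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [1] * entries  # assumption: start every entry weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # hypothetical: 4-byte aligned instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

def run(predictor: TwoBitPredictor, pc: int, outcomes) -> int:
    """Predict then train on each outcome, returning the number of mispredictions."""
    mispredicts = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            mispredicts += 1
        predictor.update(pc, taken)
    return mispredicts

# Backward branch of a 10-iteration loop, with the whole loop executed 3 times:
# taken 9 times, then not taken once on exit.
LOOP_OUTCOMES = ([True] * 9 + [False]) * 3
mispredicts = run(TwoBitPredictor(), 0x400, LOOP_OUTCOMES)
print(f"{mispredicts} mispredictions out of {len(LOOP_OUTCOMES)} branches")
# 4 mispredictions out of 30 branches
```

The 2-bit hysteresis is the point of the design: the single not-taken loop exit only moves the counter from 3 to 2, so the next run of the loop is still predicted taken, and only the first warm-up branch plus one exit per run are missed.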
Although this is a microarchitectural detail, the superscalar width matters so much for performance that it is usually publicly documented. For example:
- https://en.wikipedia.org/wiki/ARM_Cortex-A77: ARM Cortex-A77 (2019) has a 4-wide superscalar decode (and is out-of-order)