Linux Kernel Module Cheat

30.1.3. ARM instruction encodings

Understanding the basics of instruction encodings helps you remember what instructions do and why some things are possible or not, notably in the case of the ARM LDR pseudo-instruction and the ADRP instruction.

aarch32 has two "instruction sets", which to me look just like encodings.

The encodings are:

A32: every instruction is 4 bytes long. Can encode every instruction.
T32: most common instructions are 2 bytes long. Many other, less common ones are 4 bytes long.

T stands for "Thumb", which is the original name for the technology. The ARMv8 architecture reference manual A1.3.2 "The ARM instruction sets" says:

In previous documentation, these instruction sets were called the ARM and Thumb instruction sets

See also: ARMv8 architecture reference manual F2.1.3 "Instruction encodings".
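
As a rough illustration of the mixed 2 / 4 byte lengths of T32, here is a minimal sketch of how a decoder can tell the two apart from the first halfword alone. It assumes the usual T32 rule that a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 4 byte instruction. The function and constant names are hypothetical and only serve this example:

```python
# Top five bits of the first halfword that mark a 4 byte T32 instruction.
# Assumption: this is the standard T32 length rule, names are made up here.
WIDE_PREFIXES = {0b11101, 0b11110, 0b11111}


def t32_instruction_length(first_halfword):
    """Return the length in bytes (2 or 4) of the T32 instruction
    that starts with the given 16-bit halfword."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("expected a 16-bit halfword")
    top5 = first_halfword >> 11
    return 4 if top5 in WIDE_PREFIXES else 2


print(t32_instruction_length(0x4770))  # 16-bit BX LR
print(t32_instruction_length(0xF000))  # first halfword of a BL
```

A32 needs no such check, since every instruction is 4 bytes long.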

Within each instruction set, there can be multiple encodings for a given function, and they are noted simply as:

A1, A2, …: A32 encodings
T1, T2, …: T32 encodings

The state bit PSTATE.T determines whether the processor is in Thumb mode or not. The ARMv8 architecture reference manual says that this bit can only be read from the ARM BX instruction.

TODO: details: https://stackoverflow.com/questions/22660025/how-can-i-tell-if-i-am-in-arm-mode-or-thumb-mode-in-gdb says it is 0x20 & CPSR.
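
The 0x20 & CPSR check mentioned in that answer can be sketched as follows. This is only an illustration of the bit test, assuming you already have the CPSR value as an integer, e.g. as printed by GDB with p/x $cpsr. The names CPSR_T_BIT and is_thumb are hypothetical:

```python
# Assumption: the T bit is bit 5 of CPSR, i.e. the mask 0x20 from the answer above.
CPSR_T_BIT = 0x20


def is_thumb(cpsr):
    """Return True if the given CPSR value has the Thumb state bit set."""
    return bool(cpsr & CPSR_T_BIT)


print(is_thumb(0x60000010))  # T bit clear
print(is_thumb(0x60000030))  # T bit set
```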

This RISC-y mostly fixed instruction length design likely makes processor design easier and allows for certain optimizations, at the cost of slightly more complex assembly, as you can’t encode 4 / 8 byte addresses in a single instruction. Totally worth it IMHO.
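
To make the "can't encode a full address in one instruction" point more concrete, here is a sketch of one well known A32 case: data-processing immediates, which are stored as an 8-bit constant rotated right by an even amount. Only a small subset of 32-bit values fits, and anything else has to come from elsewhere, e.g. a literal pool load. The helper names are hypothetical, and this is not a full assembler check:

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def a32_modified_immediate(value):
    """Return (rotation, imm8) such that value == imm8 rotated right by
    2 * rotation bits, or None if no such pair exists."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value")
    for rotation in range(16):
        # Undo the right rotation: if the result fits in 8 bits, it is encodable.
        imm8 = rotate_left32(value, 2 * rotation)
        if imm8 <= 0xFF:
            return rotation, imm8
    return None


print(a32_modified_immediate(0xFF000000))  # encodable
print(a32_modified_immediate(0x12345678))  # not encodable
```

Typical 4 byte addresses look like the second case: their set bits are spread over more than 8 positions, so they do not fit.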

This design can be contrasted with x86, which has widely variable instruction length.

We can swap between A32 and T32 with the BX and BLX instructions: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0100a/armasm_cihfddaf.htm puts it really nicely:

The BL and BLX instructions copy the address of the next instruction into lr (r14, the link register).

The BX and BLX instructions can change the processor state from ARM to Thumb, or from Thumb to ARM.

BLX label always changes the state.

BX Rm and BLX Rm derive the target state from bit[0] of Rm:

if bit[0] of Rm is 0, the processor changes to, or remains in, ARM state

if bit[0] of Rm is 1, the processor changes to, or remains in, Thumb state.

The BXJ instruction changes the processor state to Jazelle.
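
The bit[0] rule quoted above for BX Rm and BLX Rm can be modelled in a few lines. This is a simplified sketch, not the architectural pseudocode: the function name bx_target is hypothetical, and the model ignores corner cases such as a misaligned target when bit[0] is 0:

```python
def bx_target(rm):
    """Return (instruction_set, address) selected by BX Rm or BLX Rm,
    based on bit[0] of the 32-bit register value rm."""
    if rm & 1:
        # bit[0] == 1: Thumb state, and bit[0] is cleared to form the address.
        return "T32", rm & ~1 & 0xFFFFFFFF
    # bit[0] == 0: ARM state, the address is used as is.
    return "A32", rm


print(bx_target(0x8001))
print(bx_target(0x8000))
```

This is why function pointers to Thumb code have their lowest bit set: the bit is not part of the address, it only selects the state.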

Bibliography:

https://stackoverflow.com/questions/28669905/what-is-the-difference-between-the-arm-thumb-and-thumb-2-instruction-encodings