29.12.3. x86 fused multiply add (FMA)

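  • userland/arch/x86_64/vfmadd132pd.S: VFMADD132PD: "Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1." TODO: I don’t understand the manual: experimentally, on a 2017 Lenovo ThinkPad P51 with an Ubuntu 19.04 host, the result is stored in XMM2! A likely explanation, not yet confirmed: the manual numbers the operands in Intel syntax, where the destination comes first, while GNU as sources default to AT&T syntax, where the destination comes last. The manual’s "xmm1" would then be whichever register is written last in the .S file, not necessarily the register named XMM1.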

These instructions are not part of any SSEn set: they have their own dedicated CPUID flag, which appears in /proc/cpuinfo as fma. AVX-512F later added EVEX-encoded forms of the same instructions that operate on the wider ZMM registers.

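They are also unusual among x86 instructions in that they take 3 operands that are all inputs, as you would intuitively expect from the definition of FMA (a * b + c). The first operand in Intel syntax is both an input and the destination, so it is overwritten with the result.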
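
The digits in the mnemonic say which operands are multiplied and which is added, counting operands in the manual's Intel order with the destination as operand 1. Below is a minimal C sketch of the 132, 213 and 231 forms. It is only a model: the type xmm_model and the model_* function names are made up for illustration, and plain C arithmetic may round twice where the real instruction rounds once, so the self-check uses small values that are exact either way.

```c
#include <assert.h>

/* Hypothetical model type: one XMM register viewed as two packed doubles. */
typedef struct { double lane[2]; } xmm_model;

/* Operand numbering follows the Intel manual: op1 is the destination.
 * Mnemonic digits "xyz" mean: result = op_x * op_y + op_z. */

/* VFMADD132PD: op1 = op1 * op3 + op2 */
xmm_model model_vfmadd132pd(xmm_model op1, xmm_model op2, xmm_model op3) {
    xmm_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = op1.lane[i] * op3.lane[i] + op2.lane[i];
    return r;
}

/* VFMADD213PD: op1 = op2 * op1 + op3 */
xmm_model model_vfmadd213pd(xmm_model op1, xmm_model op2, xmm_model op3) {
    xmm_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = op2.lane[i] * op1.lane[i] + op3.lane[i];
    return r;
}

/* VFMADD231PD: op1 = op2 * op3 + op1 */
xmm_model model_vfmadd231pd(xmm_model op1, xmm_model op2, xmm_model op3) {
    xmm_model r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = op2.lane[i] * op3.lane[i] + op1.lane[i];
    return r;
}

/* Usage example: the same three inputs give different results per form. */
void model_selfcheck(void) {
    xmm_model op1 = {{2.0, 3.0}};
    xmm_model op2 = {{10.0, 20.0}};
    xmm_model op3 = {{5.0, 7.0}};

    xmm_model r132 = model_vfmadd132pd(op1, op2, op3);
    assert(r132.lane[0] == 20.0 && r132.lane[1] == 41.0);   /* 2*5+10, 3*7+20 */

    xmm_model r213 = model_vfmadd213pd(op1, op2, op3);
    assert(r213.lane[0] == 25.0 && r213.lane[1] == 67.0);   /* 10*2+5, 20*3+7 */

    xmm_model r231 = model_vfmadd231pd(op1, op2, op3);
    assert(r231.lane[0] == 52.0 && r231.lane[1] == 143.0);  /* 10*5+2, 20*7+3 */
}
```

When reading this against GNU as code, keep in mind that AT&T syntax lists the operands in the reverse order, so op1 above is the last operand written on the instruction line.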