25. Gensim

MIT-licensed binary translation simulator, so a bit like an MIT-licensed QEMU.

Video showing it boot Linux fast: https://www.youtube.com/watch?v=aZXx17oYumc

Its name is unfortunately completely overshadowed by unrelated software with the same name: https://radimrehurek.com/gensim/

TODO: advantages over QEMU. As the name implies, they seem to have a nice ISA description language. From a quick look at the internals, it seems to generate LLVM intermediate representation, which sounds good.

Build on Ubuntu 20.04:

git submodule update --init submodules/gensim
sudo apt install libantlr3c-dev
cd submodules/gensim
make

The first attempt fails with:

arm-none-eabi-gcc: error: unrecognized -march target: armv5

Let’s try just armv8, who cares about armv5!!!

mkdir build
cd build
cmake -DTESTING_ENABLED=FALSE -DCMAKE_BUILD_TYPE=DEBUGOPT ..
make -j`nproc` model-armv8

which fails with:

terminate called after throwing an instance of 'std::logic_error'
  what():  Unrecognised intrinsic: __builtin_abs64
Aborted (core dumped)

Get the failing command with:

make VERBOSE=1 model-armv8

and we see some code generation step:

cd /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8 && \
  /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/dist/bin/gensim \
  -a /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8/aarch64.ac \
  -s module,arch,decode,disasm,ee_interp,ee_blockjit,jumpinfo,function,makefile \
  -o decode.GenerateDotGraph=1,makefile.libtrace_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/support/libtrace/inc,makefile.archsim_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/archsim/inc,makefile.llvm_path=,makefile.Optimise=2,makefile.Debug=1 \
  -t /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/models/armv8/output-aarch64/

We can see an inclusion path:

gensim/models/armv8/aarch64.ac
		ac_isa("isa.ac");
gensim/models/armv8/isa.ac
		ac_execute("execute.simd");

and gensim/models/armv8/isa.ac contains uses of __builtin_abs64.
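
A recursive grep is a quick way to check which model files mention the intrinsic, and whether the generator sources mention it at all. This is only a minimal sketch: the helper name intrinsic_uses is made up for illustration, and it assumes GNU grep as shipped with Ubuntu.

```shell
# Hypothetical helper (not part of gensim): print every line under a
# directory that mentions a given identifier, as file:line:content.
# Usage: intrinsic_uses IDENTIFIER DIRECTORY
intrinsic_uses() {
  # -r: recurse into the directory
  # -n: show line numbers
  # -w: match the identifier only as a whole word
  # -F: treat the identifier as a fixed string, not a regex
  grep -rnwF -- "$1" "$2"
}
```

From the gensim checkout root, intrinsic_uses __builtin_abs64 models/armv8 lists the uses in the model, and intrinsic_uses __builtin_abs64 src/generators shows whether any generator source refers to it.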

Rebuilding with -DCMAKE_BUILD_TYPE=DEBUG and running gensim under GDB shows that the error comes from a call to gci.GenerateExecuteBodyFor(body_str, *action);. So it looks like some cases are missing in the function SSAIntrinsicStatementWalker::EmitFixedCode in gensim/src/generators/GenCInterpreter/InterpreterNodeWalker.cpp, e.g. there should be one for __builtin_abs64.
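
The GDB step can be scripted so that it stops at the throw and prints a backtrace without any interaction. This is a rough sketch, assuming a DEBUG build and that the arguments are the gensim command line printed by make VERBOSE=1; the function name backtrace_on_throw is made up for illustration.

```shell
# Hypothetical helper (not part of gensim): run a command under GDB, stop
# at the first C++ exception that gets thrown, print a backtrace, and exit.
# Usage: backtrace_on_throw COMMAND [ARGS...]
backtrace_on_throw() {
  # -batch: run the -ex commands in order, then exit without prompting
  # catch throw: stop whenever a C++ exception is thrown
  # --args: everything after it is the program and its arguments
  gdb -batch \
    -ex 'catch throw' \
    -ex 'run' \
    -ex 'backtrace' \
    --args "$@"
}
```

It would be called as backtrace_on_throw build/dist/bin/gensim -a models/armv8/aarch64.ac ... with the remaining arguments copied from the verbose make output.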

This is completely broken academic code! They must be using an out-of-tree version of part of the tool and forgot to commit it.