Hacking on LLVM

Some useful tips for debugging LLVM source code with VSCode on Linux (all commands below assume a bash shell). It is not a substitute for the excellent LLVM documentation.

Compiling from source

Some packages you will need:
cmake
libedit-dev
python3-dev
swig

# optional
clang

Compilation commands:
git clone https://github.com/llvm/llvm-project.git
mkdir build
cd build

# make sure you have cmake and the build dependencies (see the Getting Started guide on llvm.org)
# LLVM will NOT build with a pre-C++14 compiler (recent releases require C++17)
# Optionally, use -DCMAKE_BUILD_TYPE=RelWithDebInfo for a smaller build that still has debug symbols (it is optimized, so stepping through code is less predictable)
cmake -DLLVM_ENABLE_PROJECTS="clang;lldb;clang-tools-extra" -DCMAKE_BUILD_TYPE=Debug ../llvm-project/llvm

# the build may take anywhere from tens of minutes to a few hours depending on your hardware
cmake --build . -j "$(nproc)"

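Note that the command above already builds LLDB, and clang-tools-extra is what provides clangd; if you only need clang, -DLLVM_ENABLE_PROJECTS="clang" builds considerably faster. CMake will use your system's default CC and CXX compilers. I highly advise building with clang/clang++, which tends to give the smoothest experience when debugging with LLDB. Export these before running cmake for the first time, since CMake caches the compiler it detects: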

export CC=clang
export CXX=clang++

Debugging clang with VSCode

Required extensions

I highly suggest using LLDB instead of GDB for debugging, and clangd for code navigation. Both are available as free VSCode extensions (CodeLLDB and clangd); configure the clangd extension to use the clangd executable you just built from source (it lives in the build's bin directory).
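clangd finds compile_commands.json by searching the parent directories of each source file (and their build/ subdirectories), so with the build directory sitting next to llvm-project it will not be picked up automatically. LLVM's CMake setup normally writes that file into the build directory (if yours does not, add -DCMAKE_EXPORT_COMPILE_COMMANDS=ON). Below is a minimal sketch that links the database into the checkout and points the clangd extension at the freshly built binary through its clangd.path setting. It assumes the VSCode workspace is the llvm-project folder and that both arguments are absolute paths; setup_clangd is a hypothetical helper name.

```shell
# Hypothetical helper: wire up clangd for an out-of-tree LLVM build.
#   $1 = absolute path to the llvm-project checkout (the folder opened in VSCode)
#   $2 = absolute path to the build directory (contains bin/clangd and compile_commands.json)
setup_clangd() {
  local src=$1 build=$2
  local settings="$src/.vscode/settings.json"

  # clangd searches parent directories of each source file for compile_commands.json,
  # so a symlink at the top of the checkout makes the out-of-tree database visible.
  ln -sf "$build/compile_commands.json" "$src/compile_commands.json"

  # Don't clobber existing VSCode settings; ask the user to merge by hand instead.
  mkdir -p "$src/.vscode"
  if [[ -e $settings ]]; then
    echo "$settings already exists; add \"clangd.path\": \"$build/bin/clangd\" to it" >&2
    return 1
  fi
  printf '{\n    "clangd.path": "%s"\n}\n' "$build/bin/clangd" > "$settings"
}
```

For example, `setup_clangd ~/llvm/llvm-project ~/llvm/build`, then reload the VSCode window so the extension picks up the new setting.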

Debugging

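The clang executable plays two roles: the driver, which interprets your command line and decides which jobs to run, and the compiler proper, invoked as clang -cc1. To debug the compiler itself, you want the exact cc1 invocation for a simple program, which the -v flag prints (-### prints the same commands without running them):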

clang++ -v foo.cpp

This runs the driver in verbose mode, which prints the call to clang with the -cc1 flag followed by a long list of other flags. That is the command you want to use as the debug target within VSCode. Copy just that line, minus the leading quoted path to the clang executable, into command.txt; to convert the space separated arguments into a JSON-friendly list for VSCode, you can run the following:

awk -v RS='' -v OFS='","' 'NF { $1 = $1; print "\"" $0 "\"" }' command.txt
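Because awk's paragraph mode (RS='') joins every line of the file into a single record, pasting the whole -v output also pulls in the "clang -cc1 version ..." banner that follows the command line. A minimal sketch that reads the verbose output directly and keeps only the cc1 job line instead. It assumes, as with clang's -v and -### output, that this line starts with a space and the quoted path to the clang executable, and that no argument contains whitespace, quotes or backslashes; cc1_args_json is a hypothetical name.

```shell
# Hypothetical helper: read `clang -v` or `clang -###` output on stdin and print the
# cc1 arguments as a comma-separated list of JSON strings for launch.json "args".
# Assumes no argument contains whitespace, double quotes or backslashes.
cc1_args_json() {
  awk '
    # The job line looks like:  "/path/to/clang" -cc1 ...   (-### also quotes "-cc1").
    # The "clang -cc1 version ..." banner does not start with a quoted path, so it is skipped.
    !done && /^ "[^"]*" "?-cc1"? / {
      out = ""; sep = ""
      for (i = 2; i <= NF; i++) {     # field 1 is the executable; it goes in "program"
        arg = $i
        gsub(/^"|"$/, "", arg)        # strip the quotes that -### puts around each argument
        out = out sep "\"" arg "\""
        sep = ","
      }
      print out
      done = 1                        # keep reading so clang never writes to a closed pipe
    }'
}
```

Usage: `clang++ -v -c test.cpp 2>&1 | cc1_args_json` (the driver writes its verbose output to stderr).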
Then create a debug configuration in .vscode/launch.json with contents similar to the following (the "lldb" type is provided by the CodeLLDB extension):
{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug",
            "program": "/media/luis/TI10657400D/llvm/build/bin/clang++",
            "args": [
                "-cc1","-triple","x86_64-unknown-linux-gnu","-emit-obj","-mrelax-all","-disable-free","-main-file-name","test.cpp","-mrelocation-model","static","-mthread-model","posix","-mframe-pointer=all","-fmath-errno","-fno-rounding-math","-masm-verbose","-mconstructor-aliases","-munwind-tables","-fuse-init-array","-target-cpu","x86-64","-dwarf-column-info","-debugger-tuning=gdb","-v","-resource-dir","/media/luis/TI10657400D/llvm/build/lib/clang/10.0.0","-I","/usr/lib/gcc/x86_64-linux-gnu/7/include/","-internal-isystem","/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0","-internal-isystem","/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/x86_64-linux-gnu/c++/7.4.0","-internal-isystem","/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/x86_64-linux-gnu/c++/7.4.0","-internal-isystem","/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward","-internal-isystem","/usr/local/include","-internal-isystem","/media/luis/TI10657400D/llvm/build/lib/clang/10.0.0/include","-internal-externc-isystem","/usr/include/x86_64-linux-gnu","-internal-externc-isystem","/include","-internal-externc-isystem","/usr/include","-O0","-fdeprecated-macro","-fdebug-compilation-dir","/media/luis/TI10657400D/llvm/build/bin","-ferror-limit","19","-fmessage-length","0","-fopenmp","-fgnuc-version=4.2.1","-fobjc-runtime=gcc","-fcxx-exceptions","-fexceptions","-fdiagnostics-show-option","-fcolor-diagnostics","-faddrsig","-o","/tmp/test-c457c8.o","-x","c++","/home/luis/test.cpp"
            ],
            "cwd": "${workspaceFolder}"
        }
    ]
}
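Rather than editing that file by hand every time the flags change, you can generate it. A minimal sketch that prints a launch.json equivalent to the one above for any program and argument list. It assumes no argument contains a double quote or backslash, so nothing needs JSON escaping, and "type": "lldb" assumes the CodeLLDB extension; make_launch_json is a hypothetical helper.

```shell
# Hypothetical helper: print a launch.json for debugging a clang -cc1 command.
#   $1    = the clang binary from your build
#   $2... = the cc1 arguments (starting with -cc1)
# Assumes no argument contains a double quote or backslash, so no JSON escaping is done.
make_launch_json() {
  local program=$1 args="" a
  shift
  for a in "$@"; do
    args+="${args:+, }\"$a\""   # join as "a", "b", ...
  done
  cat <<EOF
{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug",
            "program": "$program",
            "args": [$args],
            "cwd": "\${workspaceFolder}"
        }
    ]
}
EOF
}
```

For example: `make_launch_json ~/llvm/build/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj ... test.cpp > .vscode/launch.json`, where the arguments are the cc1 line from -v.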

You should now be able to set breakpoints anywhere in the compiler. A good first breakpoint is the CodeGenModule constructor, clang::CodeGen::CodeGenModule::CodeGenModule.
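The same session can also be started from a terminal without VSCode. A minimal sketch, assuming lldb is on your PATH; debug_cc1 is a hypothetical helper. It relies on lldb's -o option, which runs each command at startup and then leaves you at the interactive prompt, and on everything after -- being passed through as the program and its arguments.

```shell
# Hypothetical helper: run a clang -cc1 command under LLDB and stop at a breakpoint.
#   $1    = fully qualified function to break on
#   $2... = the program and its arguments (e.g. /path/to/build/bin/clang -cc1 ...)
debug_cc1() {
  local bp=$1
  shift
  # -o runs an LLDB command at startup; after the last one LLDB stays interactive.
  lldb -o "breakpoint set --name $bp" -o run -- "$@"
}
```

For example: `debug_cc1 clang::CodeGen::CodeGenModule::CodeGenModule ~/llvm/build/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj ... test.cpp`.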

Debugging Notes

Clang notes

LLVM IR Notes

Backend/Code Generators

See https://releases.llvm.org/6.0.1/docs/CodeGenerator.html

Useful commands

If you have no idea where to break in the source, the -Rpass and -Rpass-analysis flags give you an idea of which transforms run as part of a clang invocation. Remarks are opt-in on the pass side, so not every transform reports them, but they still give a good idea of what's happening "under the hood". To view all participating transforms, pass -Rpass='.*' -Rpass-analysis='.*' (quoting the regex keeps the shell from treating it as a glob).
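With a catch-all regex, the remarks for a real file quickly run to hundreds of lines. A minimal sketch that tallies them per enabling flag and pass name, assuming clang's default diagnostic format, which ends each remark with the flag that enabled it in brackets (for example [-Rpass=inline]); summarize_remarks is a hypothetical helper.

```shell
# Hypothetical helper: read clang diagnostics on stdin and count the remarks per
# enabling flag, most frequent first, one "COUNT FLAG" line each (e.g. "12 -Rpass=inline").
summarize_remarks() {
  grep -oE '\[-Rpass(-missed|-analysis)?=[^]]+]' \
    | sed -e 's/^\[//' -e 's/]$//' \
    | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}
```

Usage: `clang -O2 -c test.c -Rpass='.*' -Rpass-missed='.*' -Rpass-analysis='.*' 2>&1 | summarize_remarks`. The pass names at the top of the list are a reasonable place to start looking for breakpoints.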

Other good places to break on are PassManager::run (the new pass manager) and legacy::PassManager::run (the legacy pass manager, declared in llvm/IR/LegacyPassManager.h).

Once you're stopped at a breakpoint in VSCode, there are a couple of useful commands you can call from LLDB to inspect the state of the IR.
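For example, a Debug build compiles in LLVM's dump() methods (other builds only have them with -DLLVM_ENABLE_DUMP=ON), so LLDB can print any module, function or value in scope with an expression such as expr M.dump(). A minimal terminal sketch that stops in the legacy pass manager, which clang's backend code generation has historically run through, and dumps the module. It assumes run's Module & parameter is named M, as in LegacyPassManager.cpp at the time of writing (check your checkout); dump_ir_at_pass_manager is a hypothetical helper name.

```shell
# Hypothetical helper: run a clang -cc1 command under LLDB, stop when the legacy
# pass manager starts running, and print the module's IR.
# Assumes a build with dump() compiled in and a Module & parameter named M.
dump_ir_at_pass_manager() {
  lldb \
    -o 'breakpoint set --name llvm::legacy::PassManager::run' \
    -o run \
    -o 'expr M.dump()' \
    -- "$@"
}
```

For example: `dump_ir_at_pass_manager ~/llvm/build/bin/clang -cc1 -triple x86_64-unknown-linux-gnu -emit-obj ... test.cpp`. LLDB stays interactive afterwards, so you can continue to the next run or inspect other variables.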

Additional Resources

The list below is a non-exhaustive list of resources I found helpful while hacking on LLVM. I take no credit for them. They are listed in no particular order.

gem5

gem5.opt --debug-help | less
gem5.opt --debug-flags=XXX
sudo apt-get install libc6-armel-cross libc6-dev-armel-cross binutils-arm-linux-gnueabi

Building ARM on x86:
sudo apt-get install libstdc++-10-

MIPS:
sudo apt-get install libstdc++-10-dev-mipsel-cross
sudo apt-get install binutils-mipsel-linux-gnu
sudo apt-get install gcc-mipsel-linux-gnu
Generate code (GCC):
mipsel-linux-gnu-gcc -O0 -g test.c
Note that you need the mipsel (little-endian) toolchain, as gem5 only supports little-endian MIPS programs.
Generate code (clang):
clang -static --target=mipsel-linux-gnu test.c
Run code on GEM5:
./gem5.opt ../../configs/example/se.py  -c ~/sandbox/a.out
Run dynamically linked code on GEM5:
./gem5.opt configs/example/se.py --cmd=/home/luis/research/gem5/a.out --redirects /lib=/usr/mipsel-linux-gnu/lib --interp-dir /usr/mipsel-linux-gnu
The additional flags are needed to set up the right library search paths for resolving dynamically linked symbols.
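Since gem5 will only run little-endian MIPS binaries here, checking the ELF header before launching can save a confusing failure. A minimal sketch, assuming the standard ELF layout (byte 5 of e_ident is EI_DATA, where 1 means little-endian, and the 2-byte e_machine field at offset 18 is 8 for EM_MIPS); read_hex and is_mipsel_elf are hypothetical helpers, and `file a.out` reports the same information in human-readable form.

```shell
# Print COUNT bytes at OFFSET of FILE as one lowercase hex string.
#   $1 = file, $2 = offset, $3 = count
read_hex() {
  od -An -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Hypothetical helper: succeed only for little-endian MIPS ELF files.
is_mipsel_elf() {
  local f=$1
  [[ -r $f ]] || return 1
  [[ $(read_hex "$f" 0 4) == 7f454c46 ]] || return 1   # magic: 0x7f 'E' 'L' 'F'
  [[ $(read_hex "$f" 5 1) == 01 ]] || return 1         # EI_DATA: 1 = little-endian
  [[ $(read_hex "$f" 18 2) == 0800 ]] || return 1      # e_machine: EM_MIPS (8), stored little-endian
}
```

Usage: `is_mipsel_elf a.out && ./gem5.opt configs/example/se.py -c a.out`.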




Last modified: February 12 2022