diff options
Diffstat (limited to 'docs/tutorial/LangImpl09.rst')
-rw-r--r-- | docs/tutorial/LangImpl09.rst | 462 |
1 files changed, 462 insertions, 0 deletions
diff --git a/docs/tutorial/LangImpl09.rst b/docs/tutorial/LangImpl09.rst new file mode 100644 index 00000000000..0053960756d --- /dev/null +++ b/docs/tutorial/LangImpl09.rst @@ -0,0 +1,462 @@ +====================================== +Kaleidoscope: Adding Debug Information +====================================== + +.. contents:: + :local: + +Chapter 9 Introduction +====================== + +Welcome to Chapter 9 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a +decent little programming language with functions and variables. +What happens if something goes wrong though, how do you debug your +program? + +Source level debugging uses formatted data that helps a debugger +translate from binary and the state of the machine back to the +source that the programmer wrote. In LLVM we generally use a format +called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding +that represents types, source locations, and variable locations. + +The short summary of this chapter is that we'll go through the +various things you have to add to a programming language to +support debug info, and how you translate that into DWARF. + +Caveat: For now we can't debug via the JIT, so we'll need to compile +our program down to something small and standalone. As part of this +we'll make a few modifications to the running of the language and +how programs are compiled. This means that we'll have a source file +with a simple program written in Kaleidoscope rather than the +interactive JIT. It does involve a limitation that we can only +have one "top level" command at a time to reduce the number of +changes necessary. + +Here's the sample program we'll be compiling: + +.. code-block:: python + + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2); + + fib(10) + + +Why is this a hard problem? +=========================== + +Debug information is a hard problem for a few different reasons - mostly +centered around optimized code. First, optimization makes keeping source +locations more difficult. In LLVM IR we keep the original source location +for each IR level instruction on the instruction. Optimization passes +should keep the source locations for newly created instructions, but merged +instructions only get to keep a single location - this can cause jumping +around when stepping through optimized programs. Secondly, optimization +can move variables in ways that are either optimized out, shared in memory +with other variables, or difficult to track. For the purposes of this +tutorial we're going to avoid optimization (as you'll see with one of the +next sets of patches). + +Ahead-of-Time Compilation Mode +============================== + +To highlight only the aspects of adding debug information to a source +language without needing to worry about the complexities of JIT debugging +we're going to make a few changes to Kaleidoscope to support compiling +the IR emitted by the front end into a simple standalone program that +you can execute, debug, and see results. + +First we make our anonymous function that contains our top level +statement be our "main": + +.. code-block:: udiff + + - auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); + + auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>()); + +just with the simple change of giving it a name. + +Then we're going to remove the command line code wherever it exists: + +.. code-block:: udiff + + @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + - fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: + return; + @@ -1184,7 +1183,6 @@ int main() { + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + - fprintf(stderr, "ready> "); + getNextToken(); + +Lastly we're going to disable all of the optimization passes and the JIT so +that the only thing that happens after we're done parsing and generating +code is that the llvm IR goes to standard error: + +.. code-block:: udiff + + @@ -1108,17 +1108,8 @@ static void HandleExtern() { + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (auto FnAST = ParseTopLevelExpr()) { + - if (auto *FnIR = FnAST->codegen()) { + - // We're just doing this to make sure it executes. + - TheExecutionEngine->finalizeObject(); + - // JIT the function, returning a function pointer. + - void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR); + - + - // Cast it to the right type (takes no arguments, returns a double) so we + - // can call it as a native function. + - double (*FP)() = (double (*)())(intptr_t)FPtr; + - // Ignore the return value for this. + - (void)FP; + + if (!F->codegen()) { + + fprintf(stderr, "Error generating code for top level expr"); + } + } else { + // Skip token for error recovery. + @@ -1439,11 +1459,11 @@ int main() { + // target lays out data structures. + TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); + OurFPM.add(new DataLayoutPass()); + +#if 0 + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + @@ -1218,7 +1210,7 @@ int main() { + OurFPM.add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + OurFPM.add(createCFGSimplificationPass()); + - + + #endif + OurFPM.doInitialization(); + + // Set the global so the code gen can use this. + +This relatively small set of changes get us to the point that we can compile +our piece of Kaleidoscope language down to an executable program via this +command line: + +.. code-block:: bash + + Kaleidoscope-Ch9 < fib.ks | & clang -x ir - + +which gives an a.out/a.exe in the current working directory. + +Compile Unit +============ + +The top level container for a section of code in DWARF is a compile unit. +This contains the type and function data for an individual translation unit +(read: one file of source code). So the first thing we need to do is +construct one for our fib.ks file. + +DWARF Emission Setup +==================== + +Similar to the ``IRBuilder`` class we have a +`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class +that helps in constructing debug metadata for an llvm IR file. It +corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names. +Using it does require that you be more familiar with DWARF terminology than +you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you +read through the general documentation on the +`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it +should be a little more clear. We'll be using this class to construct all +of our IR level descriptions. Construction for it takes a module so we +need to construct it shortly after we construct our module. We've left it +as a global static variable to make it a bit easier to use. + +Next we're going to create a small container to cache some of our frequent +data. The first will be our compile unit, but we'll also write a bit of +code for our one type since we won't have to worry about multiple typed +expressions: + +.. code-block:: c++ + + static DIBuilder *DBuilder; + + struct DebugInfo { + DICompileUnit *TheCU; + DIType *DblTy; + + DIType *getDoubleTy(); + } KSDbgInfo; + + DIType *DebugInfo::getDoubleTy() { + if (DblTy.isValid()) + return DblTy; + + DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); + return DblTy; + } + +And then later on in ``main`` when we're constructing our module: + +.. code-block:: c++ + + DBuilder = new DIBuilder(*TheModule); + + KSDbgInfo.TheCU = DBuilder->createCompileUnit( + dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); + +There are a couple of things to note here. First, while we're producing a +compile unit for a language called Kaleidoscope we used the language +constant for C. This is because a debugger wouldn't necessarily understand +the calling conventions or default ABI for a language it doesn't recognize +and we follow the C ABI in our llvm code generation so it's the closest +thing to accurate. This ensures we can actually call functions from the +debugger and have them execute. Secondly, you'll see the "fib.ks" in the +call to ``createCompileUnit``. This is a default hard coded value since +we're using shell redirection to put our source into the Kaleidoscope +compiler. In a usual front end you'd have an input file name and it would +go there. + +One last thing as part of emitting debug information via DIBuilder is that +we need to "finalize" the debug information. The reasons are part of the +underlying API for DIBuilder, but make sure you do this near the end of +main: + +.. code-block:: c++ + + DBuilder->finalize(); + +before you dump out the module. + +Functions +========= + +Now that we have our ``Compile Unit`` and our source locations, we can add +function definitions to the debug info. So in ``PrototypeAST::codegen()`` we +add a few lines of code to describe a context for our subprogram, in this +case the "File", and the actual definition of the function itself. + +So the context: + +.. code-block:: c++ + + DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + +giving us an DIFile and asking the ``Compile Unit`` we created above for the +directory and filename where we are currently. Then, for now, we use some +source locations of 0 (since our AST doesn't currently have source location +information) and construct our function definition: + +.. code-block:: c++ + + DIScope *FContext = Unit; + unsigned LineNo = 0; + unsigned ScopeLine = 0; + DISubprogram *SP = DBuilder->createFunction( + FContext, Name, StringRef(), Unit, LineNo, + CreateFunctionType(Args.size(), Unit), false /* internal linkage */, + true /* definition */, ScopeLine, DINode::FlagPrototyped, false); + F->setSubprogram(SP); + +and we now have an DISubprogram that contains a reference to all of our +metadata for the function. + +Source Locations +================ + +The most important thing for debug information is accurate source location - +this makes it possible to map your source code back. We have a problem though, +Kaleidoscope really doesn't have any source location information in the lexer +or parser so we'll need to add it. + +.. code-block:: c++ + + struct SourceLocation { + int Line; + int Col; + }; + static SourceLocation CurLoc; + static SourceLocation LexLoc = {1, 0}; + + static int advance() { + int LastChar = getchar(); + + if (LastChar == '\n' || LastChar == '\r') { + LexLoc.Line++; + LexLoc.Col = 0; + } else + LexLoc.Col++; + return LastChar; + } + +In this set of code we've added some functionality on how to keep track of the +line and column of the "source file". As we lex every token we set our current +current "lexical location" to the assorted line and column for the beginning +of the token. We do this by overriding all of the previous calls to +``getchar()`` with our new ``advance()`` that keeps track of the information +and then we have added to all of our AST classes a source location: + +.. code-block:: c++ + + class ExprAST { + SourceLocation Loc; + + public: + ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {} + virtual ~ExprAST() {} + virtual Value* codegen() = 0; + int getLine() const { return Loc.Line; } + int getCol() const { return Loc.Col; } + virtual raw_ostream &dump(raw_ostream &out, int ind) { + return out << ':' << getLine() << ':' << getCol() << '\n'; + } + +that we pass down through when we create a new expression: + +.. code-block:: c++ + + LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS), + std::move(RHS)); + +giving us locations for each of our expressions and variables. + +From this we can make sure to tell ``DIBuilder`` when we're at a new source +location so it can use that when we generate the rest of our code and make +sure that each instruction has source location information. We do this +by constructing another small function: + +.. code-block:: c++ + + void DebugInfo::emitLocation(ExprAST *AST) { + DIScope *Scope; + if (LexicalBlocks.empty()) + Scope = TheCU; + else + Scope = LexicalBlocks.back(); + Builder.SetCurrentDebugLocation( + DebugLoc::get(AST->getLine(), AST->getCol(), Scope)); + } + +that both tells the main ``IRBuilder`` where we are, but also what scope +we're in. Since we've just created a function above we can either be in +the main file scope (like when we created our function), or now we can be +in the function scope we just created. To represent this we create a stack +of scopes: + +.. code-block:: c++ + + std::vector<DIScope *> LexicalBlocks; + std::map<const PrototypeAST *, DIScope *> FnScopeMap; + +and keep a map of each function to the scope that it represents (an +DISubprogram is also an DIScope). + +Then we make sure to: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(this); + +emit the location every time we start to generate code for a new AST, and +also: + +.. code-block:: c++ + + KSDbgInfo.FnScopeMap[this] = SP; + +store the scope (function) when we create it and use it: + + KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]); + +when we start generating the code for each function. + +also, don't forget to pop the scope back off of your scope stack at the +end of the code generation for the function: + +.. code-block:: c++ + + // Pop off the lexical block for the function since we added it + // unconditionally. + KSDbgInfo.LexicalBlocks.pop_back(); + +Variables +========= + +Now that we have functions, we need to be able to print out the variables +we have in scope. Let's get our function arguments set up so we can get +decent backtraces and see how our functions are being called. It isn't +a lot of code, and we generally handle it when we're creating the +argument allocas in ``PrototypeAST::CreateArgumentAllocas``. + +.. code-block:: c++ + + DIScope *Scope = KSDbgInfo.LexicalBlocks.back(); + DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(), + KSDbgInfo.TheCU.getDirectory()); + DILocalVariable D = DBuilder->createParameterVariable( + Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true); + + DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(), + DebugLoc::get(Line, 0, Scope), + Builder.GetInsertBlock()); + +Here we're doing a few things. First, we're grabbing our current scope +for the variable so we can say what range of code our variable is valid +through. Second, we're creating the variable, giving it the scope, +the name, source location, type, and since it's an argument, the argument +index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR +level that we've got a variable in an alloca (and it gives a starting +location for the variable), and setting a source location for the +beginning of the scope on the declare. + +One interesting thing to note at this point is that various debuggers have +assumptions based on how code and debug information was generated for them +in the past. In this case we need to do a little bit of a hack to avoid +generating line information for the function prologue so that the debugger +knows to skip over those instructions when setting a breakpoint. So in +``FunctionAST::CodeGen`` we add a couple of lines: + +.. code-block:: c++ + + // Unset the location for the prologue emission (leading instructions with no + // location in a function are considered part of the prologue and the debugger + // will run past them when breaking on a function) + KSDbgInfo.emitLocation(nullptr); + +and then emit a new location when we actually start generating code for the +body of the function: + +.. code-block:: c++ + + KSDbgInfo.emitLocation(Body); + +With this we have enough debug information to set breakpoints in functions, +print out argument variables, and call functions. Not too bad for just a +few simple lines of code! + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +debug information. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp + :language: c++ + +`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_ + |