In the previous articles, we extracted the debug sections of the WASM binary and started interpreting the DWARF data in them. We extracted the compilation unit (CU) and the associated Debugging Information Entries (DIEs).
In this article, we will see how to interpret the data in the .debug_line section. Keep in mind that using as little space as possible is one of the main design goals of DWARF, which explains why the association between source locations and compiled code is encoded the way it is. For your debugger to know which source location the current program counter points to, a correspondence between instruction addresses and source locations must be encoded in .debug_line. That data is not stored in what we might call a format. It is actually encoded as a program. We will create a virtual machine made of several registers and have it read bytecode in which each instruction acts on the state of those registers. At certain points, this VM appends a row to a matrix that maps an instruction address of the original compiled program to the corresponding source location.
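The virtual machine is composed of the following "registers": address, op_index, file, line, column, is_stmt, basic_block, end_sequence, prologue_end, epilogue_begin, isa and discriminator.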
They are described in detail in the DWARF Debugging Information Format version 4, section 6.2.2. The whole purpose of the program is to change the virtual machine state and generate a correspondence matrix from those registers. Every so often, the current register values are appended to the matrix as a new row, which records one more link between the instructions and the source program.
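To make the register set concrete, here is a minimal Python sketch of the state machine registers and of the "append a row" step. The names `LineState` and `emit_row` are hypothetical, the initial values follow section 6.2.2 of the version 4 standard, and `is_stmt` would in practice be initialised from the header's `default_is_stmt` field.

```python
from dataclasses import dataclass, replace


@dataclass
class LineState:
    """Registers of the line number state machine (DWARF v4, section 6.2.2)."""
    address: int = 0
    op_index: int = 0
    file: int = 1
    line: int = 1
    column: int = 0
    is_stmt: bool = False        # real initial value comes from default_is_stmt
    basic_block: bool = False
    end_sequence: bool = False
    prologue_end: bool = False
    epilogue_begin: bool = False
    isa: int = 0
    discriminator: int = 0


def emit_row(state, matrix):
    """Append a snapshot of the registers to the matrix, then clear the
    registers that only apply to the row that was just emitted."""
    matrix.append(replace(state))   # replace() with no changes makes a copy
    state.discriminator = 0
    state.basic_block = False
    state.prologue_end = False
    state.epilogue_begin = False
```

Storing a copy rather than the state object itself matters: the VM keeps mutating its registers after each row is emitted.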
This virtual machine reads its instructions from the .debug_line section. The instruction set is fairly simple. I will not list every instruction here, as you can find them all in section 6.2.5 of the aforementioned document. The instructions come in three types: special opcodes, standard opcodes and extended opcodes.
Special opcodes are represented only by their opcode, an unsigned byte, and take no operands. Each time a special opcode is read, the VM creates an entry in the correspondence matrix. The opcode also advances the address and line registers of the VM (see section 6.2.5.1).
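The way a special opcode packs an address advance and a line increment into one byte can be sketched as follows, using the formulas of section 6.2.5.1. The function name is hypothetical, and the values `opcode_base=13`, `line_base=-5` and `line_range=14` used in the example are illustrative assumptions; the real ones come from the line program header.

```python
def decode_special_opcode(opcode, opcode_base, line_base, line_range):
    """Split a special opcode into (operation_advance, line_increment).

    opcode_base, line_base and line_range are header fields.
    """
    adjusted = opcode - opcode_base
    operation_advance = adjusted // line_range
    line_increment = line_base + (adjusted % line_range)
    return operation_advance, line_increment


# With the assumed header values, opcode 0x4b means
# "advance the address by 4 operations and the line by 1".
advance, line_delta = decode_special_opcode(0x4B, 13, -5, 14)
```

When maximum_operations_per_instruction is 1, the address register then grows by `minimum_instruction_length * operation_advance`, the line register by `line_increment`, and a row is appended to the matrix.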
There are twelve standard opcodes in version 4. Their opcode is an unsigned byte, which can be followed by zero, one or several operands, most of them encoded in LEB128. One example is DW_LNS_advance_line, whose opcode is 0x03 and which takes one signed LEB128 operand that must be added to the line register of the VM.
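Since standard opcode operands are mostly LEB128 values, a decoder for both the unsigned and signed variants is needed. Here is a minimal sketch; the function names are hypothetical, and both functions return the decoded value together with the offset of the next unread byte.

```python
def read_uleb128(data, offset):
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:               # high bit clear: last byte
            return result, offset


def read_sleb128(data, offset):
    """Decode a signed LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:                # sign bit of the last byte
                result -= 1 << shift
            return result, offset


# DW_LNS_advance_line (0x03) followed by the operand 0x7f decodes to -1,
# i.e. "move the line register back by one".
instruction = bytes([0x03, 0x7F])
delta, next_offset = read_sleb128(instruction, 1)
```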
Extended opcodes start with a byte set to 0, followed by an unsigned LEB128 giving the length of the instruction, then the extended opcode itself as an unsigned byte, then its operands. Four are described in version 4 of the standard, but thanks to the explicit length, future extensions can be added without breaking backward compatibility: a reader can simply skip an extended opcode it does not know.
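The layout of an extended opcode (a zero byte, an unsigned LEB128 length, a one-byte sub-opcode, then the operands, as laid out in section 6.2.5.3) can be sketched like this. The helper names are hypothetical, and the example assumes 4-byte little-endian addresses, as in 32-bit WASM.

```python
def read_uleb128(data, offset):
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, offset


def read_extended_opcode(data, offset):
    """Decode the extended opcode starting at data[offset] (the 0x00 byte).

    Returns (sub_opcode, operand_bytes, next_offset).
    """
    if data[offset] != 0x00:
        raise ValueError("not an extended opcode")
    length, pos = read_uleb128(data, offset + 1)
    sub_opcode = data[pos]                   # first byte counted by `length`
    operands = data[pos + 1:pos + length]    # remaining `length - 1` bytes
    return sub_opcode, operands, pos + length


# DW_LNE_set_address (sub-opcode 0x02) with a 4-byte address of 0x03.
sub_opcode, operands, next_offset = read_extended_opcode(
    bytes([0x00, 0x05, 0x02, 0x03, 0x00, 0x00, 0x00]), 0)
address = int.from_bytes(operands, "little")
```

Because `next_offset` is computed from the length alone, the same function can be used to step over sub-opcodes the reader does not implement.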
Like the custom sections we covered in the previous article, the .debug_line section starts with 0x00 (the custom section id), followed by the length of the section, the length of the section name and finally the name. The next byte is offset 0x00 of the .debug_line data, where the first line program header begins. This is important because each compilation unit has a field called DW_AT_stmt_list, which is one of the attributes we read when we decoded the CU.
0x0000000b: DW_TAG_compile_unit
              DW_AT_producer      ("zig 0.10.0")
              DW_AT_language      (DW_LANG_C99)
              DW_AT_name          ("main")
              DW_AT_stmt_list     (0x00000000)      <----- here
              DW_AT_comp_dir      (".")
              DW_AT_GNU_pubnames  (true)
              DW_AT_low_pc        (0x00000000)
              DW_AT_ranges        (0x00000000
                 [0x00000003, 0x000000d5)
                 [0x000000d6, 0x000000de))
This field will reference an offset in the .debug_line section indicating
the start of the line program corresponding to the CU. The offset starts
right after the section name.
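As a reminder of how that offset is anchored, here is a minimal sketch that walks the sections of a WASM module and returns the byte range of a custom section's payload, that is, everything after the section name. The function names are hypothetical; a DW_AT_stmt_list value is then simply added to the start of that range.

```python
def read_uleb128(data, offset):
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, offset


def find_custom_section(wasm, wanted):
    """Return (payload_start, payload_end) of the custom section named
    `wanted`, or None if the module has no such section."""
    offset = 8                                   # skip magic and version
    while offset < len(wasm):
        section_id = wasm[offset]
        size, body = read_uleb128(wasm, offset + 1)
        end = body + size
        if section_id == 0:                      # 0x00 marks a custom section
            name_len, name_start = read_uleb128(wasm, body)
            name = wasm[name_start:name_start + name_len].decode("utf-8")
            if name == wanted:
                return name_start + name_len, end
        offset = end                             # jump to the next section
    return None


def line_program_offset(wasm, stmt_list):
    """Translate a DW_AT_stmt_list value into an offset in the WASM buffer."""
    span = find_custom_section(wasm, ".debug_line")
    if span is None:
        raise ValueError("module has no .debug_line section")
    return span[0] + stmt_list
```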
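In our example here, the statement list starts at 0x00. The first thing we read is the line program header. The header determines some parameters and the initial state of the VM. In version 4 of the standard (section 6.2.4), the header is composed as follows: unit_length, version, header_length, minimum_instruction_length, maximum_operations_per_instruction, default_is_stmt, line_base, line_range, opcode_base, standard_opcode_lengths, include_directories and file_names.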
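Reading the fixed-size part of the header can be sketched as follows. This is a simplified illustration, not a complete parser: it assumes the 32-bit DWARF format, version 4 and little-endian data, it does not decode include_directories or file_names, and the names `LineHeader` and `parse_line_header` are hypothetical.

```python
import struct
from typing import NamedTuple

FIXED_FIELDS = "<IHIBBBbBB"   # unit_length .. opcode_base, little-endian


class LineHeader(NamedTuple):
    unit_length: int
    version: int
    header_length: int
    minimum_instruction_length: int
    maximum_operations_per_instruction: int
    default_is_stmt: bool
    line_base: int                 # signed byte
    line_range: int
    opcode_base: int
    standard_opcode_lengths: list  # operand count of each standard opcode
    program_start: int             # offset of the first instruction
    unit_end: int                  # offset of the first byte after the unit


def parse_line_header(data, offset=0):
    (unit_length, version, header_length, min_inst_len, max_ops,
     default_is_stmt, line_base, line_range,
     opcode_base) = struct.unpack_from(FIXED_FIELDS, data, offset)
    if unit_length == 0xFFFFFFFF or version != 4:
        raise ValueError("this sketch only handles 32-bit DWARF version 4")
    lengths_start = offset + struct.calcsize(FIXED_FIELDS)
    lengths = list(data[lengths_start:lengths_start + opcode_base - 1])
    # header_length counts the bytes between the end of the header_length
    # field and the first instruction of the line number program.
    program_start = offset + 4 + 2 + 4 + header_length
    # unit_length counts everything after the unit_length field itself.
    unit_end = offset + 4 + unit_length
    return LineHeader(unit_length, version, header_length, min_inst_len,
                      max_ops, bool(default_is_stmt), line_base, line_range,
                      opcode_base, lengths, program_start, unit_end)
```

The bytes between `program_start` and `unit_end` are the line number program that the VM executes.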
The line number program itself follows the header. Executing this code until the end of the unit (whose length is given in the header) changes the VM state appropriately and generates the correspondence. You will end up with a matrix that looks like this:
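The execution loop can be sketched as below. This is a deliberately reduced interpreter: it tracks only some of the registers, ignores op_index (it assumes maximum_operations_per_instruction is 1), implements only a subset of the opcodes, and takes its header parameters as keyword arguments whose defaults are assumptions. The byte sequence `HAND_ASSEMBLED` was written by hand for illustration; it is not taken from the binary used in this series.

```python
def read_leb128(data, pos, signed=False):
    """Decode one LEB128 value at data[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if signed and byte & 0x40:
                result -= 1 << shift
            return result, pos


def run_line_program(code, opcode_base=13, line_base=-5, line_range=14,
                     min_inst_len=1, default_is_stmt=True):
    """Run a line number program and return the rows it emits."""
    def initial_state():
        return {"address": 0, "file": 1, "line": 1, "column": 0,
                "is_stmt": default_is_stmt, "prologue_end": False,
                "end_sequence": False}

    rows, state, pos = [], initial_state(), 0
    while pos < len(code):
        op = code[pos]
        pos += 1
        emit = False
        if op >= opcode_base:                       # special opcode
            adjusted = op - opcode_base
            state["address"] += min_inst_len * (adjusted // line_range)
            state["line"] += line_base + adjusted % line_range
            emit = True
        elif op == 0x00:                            # extended opcode
            length, pos = read_leb128(code, pos)
            sub_opcode, operands = code[pos], code[pos + 1:pos + length]
            pos += length
            if sub_opcode == 0x01:                  # DW_LNE_end_sequence
                state["end_sequence"] = True
                emit = True
            elif sub_opcode == 0x02:                # DW_LNE_set_address
                state["address"] = int.from_bytes(operands, "little")
            # other extended opcodes are skipped thanks to their length
        elif op == 0x01:                            # DW_LNS_copy
            emit = True
        elif op == 0x02:                            # DW_LNS_advance_pc
            delta, pos = read_leb128(code, pos)
            state["address"] += min_inst_len * delta
        elif op == 0x03:                            # DW_LNS_advance_line
            delta, pos = read_leb128(code, pos, signed=True)
            state["line"] += delta
        elif op == 0x04:                            # DW_LNS_set_file
            state["file"], pos = read_leb128(code, pos)
        elif op == 0x05:                            # DW_LNS_set_column
            state["column"], pos = read_leb128(code, pos)
        elif op == 0x06:                            # DW_LNS_negate_stmt
            state["is_stmt"] = not state["is_stmt"]
        elif op == 0x0A:                            # DW_LNS_set_prologue_end
            state["prologue_end"] = True
        else:
            raise NotImplementedError(f"standard opcode {op:#x}")
        if emit:
            rows.append(dict(state))                # snapshot of the registers
            state["prologue_end"] = False
            if state["end_sequence"]:               # a sequence ended: reset
                state = initial_state()
    return rows


HAND_ASSEMBLED = bytes([
    0x00, 0x05, 0x02, 0x03, 0x00, 0x00, 0x00,  # set_address 0x03
    0x04, 0x05,                                # set_file 5
    0x03, 0x02,                                # advance_line +2
    0x01,                                      # copy
    0x05, 0x03,                                # set_column 3
    0x0A,                                      # set_prologue_end
    0x02, 0x22,                                # advance_pc 0x22
    0x13,                                      # special: address +0, line +1
    0x02, 0xB0, 0x01,                          # advance_pc 0xb0
    0x00, 0x01, 0x01,                          # end_sequence
])
rows = run_line_program(HAND_ASSEMBLED)
```

Running it yields three rows, at addresses 0x03, 0x25 and 0xd5, the last one flagged end_sequence.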
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000003      3      0      5   0             0  is_stmt
0x0000000000000025      4      3      5   0             0  is_stmt prologue_end
0x0000000000000098      0      3      5   0             0
0x0000000000000099      4      3      5   0             0
0x00000000000000b6      0      3      5   0             0
0x00000000000000b7      4      3      5   0             0
0x00000000000000d5      4      3      5   0             0  end_sequence
0x00000000000000d6    767      0      1   0             0  is_stmt
0x00000000000000d7    788     17      1   0             0  is_stmt prologue_end
0x00000000000000dc      0     17      1   0             0
0x00000000000000de      0     17      1   0             0  end_sequence
You can see here that in file 5 (which happens to be main.zig), line 4, column 3 corresponds to the instruction at address 0x0025. Pretty straightforward.
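Going the other way, from an address to a source location, is a matter of finding the last row whose address is not greater than the program counter. Here is a minimal sketch using the rows of the matrix above, reduced to (address, line, column, file, end_sequence). The `lookup` function is hypothetical and assumes the rows are sorted by address, which holds for this matrix but is not guaranteed across sequences in general.

```python
import bisect

# (address, line, column, file, end_sequence), transcribed from the matrix
ROWS = [
    (0x03, 3, 0, 5, False),
    (0x25, 4, 3, 5, False),
    (0x98, 0, 3, 5, False),
    (0x99, 4, 3, 5, False),
    (0xB6, 0, 3, 5, False),
    (0xB7, 4, 3, 5, False),
    (0xD5, 4, 3, 5, True),
    (0xD6, 767, 0, 1, False),
    (0xD7, 788, 17, 1, False),
    (0xDC, 0, 17, 1, False),
    (0xDE, 0, 17, 1, True),
]


def lookup(rows, pc):
    """Return (file, line, column) for the address pc, or None."""
    addresses = [row[0] for row in rows]
    index = bisect.bisect_right(addresses, pc) - 1   # last row with address <= pc
    if index < 0:
        return None                                  # pc is before the first row
    _, line, column, file, end_sequence = rows[index]
    if end_sequence:
        return None            # an end_sequence row marks the first address past a sequence
    return file, line, column


location = lookup(ROWS, 0x30)   # falls in the row that starts at 0x25
```

A line value of 0 in a returned tuple means the instruction has no associated source line.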
This article did not go into all the gory details of the line number program virtual machine instruction set, but the DWARF standard does a pretty good job of that. In the next article, we will finally see how we can use this information to generate the inline sourcemap to be added to the WASM buffer and enable the browser's debugger.