Debugging code

Debugging is an essential skill, yet it is overlooked rather often. This article therefore provides some tips and tricks for debugging code in general. Additionally, assembly code debugging is highlighted, as it relates to this chapter. Some tips may sound obvious to some readers but not to others, and even very common tricks are often forgotten. Whenever it looks like a dead end has been reached, take a step back and see if that offers a better perspective.

Note that some of these tricks can also be applied during the debugging of malware, but there is no direct focus on malware in this chapter.

General tips and tricks

This section gives general tips and tricks for debugging, together with a short explanation of when to use each one. These tips apply to programs regardless of whether the source code is available.

Rubber Duck Debugging

The first tip is to make sure you know what the goal is. It helps to write it down or talk about it out loud, as if you were to explain it to a friend or co-worker who has never seen the code before. This method is known as Rubber Duck Debugging and has proven to be rather useful.

Stay critical

Quite often, debugging is about finding the small mistakes. The project-breaking mistakes are easy to find, but a for-loop that iterates too many times can be hard to spot. As such, it is important not to assume anything. Pretend that it is not your code and be critical. Aside from finding the mistake, this often results in higher quality code, which is also less prone to bugs.
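
As a small illustration of such a mistake, the Python sketch below contains a loop that runs one iteration too many. The function names and the data are made up for this example.

```python
def sum_first_n_buggy(values, n):
    """Intended to sum the first n values, but loops n + 1 times."""
    total = 0
    for i in range(n + 1):  # bug: range(n + 1) yields n + 1 indices
        total += values[i]
    return total


def sum_first_n(values, n):
    """Sums exactly the first n values."""
    total = 0
    for i in range(n):
        total += values[i]
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_first_n_buggy(data, 2))  # 6: silently includes a third value
    print(sum_first_n(data, 2))        # 3
```

The buggy version only crashes when n equals the length of the list. For smaller values it quietly returns a wrong result, which is why this kind of mistake is hard to spot without questioning every line.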

Scoping the search

Aside from knowing what the goal is, one should also know what to look for. If the reported exception is caused by a type confusion, the search is already narrowed down to a specific part of the code. Using the graph view of an analysis tool, one can identify the points where the crash most likely occurs. If this part is unclear, one can look for points where the crash will certainly not occur. This decreases the amount of code that needs to be looked at, whilst remaining critical of any other assumption that could be made.

Generally, there are two pain points: arguments and the function flow within the program. Arguments often cause problems because their order differs from what is expected, or because the assumed value differs from the actual value. When analysing code that does not declare types explicitly, make sure that the (implicit) return type of a function is known.
An example in Python is the difference between a list and a tuple. Both can be indexed and iterated over in the same way, but a list is mutable whereas a tuple is not.
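
To make the Python example concrete, the sketch below shows a function whose return value looks like a list when it is read, but behaves differently when it is modified. The helper load_point is hypothetical and only exists for this illustration.

```python
def load_point():
    """Hypothetical helper that returns a tuple, not a list."""
    return (3, 4)


point = load_point()
print(point[0])               # indexing works the same as for a list
print(type(point).__name__)   # checking the actual type removes the guesswork

try:
    point[0] = 10             # a tuple is immutable, so this raises TypeError
except TypeError as error:
    print("cannot modify:", error)

as_list = list(point)         # convert explicitly when mutation is needed
as_list[0] = 10
print(as_list)                # [10, 4]
```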

Patience goes beyond knowledge

Similar to the proofreading of an article, it is often easier to debug code that you have not been staring at for the past few hours. Taking a lengthy break improves efficiency, even if the deadline is near and you really want to find out how the function works.

Note that this tip may be short, but it is one of the most effective tips in the whole list when used at appropriate times.

Work smart

When searching for the missing piece of the puzzle, one does not need to analyse the whole puzzle. Simply looking at the pieces that surround the missing one already provides a lot of valuable information, which often leads to the correct piece.

To do the same with a program, one should first know the general flow of the application. This way, the architectural design choices follow logically, if the program is set up that way. Sometimes the programmer wasn’t following any design paradigm and decided to just wing it. In either case, it is good to know when to step into a function and when to step over a function. An example is given below using pseudo code.

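// Try to add the user; result is true when the addition succeeded.
boolean result = addUser(user);
if(result) {
    // Read the user back from the database to verify what was stored.
    User verifyUser = userDao.getUser(user.getId());
    // compare is assumed to return true when both instances are equal.
    if(!verifyUser.compare(user)) {
        return false;
    }
}
return result;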

First, a user is added. If the addition is successful, the user is retrieved from the database and the two instances are compared. If they’re not equal, or if the addition failed in the first place, the return value is false. Otherwise, the return value is true and one can assume that the user was added successfully.

Debugging this method can be done by stepping into every function that is encountered to see what happens under the hood. This consumes time whilst there are still unknown factors. The biggest question in this case is where exactly the error occurs. The program may crash at a given instruction or line of code, but the reason for the crash is often found on a different line, located slightly before it. As such, one can simply put a break point on the last line of the given pseudo code to see what the return value is.

If it is true, neither the addUser function nor the database access object is causing the error. If the return value is false, one can first step over the functions to see what value each of them returns. If addUser returns false, the user was not added successfully, the comparison is skipped and it follows logically that the method returns false.

Lastly, the investigation of the database access function remains. If addUser returns true but the comparison fails, it is important to check both the access function and the addUser function, in this specific order. The addition of the user might be successful, whilst a single field of the class is not stored in the database. Due to the missing data, retrieving this user from the database will never result in an object with the same values as the original object.
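
The scenario described above, where the addition succeeds but one field is never stored, can be sketched in Python as follows. The class and method names mirror the pseudo code, but the in-memory dictionary and the dropped email field are assumptions made purely for this example.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    email: str


class UserDao:
    """Hypothetical in-memory stand-in for the database access object."""

    def __init__(self):
        self._rows = {}

    def add_user(self, user):
        # Bug under study: the email field is never stored.
        self._rows[user.id] = {"id": user.id, "name": user.name}
        return True  # the insert itself reports success

    def get_user(self, user_id):
        row = self._rows[user_id]
        return User(row["id"], row["name"], row.get("email", ""))


def add_and_verify(dao, user):
    result = dao.add_user(user)
    if result:
        stored = dao.get_user(user.id)
        if stored != user:  # dataclasses compare field by field
            return False
    return result


if __name__ == "__main__":
    dao = UserDao()
    user = User(1, "alice", "alice@example.com")
    print(add_and_verify(dao, user))  # False, although add_user returned True
    print(dao.get_user(1))            # the email field comes back empty
```

Stepping over add_user and get_user and inspecting their return values is enough to see that the stored object differs from the original. Only then is it worth stepping into the two functions to find out where the field is lost.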

With the approach that is described above, the analysis is done in an efficient manner. Not all cases that are encountered consist of clean code, but trying to abstract code into examples such as the one above makes for a speedy analysis that is less prone to errors.

Assembly source code debugging

When analysing the assembly code in the previous articles, one can build the binary and debug it with a debugger, such as radare2. By doing so, one has to manually compare the assembly instructions with the source code that is used, because variables are referred to by their location instead of their label.

When using the GNU Debugger (GDB) to debug a program of which the source is known, the source code (and information from it) is used during the process. To use this feature, one needs to build the binary with debug symbols. When using nasm, one needs to add the -g flag. Below, the new build script is given.

nasm -g -f elf32 ./file.asm -o file.o
ld -m elf_i386 ./file.o -o ./file
./file

By default, GDB displays the disassembly in the AT&T format. In this course, the Intel format is used. To display the instructions in the Intel format, one needs to instruct GDB to do so. The command for GDB is given below.

set disassembly-flavor intel

To gain insight into the previous, current and upcoming instructions, one can use the assembly layout. Additional details about the registers can be found in the register layout. The commands to select these layouts are given below.

layout asm
layout regs

To set a break point, one usually needs to specify an address. Because the source of this program is also available, one can also use the instruction’s line number within the source code. The commands to set a break point are given below.

b [line number]
# break on line 8
b 8

To get an overview of all break points that are set in the current session, simply request the information from GDB, as can be seen below.

info break

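To delete a break point, one uses the del command in combination with the break point number, as listed by info break. Note that this number is not the line number on which the break point is set. To remove the break points on a given line instead, one can use the clear command. In the example below, break point number 1 is deleted, after which all break points on line 8 are removed.

del [break point number]
del 1
clear [line number]
clear 8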

Lastly, the program needs to be started, after which it runs until a break point is hit or a crash occurs. To start the program, one uses the run command. Command line arguments can be passed directly after the run command. An example is given below.

run
run arg1
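
The commands in this section can also be collected in a GDB command file, so that they do not need to be retyped in every session. The Python sketch below only generates such a file and does not start GDB itself. The file name session.gdb, the break point on line 8 and the argument arg1 are placeholders, and the layout commands assume that GDB is started in a terminal that supports its text user interface.

```python
def build_gdb_commands(break_line, program_args=()):
    """Return the GDB commands from this section as one script."""
    commands = [
        "set disassembly-flavor intel",
        "layout asm",
        "layout regs",
        f"b {break_line}",
        "info break",
    ]
    # Arguments for the debugged program follow the run command.
    commands.append(" ".join(["run", *program_args]))
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # "session.gdb" is a made-up file name for this example.
    with open("session.gdb", "w") as handle:
        handle.write(build_gdb_commands(8, ["arg1"]))
```

The generated file can then be passed to GDB with the -x option, for example gdb -x session.gdb ./file.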

To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.