Crash course

In this crash course, the most common instructions in the Intel x86 and x86_64 instruction set will be discussed. Additionally, this article will focus on recognising patterns, which is often overlooked. Seeing the intention of the author through the code comes with experience. A blog or a few articles can only do so much; one needs to involve him/herself in practical examples and take on additional challenges to gain the required hands-on experience.

Personally, I believe that people learn more when they are provided with both the technical knowledge and the experience of the teacher. An excellent reverse engineer is good at what (s)he does not only because of in-depth technical knowledge, but also because of the experience (s)he has.

Common instructions

Below, common instructions will be explained. Where possible, the implementation in another language is also provided, together with an example in assembly.

cmp
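The compare instruction compares two values by subtracting the second operand from the first. The result of this subtraction is discarded, so neither of the values is altered, but the flags in the flags register (such as the zero, sign, carry and overflow flags) are set accordingly. At most one of the two operands can be a variable in memory. Some examples are given below.

cmp var1, ecx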
cmp eax, 7
cmp eax, var1
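To make the flag-setting behaviour concrete, the C sketch below models what cmp does with two 32-bit operands: it computes the difference and keeps only the flags. This is a simplified model under stated assumptions: only the zero, sign, carry and overflow flags are included (the parity and auxiliary carry flags are left out), and the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the four flags that most conditional jumps inspect. */
struct flags {
    bool zf; /* zero flag: the result was zero */
    bool sf; /* sign flag: most significant bit of the result */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b as unsigned numbers */
    bool of; /* overflow flag: the signed subtraction overflowed */
};

/* Hypothetical model of "cmp a, b": compute a - b, discard it, keep the flags. */
struct flags cmp32(uint32_t a, uint32_t b) {
    uint32_t result = a - b; /* unsigned arithmetic wraps around, like the CPU */
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 31) & 1;
    f.cf = (a < b);
    /* Signed overflow: the operands had different signs and the sign of the
       result differs from the sign of the first operand. */
    f.of = (((a ^ b) & (a ^ result)) >> 31) & 1;
    return f;
}
```

For example, cmp32(3, 7) sets both the carry and the sign flag, since 3 is below 7 both as an unsigned and as a signed number.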

Jumps
A jump sets the instruction pointer to the address given as its operand, after which execution continues at that address, as is shown in the example below.

jmp 0x00500

Besides unconditional jumps (like the one shown in the example above), there are conditional jumps, which are only taken if the flags meet a certain condition. The most commonly used ones are listed below. Note that there are more jump instructions than are listed here; all of them can be found in the Intel Instruction Set Manual. A simple rule of thumb for unknown instructions: if the mnemonic starts with a j, it is a jump!

je stands for jump equal, which only jumps if the previously compared values were equal (the zero flag is set).
jz stands for jump zero, which only jumps if the result of the previous operation was zero (the zero flag is set). It is the same instruction as je, since subtracting two equal values results in zero.
jne stands for jump not equal, which only jumps if the previously compared values were not equal to each other (the zero flag is cleared).
jnz stands for jump not zero, which only jumps if the result of the previous operation was not zero. It is the same instruction as jne.
jnb stands for jump not below, which only jumps if the first compared value was not below the second operand, treating both as unsigned numbers (the carry flag is cleared).
jge stands for jump greater or equal, which only jumps if the first compared value was greater than or equal to the second operand, treating both as signed numbers.
jnl stands for jump not less, which only jumps if the first compared value was not less than the second operand, treating both as signed numbers. It is the same instruction as jge.
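A jump never looks at the compared values themselves, only at the flags that the preceding cmp or test left behind. The sketch below, assuming a simplified flag snapshot with only four flags (the struct and function names are hypothetical), shows which flags each listed jump checks, and why the unsigned jnb and the signed jge can disagree about the very same comparison.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the flags left behind by a preceding cmp or test. */
struct flags {
    bool zf, sf, cf, of;
};

/* Conditions under which each jump from the list above is taken. */
bool je_taken(struct flags f)  { return f.zf; }         /* also jz */
bool jne_taken(struct flags f) { return !f.zf; }        /* also jnz */
bool jnb_taken(struct flags f) { return !f.cf; }        /* unsigned a >= b */
bool jge_taken(struct flags f) { return f.sf == f.of; } /* signed a >= b, also jnl */
```

For cmp eax, 1 with eax holding 0xFFFFFFFF (-1 as a signed number), the result is 0xFFFFFFFE without a borrow or signed overflow: jnb is taken, because 0xFFFFFFFF is not below 1, whereas jge is not, because -1 is less than 1.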

Like the example above, every jump instruction takes only one argument: the address to jump to if the required condition is met.

mov
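Copies data from the source into the destination. The source can be a register, a variable or a constant value, whereas the destination can be a register or a variable, but not a constant. The source and destination cannot both be variables. Within square brackets, registers can also be added together to calculate an address, after which the value at that address is read, as can be seen in the example. The first line of the example shows the order of the parameters.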

mov dest, src
 
mov eax, edx
mov eax, [edx+ebx]
mov eax, 3
mov var, 1
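The square brackets are what separates a memory access from a register copy. The sketch below models memory as a small byte array (a hypothetical simplification; the names are made up) and shows that mov eax, [edx+ebx] reads four bytes at the computed address in little-endian order, whereas mov eax, edx merely copies a register.

```c
#include <stdint.h>

/* Hypothetical 32-byte "memory"; register values are used as addresses into it.
   No bounds checking is done in this sketch. */
static uint8_t memory[32];

/* mov eax, edx: a plain register-to-register copy. */
uint32_t mov_reg(uint32_t edx) {
    return edx;
}

/* mov eax, [edx+ebx]: read 4 bytes at address edx+ebx.
   x86 is little-endian, so the least significant byte is stored first. */
uint32_t mov_load(uint32_t edx, uint32_t ebx) {
    uint32_t addr = edx + ebx;
    return (uint32_t)memory[addr]
         | (uint32_t)memory[addr + 1] << 8
         | (uint32_t)memory[addr + 2] << 16
         | (uint32_t)memory[addr + 3] << 24;
}
```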

add
The addition of two integers, which can reside in registers or variables, or be a constant value. The result is stored in the first parameter, which therefore cannot be a constant. At most one of the two parameters can be a variable. The equivalent of this instruction in C is given below.

int value1 = 5;
int value2 = 5;
value1 = value1 + value2;

Possible usages in assembly are listed below. The first line shows the order of the parameters.

add value1, value2
 
add eax, 5
add eax, var
add var, ebx
add var, 3
add eax, ecx

sub
The subtraction instruction subtracts one value from another. The format is the same as in the add instruction: the second parameter is subtracted from the first parameter, after which the result is stored in the first parameter. The equivalent of this instruction in C is given below.

int value1 = 10;
int value2 = 3;
value1 = value1 - value2;

The possible usages in assembly are listed below. The first example displays the order of the parameters.

sub value1, value2
 
sub eax, ecx
sub eax, 3
sub var, 8
sub eax, var
sub var, eax

inc
Increments a value by one, which is often seen in loops. The provided parameter can be a register or a variable, but not a constant. The equivalent of this instruction in C is given below.

int i = 0;
i = i + 1;
//or
i++;

The possible options in assembly are listed below. The first example shows the argument placement.

inc var
 
inc eax
inc var

Since this instruction always increments the parameter by one, only a single parameter is required.

dec
Decrements a value by one, which is also often seen in loops. The provided parameter can be a register or a variable, but not a constant. The equivalent of this instruction in C is given below.

int i = 1;
i = i - 1;
//or
i--;

The possible options in assembly are listed below, in which the first example shows the argument placement.

dec var
 
dec eax
dec var

As the counterpart of the inc instruction, this instruction always decrements by one. Therefore, like the inc instruction, it requires only a single parameter.

push
The push instruction saves the pushed value on the stack. The equivalent of this instruction will be shown in Java.
Note that whilst using a generic type parameter (represented by T in Java) would result in a better implementation, it is out of scope in this example.

List<Object> stack = new ArrayList<>();
 
public void push(Object object) {
	stack.add(object);
}

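In assembly, this instruction requires only a single parameter, which can be a register, variable or constant. The value of the register, variable or constant is pushed on the stack, not its address. Note that the value of a register or variable can be an address if it is a pointer. The stack grows towards lower addresses: push first decrements the stack pointer (ESP or RSP) by the size of the value and then stores the value at the new top of the stack, so the stack pointer always points at the most recently pushed value.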

push var

pop
This instruction is used to remove the value on top of the stack and save it in the provided location, which can be a register or a variable. It cannot be a constant, since one cannot store 5 in 10. Afterwards, the stack pointer is incremented by the size of the value; the popped value itself is not erased from memory. First, the Java equivalent of this instruction is given.

List<Object> stack = new ArrayList<>();
 
public Object pop() {
	Object object = stack.get(stack.size() - 1);
	stack.remove(stack.size() - 1);
	return object;
}

Examples of how to use this instruction in assembly are given below.

pop eax
pop var
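The Java examples hide one detail that matters when reading disassembly: the x86 stack grows downwards, and push and pop move the stack pointer. The C sketch below models a 32-bit stack as an array of 4-byte slots, with a variable named esp holding a byte offset into it; all names are hypothetical and there is no overflow checking.

```c
#include <stdint.h>

/* Hypothetical 32-bit stack: 64 slots of 4 bytes, growing downwards.
   esp holds a byte offset into stack_mem, like the real ESP holds an address. */
static uint32_t stack_mem[64];
static uint32_t esp = sizeof stack_mem; /* empty stack: just past the end */

/* push src: first decrement ESP by 4, then store the value at [esp]. */
void push32(uint32_t value) {
    esp -= 4;
    stack_mem[esp / 4] = value;
}

/* pop dst: first read the value at [esp], then increment ESP by 4.
   The slot is not cleared; the old value simply stays in memory. */
uint32_t pop32(void) {
    uint32_t value = stack_mem[esp / 4];
    esp += 4;
    return value;
}
```

Pushing 5 and then 10 leaves ESP eight bytes lower than before, with 10 at the top; popping twice returns 10 and then 5.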

call
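The call instruction calls a function. All function parameters are placed in the correct locations before the call is made. With the common 32-bit x86 calling conventions, such as cdecl and stdcall, all function parameters are pushed on the stack. On x86_64 (and depending on the calling convention) the first few arguments are passed via registers, and any additional parameters are pushed on the stack before the call instruction is executed. The call instruction itself pushes the return address, which is the address of the instruction directly after the call, on the stack and then jumps to the function. Below is the equivalent of this instruction in C.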

foo(5, 10);

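In assembly, the order in which the arguments are pushed depends on the calling convention, which is why there are two different possible outcomes. Both are given in the example below. The first pushes the arguments from left to right (as the pascal convention does), whereas the second pushes them from right to left, which is what the widely used cdecl and stdcall conventions do.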

push 5
push 10
call foo
 
push 10
push 5
call foo
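With right-to-left pushing (the second listing), the first argument ends up closest to the top of the stack. The sketch below describes the stack at the first instruction of foo as a C struct, lowest address first, assuming cdecl on 32-bit x86 and 4-byte arguments; the struct and helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack at the first instruction of foo after
   "push 10; push 5; call foo" (cdecl, right to left), lowest address first. */
struct cdecl_frame {
    uint32_t return_address; /* [esp]     written by the call instruction */
    uint32_t arg1;           /* [esp + 4] pushed last:  5 */
    uint32_t arg2;           /* [esp + 8] pushed first: 10 */
};

/* Offset of argument n (1-based) relative to ESP at function entry,
   assuming every argument occupies one 4-byte slot. */
size_t cdecl_arg_offset(unsigned n) {
    return 4u * n;
}
```

This is why, in 32-bit code, the first argument of a function is typically accessed at [esp + 4] at the very start of the function.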

ret
The return instruction is used as the last instruction in a function. It pops the return address, which the call instruction pushed on the stack, and sets the instruction pointer (IP, EIP or RIP) to it. As a result, the program resumes its execution at the instruction directly after the call. By convention, the return value is stored in the accumulator register (AX, EAX or RAX). In case the return value is bigger than the accumulator register, the calling convention determines how it is returned; large structures, for example, are commonly written to memory provided by the caller, after which a pointer to that memory is returned.

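Additionally, depending on the calling convention (such as stdcall), one can provide a parameter with the return instruction, which causes the given number of bytes to be removed from the stack after the return address has been popped. This way, the called function cleans up its own arguments.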

ret
ret 10
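The sketch below models call and ret on a heavily simplified 64-bit CPU, to show that the return address is the address of the next instruction (not the old instruction pointer plus one) and how ret with a parameter discards extra bytes. It assumes a call with a 32-bit relative displacement (opcode e8, 5 bytes in total); the struct and function names are hypothetical. The test values come from the Radare2 listing later in this article, where call sym._foo at 0x100000f29 is encoded as e8a2ffffff and is followed by xor eax, eax at 0x100000f2e.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified CPU state: only what call and ret touch. */
struct cpu {
    uint64_t rip;       /* instruction pointer */
    uint64_t rsp;       /* stack pointer, here a byte offset into stack[] */
    uint64_t stack[16]; /* 8-byte stack slots; slot index = rsp / 8 */
};

/* "call rel32" (e8 + 4-byte displacement = 5 bytes): push the address of the
   next instruction, then jump relative to that address. */
void do_call_rel32(struct cpu *c, int32_t rel32) {
    uint64_t return_address = c->rip + 5;
    c->rsp -= 8;
    c->stack[c->rsp / 8] = return_address;
    c->rip = return_address + (uint64_t)(int64_t)rel32; /* wraps correctly */
}

/* "ret n": pop the return address into RIP, then discard n more bytes. */
void do_ret(struct cpu *c, uint16_t n) {
    c->rip = c->stack[c->rsp / 8];
    c->rsp += 8 + n;
}
```

The displacement a2ffffff, read as a little-endian signed 32-bit value, equals -94, so the call lands at 0x100000f2e - 94 = 0x100000ed0, and the matching ret continues at 0x100000f2e.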

lea
The lea instruction stands for Load Effective Address, which calculates the address of the provided memory operand and stores it in a register. No memory is read in the process. The notation in C is given below.

int value = 1;
int *pointer = &value;

The assembly example is given below.

lea eax, var
lea eax, [ebp + var]

and
Performs a bitwise logical AND operation on both operands, after which the result is stored in the first parameter. The C equivalent of this instruction is given below.

int x = 3;
int y = 4;
x = x&y;

The usage in assembly is partially restricted: the first operand may not be a constant, since the result of the operation is stored in the first operand.

and eax, ecx
and eax, 6
and eax, var
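A very common use of and is masking: keeping only the bits that are set in a constant. The and esp, 0xfffffff0 instruction in the main functions disassembled later in this article, for example, clears the lowest four bits of ESP, which rounds the stack pointer down to a multiple of 16. The helper names in the sketch below are made up for illustration.

```c
#include <stdint.h>

/* Like "and esp, 0xfffffff0": clear the lowest four bits, which rounds the
   value down to a multiple of 16. */
uint32_t align_down_16(uint32_t value) {
    return value & 0xfffffff0u;
}

/* Like "and eax, 0xff": keep only the lowest byte. */
uint32_t low_byte(uint32_t value) {
    return value & 0xffu;
}
```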

test
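The test instruction performs a bitwise logical AND operation on both operands, but only updates the flags: the zero flag is set if the result equals 0. The result is zero if the operands have no set bits in common, which is always the case if either one of the parameters, or both, equal zero. None of the provided operands are altered using this instruction. A common idiom is test eax, eax, which sets the zero flag only if eax itself equals zero.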
The zero flag is used to decide whether or not a jump should be taken, depending on the specific jump instruction.

test eax, eax
test eax, 4
test eax, var
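The sketch below models only the zero flag of test, to show the two most common idioms from the examples above and the fact that two non-zero operands can still set the zero flag (3 AND 4 equals 0). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* "test a, b": compute a & b, discard the result, keep the flags.
   Only the zero flag is modelled here. */
bool test_sets_zf(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}

/* test eax, eax followed by je: taken only if eax itself is zero. */
bool is_zero(uint32_t eax) {
    return test_sets_zf(eax, eax);
}

/* test eax, 4 followed by jne: taken if bit 2 of eax is set. */
bool bit2_set(uint32_t eax) {
    return !test_sets_zf(eax, 4);
}
```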

xor
The logical bitwise exclusive or is often used for symmetric encryption, or to set the value of a variable or register to zero. The latter is done by using the same register or variable as both parameters, since any value xored with itself equals zero. The result is saved in the first operand. First, an example in C is given.

int x = 5;
int y = 7;
x = x^y;

Below is the example for assembly.

xor var1, ecx
xor eax, var1
xor edx, edx
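Both uses mentioned above are sketched below in C. The repeating-key XOR routine is a toy example with hypothetical names, not a secure cipher; it shows why applying the same operation twice with the same key restores the original data, which is why a single routine often serves as both the encryption and the decryption function.

```c
#include <stddef.h>
#include <stdint.h>

/* Like "xor edx, edx": any value xored with itself is zero. */
uint32_t xor_self(uint32_t value) {
    return value ^ value;
}

/* Toy repeating-key XOR. Running it a second time with the same key
   restores the input, since (b ^ k) ^ k == b. Not secure. */
void xor_buffer(uint8_t *data, size_t length, const uint8_t *key, size_t key_length) {
    for (size_t i = 0; i < length; i++) {
        data[i] ^= key[i % key_length];
    }
}
```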

Additional research

When disassembling a binary, other instructions will be encountered. To find out what an instruction stands for, and what its parameters represent, one should use the Intel Instruction Set Reference Manual, which documents all x86 and x86_64 instructions.

To avoid manually looking up less common instructions, Radare2 offers multiple features. To see the description of instructions on the right side of the terminal, one should set the variable named asm.describe to true. The command is given below.

e asm.describe = true

Additionally, or alternatively, one can use pseudo assembly. This removes certain instructions and displays the information in a pseudo-code-like manner. To enable this feature, one has to set the asm.pseudo variable to true. The command is given below.

e asm.pseudo = true

Using a small example, all methods will be displayed to provide a clear overview. The displayed disassembly is given in x86_64 with the Intel syntax from a Mach-O binary, starting off with the default disassembly that Radare2 provides using the pdf command.

           0x100000f20      55             push rbp
           0x100000f21      4889e5         mov rbp, rsp
           0x100000f24      bf01000000     mov edi, 1
           0x100000f29      e8a2ffffff     call sym._foo
           0x100000f2e      31c0           xor eax, eax
           0x100000f30      5d             pop rbp
           0x100000f31      c3             ret

Setting the describe variable to true, the output also contains the explanation for the instructions.

           0x100000f20      55             push rbp                   ; push word, doubleword or quadword onto the stack
           0x100000f21      4889e5         mov rbp, rsp               ; moves data from src to dst
           0x100000f24      bf01000000     mov edi, 1                 ; moves data from src to dst
           0x100000f29      e8a2ffffff     call sym._foo              ; calls a subroutine, push eip into the stack (esp)
           0x100000f2e      31c0           xor eax, eax               ; logical exclusive or
           0x100000f30      5d             pop rbp                    ; pops last element of stack and stores the result in argument
           0x100000f31      c3             ret                        ; return from subroutine. pop 4 bytes from esp and jump there.

After setting the pseudo variable to true, the output is simplified. An example is the mov instruction, which is now displayed as x = y instead of mov x, y.

           0x100000f20      55             push rbp
           0x100000f21      4889e5         rbp = rsp
           0x100000f24      bf01000000     edi = 1
           0x100000f29      e8a2ffffff     sym._foo ()
           0x100000f2e      31c0           eax = 0
           0x100000f30      5d                 
           0x100000f31      c3             return eax

Using both settings at the same time, the view is both simplified and explained. Note that some instructions are left out, such as the pop instruction at the end of the function. This is caused by the asm.pseudo setting.

           0x100000f20      55             push rbp                   ; push word, doubleword or quadword onto the stack
           0x100000f21      4889e5         rbp = rsp                  ; moves data from src to dst
           0x100000f24      bf01000000     edi = 1                    ; moves data from src to dst
           0x100000f29      e8a2ffffff     sym._foo ()                ; calls a subroutine, push eip into the stack (esp)
           0x100000f2e      31c0           eax = 0                    ; logical exclusive or
           0x100000f30      5d                                        ; pops last element of stack and stores the result in argument
           0x100000f31      c3             return eax                 ; return from subroutine. pop 4 bytes from esp and jump there.

Recognising patterns

Knowing what each instruction does is required to understand what the application is doing. Obviously, one can write down what a function does, one instruction at a time, and then summarise the lot, function after function. Knowing what to look for speeds this process up a lot, although the initial expectation is not always correct. If the assumption that was made at the start turns out to be incorrect, one has to reevaluate the function’s content.

Loops
An example is a for-loop or a while-loop in C, both of which are given below. GCC compiles the two loops to exactly the same ELF binary. First, the for-loop and the while-loop are given, after which the disassembled code is given.

#include <stdio.h>
 
int main(int argc, char **argv) {
    for (int i = 0; i < 10; i++) {
        printf("Current count equals: %d\n", i);
    }
}

#include <stdio.h>
 
int main(int argc, char **argv) {
    int i = 0;
    while (i < 10) {
        printf("Current count equals: %d\n", i);
        i++;
    }
}

The output of the disassembly of the main function with Radare2 is given below.

/ (fcn) main 68
|   main (int argc, char **argv, char **envp);
|           ; var signed int local_ch @ ebp-0xc
|           ; var int local_4h @ ebp-0x4
|           ; arg int arg_4h @ esp+0x4
|           ; DATA XREF from entry0 (0x8048327)
|           0x0804840b      8d4c2404       lea ecx, [arg_4h]           ; 4
|           0x0804840f      83e4f0         and esp, 0xfffffff0
|           0x08048412      ff71fc         push dword [ecx - 4]
|           0x08048415      55             push ebp
|           0x08048416      89e5           mov ebp, esp
|           0x08048418      51             push ecx
|           0x08048419      83ec14         sub esp, 0x14
|           0x0804841c      c745f4000000.  mov dword [local_ch], 0
|       ,=< 0x08048423      eb17           jmp 0x804843c
|       |   ; CODE XREF from main (0x8048440)
|      .--> 0x08048425      83ec08         sub esp, 8
|      :|   0x08048428      ff75f4         push dword [local_ch]
|      :|   0x0804842b      68d0840408     push str.Current_count_equals:__d ; 0x80484d0 ; "Current count equals: %d\n" ; const char *format
|      :|   0x08048430      e8abfeffff     call sym.imp.printf         ; int printf(const char *format)
|      :|   0x08048435      83c410         add esp, 0x10
|      :|   0x08048438      8345f401       add dword [local_ch], 1
|      :|   ; CODE XREF from main (0x8048423)
|      :`-> 0x0804843c      837df409       cmp dword [local_ch], 9     ; [0x9:4]=-1 ; 9
|      `==< 0x08048440      7ee3           jle 0x8048425
|           0x08048442      b800000000     mov eax, 0
|           0x08048447      8b4dfc         mov ecx, dword [local_4h]
|           0x0804844a      c9             leave
|           0x0804844b      8d61fc         lea esp, [ecx - 4]
\           0x0804844e      c3             ret

The part that Radare2 annotates with ASCII arrows is especially important. This part is highlighted below.

|       ,=< 0x08048423      eb17           jmp 0x804843c
|       |   ; CODE XREF from main (0x8048440)
|      .--> 0x08048425      83ec08         sub esp, 8
|      :|   0x08048428      ff75f4         push dword [local_ch]
|      :|   0x0804842b      68d0840408     push str.Current_count_equals:__d ; 0x80484d0 ; "Current count equals: %d\n" ; const char *format
|      :|   0x08048430      e8abfeffff     call sym.imp.printf         ; int printf(const char *format)
|      :|   0x08048435      83c410         add esp, 0x10
|      :|   0x08048438      8345f401       add dword [local_ch], 1
|      :|   ; CODE XREF from main (0x8048423)
|      :`-> 0x0804843c      837df409       cmp dword [local_ch], 9     ; [0x9:4]=-1 ; 9
|      `==< 0x08048440      7ee3           jle 0x8048425

One of the first signs that a loop is used here is the unconditional jump to an address where a compare instruction resides. This instruction compares the variable local_ch (which is named i in the original source code) with 9. If the variable is less than or equal to 9, the jle instruction jumps back to the instruction which prepares the function call to printf by pushing two variables on the stack. Note that the compiler translated i < 10 into the equivalent i <= 9.

Another sign that a loop is used in this function is the jump upwards in the code. Such a backward jump repeats a block of code without the code being included multiple times in the binary.
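To connect the highlighted blocks to the source code, the loop can be rewritten by hand with goto statements, so that every statement corresponds to one part of the disassembly. This is an illustrative sketch, not GCC's actual output: the function name is made up, and printf is replaced by writing the values into an array so the result can be checked.

```c
/* The for-loop from above, rewritten to mirror the disassembly. The values
   that would be printed are stored in out[]; the count is returned. */
int loop_as_compiled(int out[10]) {
    int i = 0;               /* mov dword [local_ch], 0 */
    int count = 0;
    goto check;              /* jmp 0x804843c: jump straight to the condition */
body:
    out[count++] = i;        /* push i; push format; call printf */
    i = i + 1;               /* add dword [local_ch], 1 */
check:
    if (i <= 9) goto body;   /* cmp dword [local_ch], 9; jle 0x8048425 */
    return count;
}
```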

If-else-statement
An if-else-statement always has multiple branches leading to the final part of the code. In the code example below, every part of the if-else-statement eventually reaches the return 0 part of the code. If a function contains more code than the branches of the if-else-statement, it helps a lot to know which part is always executed and which part is only executed based upon the provided input.

#include <stdio.h>
 
int main(int argc, char **argv) {
    int x = 0;
    int y = 1;
 
    if (x == y) {
        printf("X equals Y!\n");
    } else if (x * 2 == y) {
        printf("X is half of Y!\n");
    } else if (x == 2 * y) {
        printf("X is twice as big as Y!\n");
    } else {
        printf("X and Y are not equal!\n");
    }
    return 0;
}

As can be seen in the disassembly below, all the arrows point towards the end of the function, indicating that there are multiple jumps which lead to the same code. This indicates multiple different conditions, which split the code into branches. Note that GCC replaced the printf calls with calls to puts, since the printed strings contain no format specifiers and end with a newline. To view these branches, one can use the VV command. This provides an overview of the instructions, grouped in branches, with a green and a red line going from one branch to another. The green line represents the true branch, in which the jump is taken, whereas the red line represents the false branch.

/ (fcn) main 142
|   main (int argc, char **argv, char **envp);
|           ; var unsigned int local_10h @ ebp-0x10
|           ; var unsigned int local_ch @ ebp-0xc
|           ; var int local_4h @ ebp-0x4
|           ; arg int arg_4h @ esp+0x4
|           ; DATA XREF from entry0 (0x8048327)
|           0x0804840b      8d4c2404       lea ecx, [arg_4h]           ; 4
|           0x0804840f      83e4f0         and esp, 0xfffffff0
|           0x08048412      ff71fc         push dword [ecx - 4]
|           0x08048415      55             push ebp
|           0x08048416      89e5           mov ebp, esp
|           0x08048418      51             push ecx
|           0x08048419      83ec14         sub esp, 0x14
|           0x0804841c      c745f0000000.  mov dword [local_10h], 0
|           0x08048423      c745f4010000.  mov dword [local_ch], 1
|           0x0804842a      8b45f0         mov eax, dword [local_10h]
|           0x0804842d      3b45f4         cmp eax, dword [local_ch]
|       ,=< 0x08048430      7512           jne 0x8048444
|       |   0x08048432      83ec0c         sub esp, 0xc
|       |   0x08048435      6820850408     push str.X_equals_Y         ; 0x8048520 ; "X equals Y!" ; const char *s
|       |   0x0804843a      e8a1feffff     call sym.imp.puts           ; int puts(const char *s)
|       |   0x0804843f      83c410         add esp, 0x10
|      ,==< 0x08048442      eb48           jmp 0x804848c
|      ||   ; CODE XREF from main (0x8048430)
|      |`-> 0x08048444      8b45f0         mov eax, dword [local_10h]
|      |    0x08048447      01c0           add eax, eax
|      |    0x08048449      3b45f4         cmp eax, dword [local_ch]
|      |,=< 0x0804844c      7512           jne 0x8048460
|      ||   0x0804844e      83ec0c         sub esp, 0xc
|      ||   0x08048451      682c850408     push str.X_is_half_of_Y     ; 0x804852c ; "X is half of Y!" ; const char *s
|      ||   0x08048456      e885feffff     call sym.imp.puts           ; int puts(const char *s)
|      ||   0x0804845b      83c410         add esp, 0x10
|     ,===< 0x0804845e      eb2c           jmp 0x804848c
|     |||   ; CODE XREF from main (0x804844c)
|     ||`-> 0x08048460      8b45f4         mov eax, dword [local_ch]
|     ||    0x08048463      01c0           add eax, eax
|     ||    0x08048465      3b45f0         cmp eax, dword [local_10h]
|     ||,=< 0x08048468      7512           jne 0x804847c
|     |||   0x0804846a      83ec0c         sub esp, 0xc
|     |||   0x0804846d      683c850408     push str.X_is_twice_as_big_as_Y ; 0x804853c ; "X is twice as big as Y!" ; const char *s
|     |||   0x08048472      e869feffff     call sym.imp.puts           ; int puts(const char *s)
|     |||   0x08048477      83c410         add esp, 0x10
|    ,====< 0x0804847a      eb10           jmp 0x804848c
|    ||||   ; CODE XREF from main (0x8048468)
|    |||`-> 0x0804847c      83ec0c         sub esp, 0xc
|    |||    0x0804847f      6854850408     push str.X_and_Y_are_not_equal ; 0x8048554 ; "X and Y are not equal!" ; const char *s
|    |||    0x08048484      e857feffff     call sym.imp.puts           ; int puts(const char *s)
|    |||    0x08048489      83c410         add esp, 0x10
|    |||    ; CODE XREFS from main (0x8048442, 0x804845e, 0x804847a)
|    ```--> 0x0804848c      b800000000     mov eax, 0
|           0x08048491      8b4dfc         mov ecx, dword [local_4h]
|           0x08048494      c9             leave
|           0x08048495      8d61fc         lea esp, [ecx - 4]
\           0x08048498      c3             ret

The different branches are highlighted again in the shortened disassembly code below.

|           0x0804842d      3b45f4         cmp eax, dword [local_ch]
|       ,=< 0x08048430      7512           jne 0x8048444
|       |   0x08048432      83ec0c         sub esp, 0xc
|       |   0x08048435      6820850408     push str.X_equals_Y         ; 0x8048520 ; "X equals Y!" ; const char *s
|       |   0x0804843a      e8a1feffff     call sym.imp.puts           ; int puts(const char *s)
|       |   0x0804843f      83c410         add esp, 0x10
|      ,==< 0x08048442      eb48           jmp 0x804848c
|      ||   ; CODE XREF from main (0x8048430)
|      |`-> 0x08048444      8b45f0         mov eax, dword [local_10h]
|      |    0x08048447      01c0           add eax, eax
|      |    0x08048449      3b45f4         cmp eax, dword [local_ch]
|      |,=< 0x0804844c      7512           jne 0x8048460
|      ||   0x0804844e      83ec0c         sub esp, 0xc
|      ||   0x08048451      682c850408     push str.X_is_half_of_Y     ; 0x804852c ; "X is half of Y!" ; const char *s
|      ||   0x08048456      e885feffff     call sym.imp.puts           ; int puts(const char *s)
|      ||   0x0804845b      83c410         add esp, 0x10
|     ,===< 0x0804845e      eb2c           jmp 0x804848c
|     |||   ; CODE XREF from main (0x804844c)
|     ||`-> 0x08048460      8b45f4         mov eax, dword [local_ch]
|     ||    0x08048463      01c0           add eax, eax
|     ||    0x08048465      3b45f0         cmp eax, dword [local_10h]
|     ||,=< 0x08048468      7512           jne 0x804847c
|     |||   0x0804846a      83ec0c         sub esp, 0xc
|     |||   0x0804846d      683c850408     push str.X_is_twice_as_big_as_Y ; 0x804853c ; "X is twice as big as Y!" ; const char *s
|     |||   0x08048472      e869feffff     call sym.imp.puts           ; int puts(const char *s)
|     |||   0x08048477      83c410         add esp, 0x10
|    ,====< 0x0804847a      eb10           jmp 0x804848c
|    ||||   ; CODE XREF from main (0x8048468)
|    |||`-> 0x0804847c      83ec0c         sub esp, 0xc
|    |||    0x0804847f      6854850408     push str.X_and_Y_are_not_equal ; 0x8048554 ; "X and Y are not equal!" ; const char *s
|    |||    0x08048484      e857feffff     call sym.imp.puts           ; int puts(const char *s)
|    |||    0x08048489      83c410         add esp, 0x10
|    |||    ; CODE XREFS from main (0x8048442, 0x804845e, 0x804847a)
|    ```--> 0x0804848c      b800000000     mov eax, 0

It is important to keep track of the type of jump that is used. In this case, the only types of jumps that are used are the unconditional jump (jmp) and the jump not equal (jne). With negated jumps, such as the jump not equal, one can easily mistake the taken branch for the case in which the values are equal, whereas the opposite is true: the jump is only taken when the compared values are not equal. The code directly after the jne is the part that is executed when the values are equal.
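The inverted structure becomes clearer when the if-else-statement is rewritten by hand with goto statements that follow the jne and jmp instructions one to one. This sketch is for illustration only; the function name is made up, and instead of calling puts it returns which of the four messages would be printed (1 for "X equals Y!" up to 4 for "X and Y are not equal!").

```c
/* The if-else chain from above, mirroring the jne/jmp structure. */
int branch_as_compiled(int x, int y) {
    int message;
    if (x != y) goto not_equal;       /* cmp eax, [local_ch]; jne 0x8048444 */
    message = 1;                      /* "X equals Y!" */
    goto end;                         /* jmp 0x804848c */
not_equal:
    if (x + x != y) goto not_half;    /* add eax, eax; cmp; jne 0x8048460 */
    message = 2;                      /* "X is half of Y!" */
    goto end;                         /* jmp 0x804848c */
not_half:
    if (y + y != x) goto not_twice;   /* add eax, eax; cmp; jne 0x804847c */
    message = 3;                      /* "X is twice as big as Y!" */
    goto end;                         /* jmp 0x804848c */
not_twice:
    message = 4;                      /* "X and Y are not equal!" */
end:
    return message;                   /* 0x804848c: every branch ends up here */
}
```

Note how every condition is written as "not equal, so skip ahead", exactly like the jne instructions: the message for a branch is only reached when its jump is not taken.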

Tunnel vision

Do not trust a pattern from the get-go, since there might be more to it than meets the eye. Compilers might have optimised the code in a manner that humans perceive as strange, or a malicious actor might have used multiple techniques to disguise the true intentions of the code. Some of these tricks are quickly seen through when dynamically analysing the code, but that is not always possible or required. In the example below, a bit of pseudo x86 assembly is given.

mov var_a, 1
push var_a
mov dword [esp], 13
call funcX

Firstly, a value is assigned to a variable, which is then pushed on top of the stack. Seeing the function call, it is obvious that the pushed value is the sole parameter for this function. The pseudo disassembly of funcX is provided below. For the sake of this example, assume that the call funcRetry instruction resides at A and that the pop ebp instruction resides at B.

push ebp
mov ebp, esp
cmp dword [ebp + 8], 0xd
jne A
call funcWin
jmp B
call funcRetry 		;resides at A
pop ebp       		;resides at B
ret

In short, this function sets up its stack frame and compares its parameter on the stack to 0xd, or 13 in decimal. After push ebp and mov ebp, esp, the saved EBP resides at [ebp], the return address at [ebp + 4] and the first parameter at [ebp + 8]. At first glance, this will always result in a jump to funcRetry, since the value given to funcX equals 1. But, is this true? Take another look at the code that calls funcX, which is given below.

mov var_a, 1
push var_a
mov dword [esp], 13
call funcX

The variable var_a is set to the value 1 and pushed on the stack. Directly after that, the value which was pushed on the stack is altered to 13. Since it is a 32-bit application, the size of an integer is 4 bytes. The push instruction decrements the stack pointer, ESP, by 4 bytes and then stores the value at that location, so ESP points directly at the value that was pushed. The mov instruction therefore overwrites the pushed copy with 0xd (13 in decimal), whereas var_a itself still equals 1. This changes the outcome of the comparison within funcX, since it now always calls funcWin instead of funcRetry.
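The trick can be replayed in a minimal C simulation, assuming a 32-bit stack in which push decrements the stack pointer by 4 and then stores the value, so the pushed argument sits at the address held in the stack pointer. Everything in the sketch is hypothetical: the stack is an array, sp plays the role of ESP, and the return address pushed by call is replaced by a placeholder.

```c
#include <stdint.h>

/* Hypothetical 32-bit stack: 16 four-byte slots, growing downwards.
   sp plays the role of ESP and holds a byte offset into slots[]. */
static uint32_t slots[16];
static uint32_t sp = sizeof slots;

static void push(uint32_t value) {
    sp -= 4;               /* decrement first ... */
    slots[sp / 4] = value; /* ... then store at the new top of the stack */
}

/* funcX's check. At entry [esp] holds the return address, so the argument
   is at [esp + 4], which becomes [ebp + 8] after push ebp; mov ebp, esp. */
static int funcX_calls_win(void) {
    return slots[(sp + 4) / 4] == 0xd;
}

/* The caller from the example; reports var_a's final value via the pointer. */
int run_example(uint32_t *var_a_after) {
    uint32_t var_a = 1;   /* mov var_a, 1 */
    push(var_a);          /* push var_a: a copy of 1 lands on the stack */
    slots[sp / 4] = 13;   /* mov dword [esp], 13: overwrite that copy */
    push(0);              /* placeholder for the return address pushed by call */
    int win = funcX_calls_win();
    sp += 8;              /* undo the return address and the argument */
    *var_a_after = var_a;
    return win;
}
```

The simulation reports that funcWin is reached even though var_a still holds 1, which is exactly the discrepancy a quick static read can miss.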

This is just one of the many examples in which a pattern may seem obvious, but differs in reality. Even if the alteration of the value on the stack was obvious to you, do not make the mistake of assuming that all patterns are. Malware in particular contains illogical patterns which misuse the instruction set, but such patterns are not reserved solely for malicious software.

The next chapter, “Binary types”, can be found here!


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.