Methods and macros: the call stack

Methods and macros appear in virtually every modern binary. To tell a method from a macro, one first needs to understand each term on its own. After methods and macros have been explained, the workings of the stack are covered, together with calling conventions and stack frames.

Methods
A method, also referred to as a function, contains code that can be executed. In C, a program's own code starts at the main function, which can call other functions. The main function is usually the first user-defined function to run, but it is not the first function that is executed: that is the application entry point (called entry0 in Radare2).

An example of a small program written in C can be seen below. The main function of this program returns 0 as soon as it is called. Returning 0 from main indicates that the program exited normally, without any errors.

int main(int argc, char** argv) {
    return 0;
}

In practice, the difference between the entry0 and main functions can be observed in any given binary. Open a binary with Radare2, analyse it with aaaa and list all functions with the afl command. This example uses the unpatched version of PatchMe0x01.bin from the previous practical case. PatchMe0x01 is a 32-bit ELF binary.

[0x08048310]> afl
0x080482a8    3 35           fcn.080482a8
0x080482e0    1 6            sym.imp.puts
0x080482f0    1 6            sym.imp.__libc_start_main
0x08048300    1 6            sub.__gmon_start_300
0x08048310    1 33           entry0
0x08048340    1 4            fcn.08048340
0x08048350    4 43           fcn.08048350
0x080483c0    3 30           entry2.fini
0x080483e0    8 43   -> 93   entry1.init
0x0804840b    1 25           sub.Try_again_40b
0x08048424    6 80           main

In the list above, one can see that the entry point of the binary is named entry0, which is also the address that Radare2 seeks to by default. The main function is the first user-defined function, and it calls the sub.Try_again_40b function. Using the pdf command, the disassembly of entry0 is printed.

[0x08048310]> pdf
|           ;-- section..text:
|           ;-- eip:
/ (fcn) entry0 33
|   entry0 ();
|           0x08048310      31ed           xor ebp, ebp                ; [14] -r-x section size 466 named .text
|           0x08048312      5e             pop esi
|           0x08048313      89e1           mov ecx, esp
|           0x08048315      83e4f0         and esp, 0xfffffff0
|           0x08048318      50             push eax
|           0x08048319      54             push esp                    ; void *stack_end
|           0x0804831a      52             push edx                    ; func rtld_fini
|           0x0804831b      68e0840408     push 0x80484e0              ; func fini
|           0x08048320      6880840408     push 0x8048480              ; func init
|           0x08048325      51             push ecx                    ; char **ubp_av
|           0x08048326      56             push esi                    ; int argc
|           0x08048327      6824840408     push main                   ; 0x8048424 ; func main
\           0x0804832c      e8bfffffff     call sym.imp.__libc_start_main ; int __libc_start_main(func main, int argc, char **ubp_av, func init, func fini, func rtld_fini, void *stack_end)

The libc function __libc_start_main calls the main function with the given parameters, all of which are pushed onto the stack before the call is made. Once main is entered, the user’s code is executed. Depending on the permissions of the executed binary, it can access and alter external files and perform other actions. This happens with any program that is executed on a machine.
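
The hand-off from the startup code to main can be pictured with a small C sketch. The names sketch_start_main and demo_main are hypothetical and only serve as illustration. The real __libc_start_main does considerably more work (such as running initialisers and finalisers) and passes the value returned by main on to exit instead of returning it.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every C main function. */
typedef int (*main_fn)(int argc, char **argv);

/* Hypothetical, heavily simplified stand-in for the startup routine:
   it receives the address of main plus its arguments and calls it. */
int sketch_start_main(main_fn user_main, int argc, char **argv) {
    assert(user_main != NULL);
    /* The real startup code hands this value to exit(); the sketch
       simply returns it so that it can be inspected. */
    return user_main(argc, argv);
}

/* Hypothetical user code: reports how many arguments follow the program name. */
int demo_main(int argc, char **argv) {
    (void)argv;
    return argc - 1;
}
```

Calling sketch_start_main(demo_main, argc, argv) runs demo_main exactly as if it had been called directly, which is all that the push main and call pair in entry0 sets in motion.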

Macros
If all the functions in the binary are already covered in the paragraph regarding methods, then what are macros? A macro is a named set of one or more instructions, defined by the programmer. During compilation (or assembly), every use of the macro is replaced with the instructions from its definition. Two instructions that could be written as macros are push and pop in assembly. The push instruction decreases the stack pointer (SP, ESP or RSP) and stores the provided value in the newly claimed slot on the stack. The pop instruction copies the most recently pushed value into the provided destination (a register or a memory location) and removes it from the stack.

The push “macro” consists of two instructions, as can be seen in the example below.

sub ESP, 4
mov [ESP], variable_name

Note that variable_name is a placeholder: it could be a constant or a register (e.g. EAX). The instruction pointer (IP, EIP or RIP, depending on your architecture) cannot be named as an operand like this on 32-bit x86; it ends up on the stack as the return address when a call instruction is executed. The brackets around [ESP] indicate that the data at the address stored in ESP is accessed, instead of the address itself. The stack pointer is decreased by 4 because the pushed value is 32 bits, or 4 bytes, in size.

The pop “macro” is the reverse of the push “macro”, as shown below. In this example, the EAX register receives the popped value.

mov EAX, [ESP]
add ESP, 4

First, the value at the address in the stack pointer is copied into the accumulator register EAX. After that, the stack pointer is increased by 4, because the stored value was a double word, which equals 32 bits or 4 bytes.
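
The two-step nature of push and pop can be modelled in C. The sketch below is a simulation under stated assumptions, not how a CPU is implemented: SimStack, sim_init, sim_push and sim_pop are hypothetical names, the stack is 64 bytes of ordinary memory, and esp is a byte offset into that memory rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u  /* size of the simulated stack, chosen arbitrarily */

/* Hypothetical model of a 32-bit stack: raw memory plus a stack pointer
   that holds a byte offset into that memory. */
typedef struct {
    uint8_t  memory[STACK_BYTES];
    uint32_t esp;
} SimStack;

/* An empty stack starts with ESP just past the highest address. */
void sim_init(SimStack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;
}

/* push value  ==  sub ESP, 4 ; mov [ESP], value */
void sim_push(SimStack *s, uint32_t value) {
    assert(s->esp >= 4);                    /* no room left: stack overflow */
    s->esp -= 4;
    memcpy(&s->memory[s->esp], &value, 4);
}

/* pop dest  ==  mov dest, [ESP] ; add ESP, 4 */
uint32_t sim_pop(SimStack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);      /* nothing to pop: stack underflow */
    memcpy(&value, &s->memory[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing two values and popping twice hands them back in reverse order, and esp ends up where it started.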

Macros that are defined by the programmer work the same way as the two examples above. The general layout (in NASM syntax) and an example are provided below. The example pops three values from the stack and stores them in the registers EAX, EBX and ECX respectively. Note that the value that was last added to the stack ends up in EAX.

%macro name number_of_arguments
[instructions]
%endmacro
 
%macro value_popper 0
pop eax
pop ebx
pop ecx
%endmacro

Once it has been defined, a macro is used like any other instruction, as can be seen in the example below. After the macro has run, EAX holds var_c, EBX holds var_b and ECX holds var_a.

push var_a
push var_b
push var_c
value_popper
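
The difference between a macro and a method can also be seen in C, where the preprocessor pastes the body of a macro at every place it is used, while a function exists once and is reached through a call. The names SQUARE_MACRO, square_function, sum_of_squares and check_same_result below are hypothetical and only meant as a sketch.

```c
#include <assert.h>

/* Macro: the preprocessor substitutes this expression at every use site,
   so no call takes place. The parentheses keep arguments such as 1 + 2 intact. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Function: compiled once and reached through a call instruction
   (unless the compiler decides to inline it). */
int square_function(int x) {
    return x * x;
}

/* Both forms yield the same value; only the generated code differs. */
int sum_of_squares(int a, int b) {
    return SQUARE_MACRO(a) + square_function(b);
}

/* Sanity check: the macro and the function agree for a given input. */
void check_same_result(int x) {
    assert(SQUARE_MACRO(x) == square_function(x));
}
```

In a disassembler, only square_function would show up as a separate function in a listing such as the one afl prints; the macro leaves no trace other than the instructions it expanded to.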

Stack
On x86 and x86_64, the stack grows downwards. On some other architectures, such as ARM, the programmer can select the stack direction. The ARM Thumb-2 instruction set, however, has dedicated instructions for a stack that grows downwards, which makes a downwards growing stack favourable but not mandatory.

The stack can be compared to a pile of books: one can add books to the pile or take books from it. This, however, visualises a stack that grows upwards. Another way to visualise the stack is a school desk. Students tend to stick their old, tasteless pieces of chewing gum under the table. The amount of chewing gum grows towards the floor (downwards), yet it still grows.

In both examples, the order in which books or pieces of chewing gum are added and removed is the same: Last In, First Out (LIFO). The push variable_name instruction adds a value to the stack, whereas pop destination_address removes the last added item from the stack and stores it at the provided destination.

Calling conventions
The stack is used to save variables for later usage. Since values are always stored in the same manner (LIFO), the question arises whether a function call with three parameters should look like the first or the second example below. There are many calling conventions, and nearly all of them differ at least slightly. There is no point in learning them all by heart, but knowing that they exist comes in handy during the analysis of binaries: one needs to be able to recognise and understand the differences. This paragraph is therefore not dedicated to explaining which calling convention uses which structure, but instead focuses on the points on which they differ.

[Example 1]
push var_a
push var_b
push var_c
call func_a
 
[Example 2]
push var_c
push var_b
push var_a
call func_a

Neither of the two is wrong, since different calling conventions prescribe different standards. Besides the order in which arguments are pushed onto and read from the stack, the clean-up of those arguments after a function call also differs. Below are two examples, the first being a caller clean-up and the second being a callee clean-up. With caller clean-up, the called function leaves the arguments on the stack and the caller removes them after the call returns. With callee clean-up, the called function removes them itself as part of its return instruction.

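[Caller clean-up]
push EBP
mov EBP, ESP
sub ESP, total_value
[function content]
mov ESP, EBP
pop EBP
ret 0

[Callee clean-up]
push EBP
mov EBP, ESP
sub ESP, total_value
[function content]
mov ESP, EBP
pop EBP
ret argument_size

Note that total_value equals the total size of all local variables in the function, whereas argument_size equals the total size of the arguments that were pushed for the call. The operand of ret only covers the arguments, because the local variables have already been released by mov ESP, EBP. A ret 0 behaves like a plain ret and leaves the arguments on the stack, so in the caller clean-up the caller releases them after the call, for example with add ESP, argument_size. A local variable is a variable that is declared within a function and cannot be used outside said function. An example in C is given below.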

void function() {
    int a = 1;
    int b = 2;
}
 
int main(int argc, char** argv) {
    function();
    return 0;
}

The total size of the local variables in the function called function, in a 32-bit binary, equals two times the size of an integer. An integer is 4 bytes in size, which makes the total 8 bytes. A compiler may reserve more than that to keep the stack aligned.
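
Returning to the two clean-up styles: the difference comes down to stack-pointer arithmetic, which can be sketched in C. The model below is an assumption-laden illustration: every argument is taken to be one 32-bit slot, and Cleanup, esp_right_after_ret and caller_fixup_bytes are hypothetical names. It relies on the x86 behaviour that a plain ret pops only the return address, while ret n pops the return address and then adds n to ESP.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BYTES 4u   /* one 32-bit stack slot */

typedef enum { CALLER_CLEANUP, CALLEE_CLEANUP } Cleanup;

/* ESP as the caller sees it immediately after the callee has returned.
   esp_before_pushes is ESP before the first argument is pushed. */
uint32_t esp_right_after_ret(uint32_t esp_before_pushes,
                             uint32_t arg_count, Cleanup who) {
    uint32_t arg_bytes = arg_count * SLOT_BYTES;
    uint32_t esp = esp_before_pushes;

    assert(esp >= arg_bytes + SLOT_BYTES);  /* enough room below ESP */

    esp -= arg_bytes;     /* caller: push every argument            */
    esp -= SLOT_BYTES;    /* call:   push the return address        */

    /* The callee's prologue and epilogue cancel each other out,
       so they do not change the outcome of this calculation. */

    esp += SLOT_BYTES;    /* ret:    pop the return address         */
    if (who == CALLEE_CLEANUP) {
        esp += arg_bytes; /* ret n:  also release the arguments     */
    }
    return esp;
}

/* Bytes the caller still has to release itself with "add ESP, n". */
uint32_t caller_fixup_bytes(uint32_t arg_count, Cleanup who) {
    return who == CALLER_CLEANUP ? arg_count * SLOT_BYTES : 0u;
}
```

With three arguments, the caller clean-up variant leaves ESP 12 bytes below its starting value until the caller adds them back, whereas the callee clean-up variant returns with ESP already restored.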

Saving registers
A register can only contain a single value at any given moment. To preserve the values in registers, the stack is used. Caller-saved registers (such as EAX, ECX and EDX) should be saved by the caller if their values are still needed after the call, since there is no guarantee that they will be the same after the called function returns.
Callee-saved registers (such as EBX, ESI, EDI and EBP in the common 32-bit x86 conventions) must hold the same values when the called function returns as when it was entered: it is the responsibility of the called function (the callee) to save these values on the stack and restore them before returning.
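
The division of responsibility can be mimicked in C with a struct that stands in for a few registers. This is only a model: Regs, callee and caller are hypothetical names, and local variables take the place of the push and pop instructions that real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file: EAX and ECX are caller-saved, EBX is callee-saved. */
typedef struct {
    uint32_t eax;
    uint32_t ecx;
    uint32_t ebx;
} Regs;

/* The callee wants to use EBX, so it must save and restore it.
   EAX and ECX may be overwritten without restoring them. */
void callee(Regs *r) {
    uint32_t saved_ebx;
    assert(r != NULL);
    saved_ebx = r->ebx;      /* push EBX                 */
    r->ebx = 0xAAAAAAAAu;    /* EBX used as scratch      */
    r->ecx = 0xBBBBBBBBu;    /* ECX clobbered            */
    r->eax = 42u;            /* return value in EAX      */
    r->ebx = saved_ebx;      /* pop EBX                  */
}

/* The caller still needs ECX after the call, so it saves ECX itself. */
uint32_t caller(Regs *r) {
    uint32_t saved_ecx;
    assert(r != NULL);
    saved_ecx = r->ecx;      /* push ECX                 */
    callee(r);
    r->ecx = saved_ecx;      /* pop ECX                  */
    return r->eax;
}
```

After caller returns, both ECX and EBX hold their original values, but for different reasons: ECX because the caller protected it, EBX because the callee did.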

Stack frames
A stack frame generally starts off with the same instructions, which can be found below.

push EBP
mov EBP, ESP
sub ESP, total_variable_size

Note that the variable total_variable_size is a placeholder in this example.

On entry, the base pointer (BP, EBP or RBP) still holds the frame base of the calling function and should be preserved, which is done with the push instruction. Once the value of EBP is saved, the register can be reused. The stack pointer (SP, ESP or RSP) is then copied into the base pointer, so that the base pointer marks the start of this function's frame. Finally, the stack pointer is decreased (since the stack grows downwards) by the total size of all local variables, which reserves room for them.
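
The layout that these three instructions produce can be made visible with a small simulation in C. It is a sketch under assumptions: Machine, enter_call, prologue, epilogue and load are hypothetical names, a single 32-bit argument is passed, and addresses are byte offsets into a tiny array instead of real memory. The matching teardown is included so that the frame can be released again.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 32u   /* simulated stack of 32 slots of 4 bytes each */

typedef struct {
    uint32_t mem[WORDS];  /* stack memory, one 32-bit word per slot */
    uint32_t esp;         /* stack pointer, as a byte offset        */
    uint32_t ebp;         /* base pointer, as a byte offset         */
} Machine;

static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

static uint32_t pop(Machine *m) {
    uint32_t value;
    assert(m->esp + 4 <= WORDS * 4);
    value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

/* Read the 32-bit word stored at a byte offset. */
uint32_t load(const Machine *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < WORDS * 4);
    return m->mem[addr / 4];
}

/* Caller side: push one argument, then "call" (push the return address). */
void enter_call(Machine *m, uint32_t arg, uint32_t return_address) {
    push(m, arg);
    push(m, return_address);
}

/* Callee side: set up the stack frame. */
void prologue(Machine *m, uint32_t local_bytes) {
    push(m, m->ebp);          /* push EBP              */
    m->ebp = m->esp;          /* mov EBP, ESP          */
    assert(m->esp >= local_bytes);
    m->esp -= local_bytes;    /* sub ESP, local_bytes  */
}

/* Callee side: tear the frame down; yields the address ret jumps to. */
uint32_t epilogue(Machine *m) {
    m->esp = m->ebp;          /* mov ESP, EBP          */
    m->ebp = pop(m);          /* pop EBP               */
    return pop(m);            /* ret                   */
}
```

Once prologue has run, the saved base pointer sits at [EBP], the return address at [EBP+4] and the argument at [EBP+8], while the local variables live below EBP.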

Leaving the frame mirrors the instructions that are used to create it, as can be seen in the example below.

mov ESP, EBP
pop EBP
ret

The base pointer still marks the start of the current frame, so copying it into the stack pointer releases all local variables at once. The old value of the base pointer was saved at the start of the function and is now popped off the stack into the base pointer register. After that, the function returns with all registers that should be preserved restored to their original values. The documentation of a function or calling convention describes how the return value is provided, although this generally is the accumulator register (AX, EAX or RAX). If the return value is bigger than the register (e.g. more than 32 bits on a 32-bit architecture), multiple registers can be used or the return value can be a pointer to a data structure. Which of the two applies depends on the calling convention and on the size, since two 32-bit registers can hold at most 64 bits. If the return value is larger than the combined size of the registers, a data structure has to be used.
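
Returning a value across two registers can be sketched in C as well. The example assumes the common 32-bit x86 approach of placing the low half of a 64-bit integer in EAX and the high half in EDX; RegPair, split_return_value, join_return_value and check_round_trip are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit registers that together carry one 64-bit value. */
typedef struct {
    uint32_t eax;   /* low 32 bits  */
    uint32_t edx;   /* high 32 bits */
} RegPair;

/* What the callee does: spread the value over both registers. */
RegPair split_return_value(uint64_t value) {
    RegPair regs;
    regs.eax = (uint32_t)(value & 0xFFFFFFFFu);
    regs.edx = (uint32_t)(value >> 32);
    return regs;
}

/* What the caller does: glue the two halves back together. */
uint64_t join_return_value(RegPair regs) {
    return ((uint64_t)regs.edx << 32) | regs.eax;
}

/* Sanity check: splitting and joining gives back the original value. */
void check_round_trip(uint64_t value) {
    assert(join_return_value(split_return_value(value)) == value);
}
```

Anything wider than 64 bits no longer fits in this pair, which is where the pointer to a data structure comes in.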

The next article, “Practical case: Buffer Overflow 0x01”, can be found here.


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.