This article was published on the 22nd of August 2018. This article was updated on the 30th of April 2020.
Methods and macros are used in all modern binaries. To know the difference between a method and a macro, one needs to understand each term on its own first. After the explanation of methods and macros, the working of the stack will be explained, together with calling conventions and stack frames.
Methods
A method, also referred to as a function in programming, contains code which can be executed. In C, every program starts with the main function, which can call other functions. The main function is often the first user defined function to run, but it is not the first function that is executed: that is the application entry point (called entry0 in Radare2).
An example of a small program written in C can be seen below. This program returns 0 after the main function is called. Returning the value 0 from this function indicates that the program exited normally, without any errors.
int main(int argc, char** argv) { return 0; }
In practice, the difference between the entry0 and main functions can be observed with any given binary. Open a binary with Radare2, analyse it with aaaa and list all functions with the afl command. In this example, the unpatched version of PatchMe0x01.bin from the previous practical case is used. PatchMe0x01 is a 32-bit ELF binary.
[0x08048310]> afl
0x080482a8 3 35 fcn.080482a8
0x080482e0 1 6 sym.imp.puts
0x080482f0 1 6 sym.imp.__libc_start_main
0x08048300 1 6 sub.__gmon_start_300
0x08048310 1 33 entry0
0x08048340 1 4 fcn.08048340
0x08048350 4 43 fcn.08048350
0x080483c0 3 30 entry2.fini
0x080483e0 8 43 -> 93 entry1.init
0x0804840b 1 25 sub.Try_again_40b
0x08048424 6 80 main
In the list above, one can see that the entry point of the binary is named entry0, which is also the address that Radare2 seeks to by default. The main function is the first user defined function, which calls the sub.Try_again_40b function. Using the pdf command, the disassembly of entry0 is printed.
[0x08048310]> pdf
| ;-- section..text:
| ;-- eip:
/ (fcn) entry0 33
| entry0 ();
| 0x08048310 31ed xor ebp, ebp ; [14] -r-x section size 466 named .text
| 0x08048312 5e pop esi
| 0x08048313 89e1 mov ecx, esp
| 0x08048315 83e4f0 and esp, 0xfffffff0
| 0x08048318 50 push eax
| 0x08048319 54 push esp ; void *stack_end
| 0x0804831a 52 push edx ; func rtld_fini
| 0x0804831b 68e0840408 push 0x80484e0 ; func fini
| 0x08048320 6880840408 push 0x8048480 ; func init
| 0x08048325 51 push ecx ; char **ubp_av
| 0x08048326 56 push esi ; int argc
| 0x08048327 6824840408 push main ; 0x8048424 ; func main
\ 0x0804832c e8bfffffff call sym.imp.__libc_start_main ; int __libc_start_main(func main, int argc, char **ubp_av, func init, func fini, func rtld_fini, void *stack_end)
The LibC function __libc_start_main receives its parameters, including the address of the main function, via the stack: all of them are pushed before the call is made. It then calls the main function. Once the main function is entered, the user’s code is executed. Depending on the permissions of the executed binary, external files can be accessed or altered, and other actions can be performed. This happens with any given program that is executed on a machine.
Macros
If all the functions in the binary are already covered in the paragraph regarding methods, then what are macros? A macro is a named set of one or more instructions, which is defined by the programmer. During compilation, every use of the macro is substituted with the instruction(s) from its definition. Two examples of instructions which could be written as macros are the push and pop instructions in assembly. The push instruction decreases the stack pointer (SP, ESP or RSP) and stores the provided value in the newly claimed memory slot on the stack. The pop instruction removes the most recently added value from the stack and stores it in the provided destination.
The push “macro” consists of two instructions, as can be seen in the example below.
sub ESP, 4
mov [ESP], variable_name
Note that variable_name is a placeholder, which could also be a register (e.g. EAX). The instruction pointer, which resides in IP, EIP or RIP depending on your architecture, cannot be pushed this way: it is pushed implicitly by the call instruction. The brackets around [ESP] indicate that the data at the address in ESP is accessed, instead of the address itself. The stack pointer is decreased by 4 because the pushed value is 32 bits in size, or 4 bytes.
Similarly, the pop “macro” is the reverse of the push “macro”. This example can be found below. In this example, the EAX register is used to store the result of the pop “macro”.
mov EAX, [ESP]
add ESP, 4
First, the value at the address of the stack pointer is copied into the accumulator register EAX. After that, the stack pointer is increased by 4, due to the fact that the saved value was a double word, which equals 32 bits or 4 bytes.
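The two “macros” above can be mimicked in C to see what happens to the stack pointer. The snippet below is a minimal sketch and not real machine behaviour: the names sim_stack, sim_init, sim_push and sim_pop are made up for this illustration, the stack is a plain array of 32-bit slots, and the simulated stack pointer counts in slots instead of bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_STACK_SLOTS 16 /* hypothetical size, chosen for this example */

/* A simulated stack. The stack pointer starts just past the end of the
   array, because the stack grows towards lower addresses. */
struct sim_stack {
    uint32_t memory[SIM_STACK_SLOTS];
    unsigned esp; /* index of the most recently pushed slot */
};

void sim_init(struct sim_stack *s)
{
    s->esp = SIM_STACK_SLOTS;
}

/* push value  ==  sub ESP, 4  followed by  mov [ESP], value */
void sim_push(struct sim_stack *s, uint32_t value)
{
    assert(s->esp > 0);        /* no room left on the simulated stack */
    s->esp -= 1;               /* one slot equals four bytes */
    s->memory[s->esp] = value;
}

/* pop destination  ==  mov destination, [ESP]  followed by  add ESP, 4 */
uint32_t sim_pop(struct sim_stack *s)
{
    assert(s->esp < SIM_STACK_SLOTS); /* nothing left to pop */
    uint32_t value = s->memory[s->esp];
    s->esp += 1;
    return value;
}
```

Pushing 1 and then 2, followed by two pops, yields 2 first and 1 second, after which the simulated stack pointer is back at its initial value.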
Macros that are defined by the programmer work the same as the two provided examples. The layout, together with an example, is provided below. The example pops three values from the stack whilst saving them in the registers EAX, EBX and ECX respectively. Note that the value that was last added to the stack is found in EAX.
%macro name number_of_arguments
[instructions]
%endmacro

%macro value_popper 0
pop eax
pop ebx
pop ecx
%endmacro
Once a macro has been defined, using it looks the same as using any other instruction. This can be seen in the example below.
push var_a
push var_b
push var_c
value_popper
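The same idea of textual substitution exists in C, where the preprocessor expands macros before the compiler runs. The sketch below mirrors the value_popper example under a few assumptions: the stack and the registers are simulated with ordinary variables, and the names PUSH, POP, VALUE_POPPER and push_three_and_pop are hypothetical and only exist for this illustration.

```c
#include <stdint.h>

/* A tiny simulated stack, counted in 32-bit slots. */
uint32_t stack_memory[8];
unsigned stack_index = 8; /* grows downwards, like ESP */

/* Simulated registers. */
uint32_t eax, ebx, ecx;

#define PUSH(value) (stack_memory[--stack_index] = (value))
#define POP(reg)    ((reg) = stack_memory[stack_index++])

/* The preprocessor pastes these three statements at every use of the
   macro, comparable to how value_popper expands into three pops. */
#define VALUE_POPPER() do { POP(eax); POP(ebx); POP(ecx); } while (0)

/* Pushes three values and pops them again using the macro. */
void push_three_and_pop(uint32_t var_a, uint32_t var_b, uint32_t var_c)
{
    PUSH(var_a);
    PUSH(var_b);
    PUSH(var_c);
    VALUE_POPPER(); /* eax = var_c, ebx = var_b, ecx = var_a */
}
```

After the call, the value that was pushed last (var_c) resides in the simulated EAX, which matches the note about the order above.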
Stack
On x86 and x86_64, the stack grows downwards: towards lower memory addresses. In some other architectures the stack direction can be selected by the programmer (e.g. in ARM), but the ARM Thumb-2 instruction set has instructions that assume a stack that grows downwards, which makes a downwards growing stack favourable but not mandatory.
The stack can be compared to a pile of books. From this pile, one can take books or one can add books. Yet, this visualises a stack that grows upwards. Another way to visualise the stack, is the example of a school desk. Unfortunately, students always stick their old tasteless pieces of chewing gum under the table. The amount of chewing gum grows towards the floor (downwards), yet it still grows.
In both examples, the order in which books or pieces of chewing gum are added or removed is the same: Last In, First Out (LIFO). The push variable_name instruction adds a variable to the stack, whereas the pop destination_address instruction removes the last added item from the stack and stores it at the provided address.
Calling conventions
The stack is used to save variables for later usage. Since the variables are always stored in the same manner (LIFO), this raises the question of whether a function call with 3 parameters should look like the first or the second example below. There are a lot of calling conventions out there and nearly all of them differ at least slightly. There is no point in learning all the calling conventions by heart, but knowing they exist comes in handy during the analysis of binaries: one needs to be able to recognise and understand the differences. This paragraph is therefore not dedicated to explaining which calling convention uses what structure, but instead focuses on the kinds of differences that one can encounter.
[Example 1]
push var_a
push var_b
push var_c
call func_a

[Example 2]
push var_c
push var_b
push var_a
call func_a
Neither of these two is wrong, since different calling conventions prescribe different standards. Not only does the order in which variables are stored on and read from the stack differ, but so does the clean-up after a function has been called. Below are two examples, the first being a caller clean-up and the second being a callee clean-up. In the caller clean-up, the stack is not cleaned within the function but after it returns, whereas the callee clean-up cleans the stack with the return instruction.
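[Caller clean-up]
push EBP
mov EBP, ESP
sub ESP, total_value
[function content]
mov ESP, EBP
pop EBP
ret 0

[Callee clean-up]
push EBP
mov EBP, ESP
sub ESP, total_value
[function content]
mov ESP, EBP
pop EBP
ret total_argument_size
Note that the variable total_value equals the total size of all local variables in the function, whereas total_argument_size equals the total size of the arguments that the caller pushed on the stack. The operand of the ret instruction is the number of bytes that is removed from the stack after the return address has been popped. This is why ret 0 leaves the arguments on the stack for the caller to remove, for example with add ESP, total_argument_size after the call. A local variable is a variable that is declared within a function and cannot be used outside said function. An example in C is given below.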
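The argument order and the two clean-up variants can be imitated on a simulated stack. The snippet below is a rough sketch under several assumptions: the names memory, esp, push and the four functions are invented for this illustration, the stack pointer counts in 4-byte slots, and the return address that a real call instruction would push is left out to keep the example short.

```c
#include <stdint.h>

uint32_t memory[16]; /* simulated stack memory, hypothetical size */
unsigned esp = 16;   /* simulated stack pointer, counted in slots */

void push(uint32_t value)
{
    memory[--esp] = value;
}

/* Callee for the caller clean-up variant: the arguments stay on the stack.
   Since they were pushed from right to left, the first argument is the
   one closest to the stack pointer. */
uint32_t digits_caller_cleanup(void)
{
    return memory[esp] * 100 + memory[esp + 1] * 10 + memory[esp + 2];
}

/* Callee for the callee clean-up variant: it removes its own three
   arguments, comparable to ret 12 for three 4-byte arguments. */
uint32_t digits_callee_cleanup(void)
{
    uint32_t result = memory[esp] * 100 + memory[esp + 1] * 10 + memory[esp + 2];
    esp += 3;
    return result;
}

uint32_t call_with_caller_cleanup(uint32_t a, uint32_t b, uint32_t c)
{
    push(c); /* right to left, as in Example 2 */
    push(b);
    push(a);
    uint32_t result = digits_caller_cleanup();
    esp += 3; /* comparable to add ESP, 12: the caller cleans up */
    return result;
}

uint32_t call_with_callee_cleanup(uint32_t a, uint32_t b, uint32_t c)
{
    push(c);
    push(b);
    push(a);
    return digits_callee_cleanup(); /* nothing left to clean up here */
}
```

Both variants return the same value and leave the simulated stack pointer where it started. The only difference is which side adjusts it.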
void function() {
    int a = 1;
    int b = 2;
}

int main(int argc, char** argv) {
    function();
    return 0;
}
The total size of the local variables in the method called function on a 32-bit binary would equal two times the size of an integer. The size of an integer is 4 bytes, making the total amount 8 bytes.
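The arithmetic can be checked with the sizeof operator. The function name local_variable_size below is hypothetical, and the sketch only adds up the sizes of the two variables: the value that a compiler actually subtracts from the stack pointer may be larger due to alignment, or the variables may be optimised away entirely.

```c
#include <stddef.h>

/* Returns the number of bytes which the two local integers occupy together. */
size_t local_variable_size(void)
{
    int a = 1;
    int b = 2;
    return sizeof a + sizeof b;
}
```

On a platform where an integer is 4 bytes in size, this function returns 8.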
Saving registers
A register can only contain a single value at any given moment. To preserve the values in registers, the stack is used. Caller saved registers (such as EAX, ECX and EDX) are registers which should be saved by the caller if their value is of importance. There is no guarantee that the values will be the same after the called function returns.
The callee saved registers should be the same when the called function returns, as it is the responsibility of the called function (the callee) to save these values on the stack and restore them before returning.
Stack frames
A stack frame generally starts off with the same instructions, which can be found below.
push EBP
mov EBP, ESP
sub ESP, total_variable_size
Note that the variable total_variable_size is a placeholder in this example.
The base pointer (BP, EBP or RBP) holds the base of the stack frame of the previous function and should be preserved, which is done with the push instruction. After the value of EBP is saved, the register can freely be used. The stack pointer (SP, ESP or RSP) is then copied into the base pointer, so the base pointer marks the start of the stack frame of the current function. The total size of all local variables is then subtracted from the address which resides in the stack pointer (since the stack grows downwards), which reserves room for these variables.
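The three instructions can be followed step by step on a simulated stack. The snippet below is a minimal sketch: the names memory, esp, ebp and enter_frame are made up for this illustration, and both pointers count in 4-byte slots instead of bytes.

```c
#include <stdint.h>

uint32_t memory[16]; /* simulated stack memory, hypothetical size */
unsigned esp = 16;   /* simulated stack pointer, counted in 4-byte slots */
unsigned ebp = 16;   /* simulated base pointer */

/* push EBP, mov EBP, ESP and sub ESP, local_slots * 4 */
void enter_frame(unsigned local_slots)
{
    memory[--esp] = ebp; /* push EBP: save the base pointer of the caller */
    ebp = esp;           /* mov EBP, ESP: the new frame starts here */
    esp -= local_slots;  /* sub ESP, ...: reserve room for the locals */
}
```

With two local slots, the locals reside at memory[ebp - 1] and memory[ebp - 2], which corresponds to [EBP-4] and [EBP-8] in the disassembly of a 32-bit binary.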
Leaving the frame is the opposite of the instructions that are used to create a stack frame, as can be seen in the example below.
mov ESP, EBP
pop EBP
ret
The base pointer of the current function still points to the start of the stack frame, and is copied into the stack pointer, which releases the local variables. The old value of the base pointer was saved at the beginning, and is now popped off the stack into the base pointer register. After that, the function returns, with all of the registers that should be preserved restored to their original values. Documentation of a function provides information about the way the return value is provided, although this generally is the accumulator register (AX, EAX or RAX). If the return value is bigger than the size of the register (e.g. more than 32 bits on a 32-bit architecture), the return value can be a pointer to a data structure, or multiple registers can be used. This depends on the calling convention that is used and on the size, since two 32-bit registers together can store at most 64 bits. If the return value is larger than the combined size of the registers, a data structure has to be used.
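How a 64-bit return value fits in two 32-bit registers can be sketched in C. The example assumes the EDX:EAX pair, which is commonly used for 64-bit integers by 32-bit x86 calling conventions such as cdecl. The names register_pair, split_return_value and join_return_value are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Two simulated 32-bit registers that together hold one 64-bit value. */
struct register_pair {
    uint32_t eax; /* lower 32 bits */
    uint32_t edx; /* upper 32 bits */
};

/* What the callee does: spread the value over both registers. */
struct register_pair split_return_value(uint64_t value)
{
    struct register_pair regs;
    regs.eax = (uint32_t)(value & 0xFFFFFFFFu);
    regs.edx = (uint32_t)(value >> 32);
    return regs;
}

/* What the caller does: combine both registers into one value again. */
uint64_t join_return_value(struct register_pair regs)
{
    return ((uint64_t)regs.edx << 32) | regs.eax;
}
```

Splitting the value 0x1122334455667788 places 0x55667788 in the simulated EAX and 0x11223344 in the simulated EDX. Joining the two halves gives the original value back.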
The next article regarding the “Practical case: Buffer Overflow 0x01” can be found here.
To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], or DM me on Twitter @Libranalysis.