Assembly language consists of a set of instructions (commonly referred to as operation codes or “opcodes”) combined with operands such as registers and memory addresses. There are many instruction sets, and it is even possible to write one's own. The most commonly known ones are the 32-bit (x86) and 64-bit (x86_64) sets, the ARM sets and the MIPS sets. The first two are used on everyday desktop and laptop platforms. The ARM sets are used on mobile phones, whereas the MIPS sets are often used in embedded devices.
This course starts with the 32-bit (x86) and 64-bit (x86_64) architectures. Additional architectures will be added once the course is complete.
The registers on the CPU are equal in size to the architecture: 32 bits in x86 and 64 bits in x86_64. Early Central Processing Units (CPUs) had registers of only 8 bits (such as the Intel 8008, released in 1972). Later, the 16-bit architecture emerged (e.g. the Intel 8086, released in 1978). The Intel 8086 was the first CPU of the “x86 family”. Notable processors that followed the Intel 8086 in this family were the Intel 80386 (32-bit architecture, also known as i386, released in 1985) and AMD’s Opteron (64-bit architecture, released in 2003). These CPUs are considered a family because each generation is backwards compatible with the previous ones. This is the reason that a 32-bit binary can be executed on a 64-bit platform.
General Purpose Registers (GPR)
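In the x86 family, the registers have roughly stayed the same, although some registers have been added later on. The 16-bit Intel 8086 used 8 General Purpose Registers. They are listed below, together with the Instruction Pointer, which is a special purpose register rather than a general purpose one: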
The Accumulator Register (AX): used in arithmetic and is often used to store the return value (if the value does not exceed the register size)
The Base Register (BX): contains a pointer to data (the DS register is used when in segmented mode)
The Counter Register (CX): used during loops (to keep track of the loop count) and shifts/rotations of data
The Data Register (DX): used for I/O and arithmetic
The Stack Pointer Register (SP): points to the top of the stack
The Stack Base Pointer Register (BP): points to the base of the current stack frame
The Source Index Register (SI): used during string operations and points to the source location
The Destination Index Register (DI): used during string operations and points to the destination location
The Instruction Pointer Register (IP): used to point to the next instruction, also known as the program counter (PC)
The Accumulator Register (AX), Base Register (BX), Counter Register (CX) and Data Register (DX) can each be accessed as two 8-bit registers. The lower half is accessible by replacing the X with an L (for Lower) and the higher half is accessible by replacing the X with an H (for Higher). This results in:
AX = AH and AL
BX = BH and BL
CX = CH and CL
DX = DH and DL
The 32-bit architecture uses the same registers as the 16-bit variant, but each register has twice the size. The naming scheme is therefore different: each register has the prefix E, which stands for Extended:
EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI and EIP
The 64-bit architecture uses the same registers as the 32-bit variant and has registers twice the size of the 32-bit architecture. The 64-bit registers have the prefix R (which stands for Register), resulting in a different naming scheme:
RAX, RBX, RCX, RDX, RSP, RBP, RSI, RDI and RIP
Other than the different naming scheme, the 64-bit architecture also introduced 8 additional registers:
R8, R9, R10, R11, R12, R13, R14 and R15
The first 8 registers are counted from 0 through 7, resulting in a total of 16 registers. Some of these registers are used for specific purposes, whereas others are free to use. Under the calling conventions commonly used on x86_64, the values in the registers R8, R9, R10 and R11 are volatile and should be considered lost when another function is called. The values in the registers R12, R13, R14 and R15 are non-volatile: a function that uses them must save them first and restore them before it returns.
An additional difference with the 32-bit variant is the way arguments are passed to another function. In the 8-bit, 16-bit and 32-bit variants, the stack is commonly used to pass arguments to a function. In the 64-bit variant, the first few arguments are stored in registers, whereas the remaining arguments are pushed on the stack. Which registers are used depends on the calling convention: the Microsoft x64 convention uses RCX, RDX, R8 and R9, whereas the System V convention used on Linux uses RDI, RSI, RDX, RCX, R8 and R9. Calling conventions will be explained in chapter two’s article Methods and macros: the call stack.
Other than the General Purpose Registers, there are six Segment Registers. Their purpose is to store the location of the segments of the binary that is executed. Nowadays, these registers are rarely used for their originally intended purpose, since modern operating systems use a flat memory model with paging instead of segmentation. This topic will be discussed in a later chapter, but the registers and their original purpose are included here for the sake of completeness.
The Code Segment Register (CS): contains the value of the code segment of the binary.
The Data Segment Register (DS): contains the value of the data segment of the binary.
The Extra Segment Registers (ES, FS and GS): hold additional segments. The operating system uses them for its own data, such as exception handling and thread information.
The Stack Segment Register (SS): contains the value of the stack segment of the binary.
In the story of Gulliver’s Travels, the Lilliputians argue about whether to break eggs on the big end or the little end. Danny Cohen used this analogy in his “On Holy Wars and a Plea for Peace” to address the conflict between the different methods of reading and storing values in memory.
When the Big Endian notation is used, the big end of the data is read first (which equals the lowest address in memory of the value). This is the most natural notation for humans, as it matches the way numbers are normally written. The four characters a, b, c and d can also be written as hexadecimal values: 0x61, 0x62, 0x63 and 0x64, respectively. If these four characters are written as a single hexadecimal value with the Big Endian notation, the result is 0x61626364 (which equals “abcd”). The most significant byte (or the big end) is placed at the lowest address.
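The Little Endian notation is the reverse of the Big Endian notation, with the least significant byte (or the little end) at the lowest address. When the value 0x61626364 is stored in Little Endian, the bytes in memory are 0x64, 0x63, 0x62 and 0x61, in that order. Read from the lowest address upwards, the same four bytes therefore spell “dcba”. The x86 family uses the Little Endian notation.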
The flags register contains information about numerous specific settings. Each setting is saved in its own bit of the register. As a result, all these settings are saved in a single register, and yet have specific names. Several flags are updated automatically as a side effect of instructions such as sub, add or cmp (compare).
Below, all flags that are stored in the flags register are explained, starting with the 16-bit architecture. Note that these flags are used in the x86 family as a whole.
The Carry Flag (CF): this flag is set when an addition needs to carry one bit over (compare this to decimal addition, where 9+7 gives the digit 6 with a carry of one) or during subtraction when a bit is borrowed (e.g. 1-2 will set the carry to one).
The Parity Flag (PF): if the number of set bits in the least significant byte of the result is even, this flag is set. Otherwise it is not set.
The Adjust Flag (AF): this flag is also known as the Auxiliary Flag (AF) or the Auxiliary Carry (AC). This flag functions as the Carry Flag between the lowest 4 bits (1 nibble) and the highest 4 bits (1 nibble) of an 8-bit value.
The Zero Flag (ZF): if the result of an instruction is zero, the Zero Flag is set to 1, otherwise it is set to 0.
The Sign Flag (SF): a copy of the most significant bit of the result. If that bit equals 1, the flag is set to 1, otherwise it is equal to 0. When the result is interpreted as a signed value, a set Sign Flag means that the value is negative.
The Trap Flag (TF): puts the CPU in single step mode. In this mode, a debug exception is raised after every single instruction. Debuggers set this flag to step through code one instruction at a time.
The Interrupt Enable Flag (IF): if external interrupts are allowed, the flag is set to 1. If external interrupts should be ignored, the flag is set to 0.
The Direction Flag (DF): controls the direction of string operations. If the value of the flag is 0, data is processed from the lowest address towards the highest address. If the value equals 1, data is processed from the highest address towards the lowest address.
The Overflow Flag (OF): if the result of a signed operation does not fit in the register (meaning that the sign bit of the result is wrong), this flag is set to 1. Otherwise, this flag is set to 0.
The I/O Privilege Level Flag (IOPL): this flag is two bits in size, which is required to represent the 4 privilege levels (0 through 3). The privilege level of a program must be numerically equal to or less than the value of this flag in order to perform I/O. Otherwise, the requested action will be denied. This flag can only be altered from the kernel itself (ring 0).
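The flags register of the 32-bit architecture has twice as much space as the one of the 16-bit architecture, which made room for additional flags.
The Resume Flag (RF): temporarily suppresses debug exceptions for instruction breakpoints, so that execution can resume after such an exception has been handled.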
The Virtual 8086 Mode (VM): if the CPU runs in 8086 compatibility mode, this flag is set to 1. Otherwise it is 0.
The Alignment Check (AC): used during the alignment checking of memory addresses, where a 1 means enabled and a 0 means disabled.
The Virtual Interrupt Flag (VIF): the virtual version of the Interrupt Flag (IF).
The Virtual Interrupt Pending (VIP): set to 1 if a virtual interrupt is pending, otherwise it is equal to 0.
The CPUID Flag (ID): if a program is able to set and clear this flag, the CPU supports the CPUID instruction.
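The VAD Flag (VAD): described as allowing the Virtual Address Descriptor to be accessed if set to 1, otherwise it is set to 0. Note that Intel's documentation does not list this flag and marks bits 22 through 31 of the 32-bit flags register as reserved.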
Similar to the shift from 16-bit to 32-bit architecture, the 64-bit flags register has twice the size of the 32-bit flags register. The bits in the upper half of the 64-bit flags register (which equals the newly added space, compared to the 32-bit flags register) are all reserved and are therefore not used.
Assembly language differs from most of the programming languages we know today, especially the code that is generated (and optimised) by the compiler. The easiest way to understand assembly language is to compare it with another language (any language works). In this case, C is used as an example, since it provides the option to directly access memory. This helps to explain the usage of registers in practice. In this practical case, an integer with the value 5 is printed.
Example in ASM (x86)
Firstly, 8 is subtracted from the stack pointer. Together with the two 4-byte values that are pushed afterwards, this moves the stack pointer by 16 bytes in total, which is most likely done to keep the stack aligned. The arguments are pushed in reverse order: the value which is to be printed is pushed onto the stack before the address of the literal string (which is located at 0x80484c0 and equals “%d”). Since the stack grows downwards, the first argument ends up at the top of the stack. The function “sym.imp.printf” is the default “printf” function which prints data to the “stdout”. Based on the symbols it found (sym), Radare2 labels it as an imported (imp) function with the name “printf”. The function “printf” is then called and uses the provided parameters. The return value of “printf” equals the number of characters that are written to the “stdout”, which would be 1 in this case. This value is saved in the accumulator register EAX.
0x0804841c sub esp, 8
0x0804841f push 5 ; 5
0x08048421 push 0x80484c0
0x08048426 call sym.imp.printf
Example in C
If one is to call the function “printf” with 1 variable, it would look like this in C:
printf("%d", 5);
The function “printf” requires a literal string in which the format specifiers are provided. The format specifier “%d” represents a signed decimal integer. The first value after the literal string is the value that is used for the first format specifier.
Using (pseudo) C, one can visualise the structure used in assembly language. The literal string “%d” is located at the memory address 0x80484c0. At the moment of the call, the address of the literal string is at the top of the stack, and the value of the integer is located at the stack pointer (ESP) plus the size of 1 variable (4 bytes). Note that the call frame is left out to avoid overcomplication. This matter is explained in detail in a later chapter.
printf((char *)0x80484c0, *(int *)(ESP + 4));