This article was published on the 7th of May 2019. This article was updated on the 30th of April 2020, as well as on the 11th of May 2020.
The Universal Product Code (UPC) might not ring a bell, but the term barcode most likely does. Each barcode is represented using 12 digits, out of which each digit is also displayed as a stripe which is partially black and partially white. The used UPC variant in this article, UPC-A, is chosen as one can calculate the last digit based upon all previous ones.
In this article, the creation of the calculation method using x86 assembly for Linux distributions will be described in great detail. Similar to the Hello World article, NASM will be used to assemble the code and the GNU linker will be used to link the object file into an x86 ELF file.
Table of contents
- The UPC-A calculation method
- Compiling the code
- The .data section
- The .text section
- The complete code
The UPC-A calculation method
The idea to write this program, comes from /r/dailyprogrammer. In short, several steps are required to calculate the twelfth digit of the code, when the first eleven are known.
- Calculate the sum of the digits at odd indexes
- Multiply the sum of the digits at indexes by three
- Add the sum of the digits at even indexes to this total
- Divide the total sum by 10 and see if the remainder equals zero
- If the remainder does not equal zero, subtract the remainder from ten
- The final result is equal to either zero or to 10 minus the remainder
Note that the final result can never be bigger than a single digit, this is important later on.
Compiling the code
To compile the assembly code with NASM and the GNU linker, one can use the script below.
nasm -f elf32 ./program.asm -o program.o ld -m elf_i386 ./program.o -o ./program ./program
The .data section
Before starting, it is useful to think of all the data that needs to be declared. The db directive is used to declare the 11 digit UPC-A number. The semicolon marks a comment, similar to // in C or Java.
Additionally, the length of the number is declared as well. In this example, the hard coded length of 0x0b (11 decimal) would suffice, but it is a best practice to avoid hard coding values into any program. Below, the assembly code for this section is given.
section .data upc: db '03600029145' ;another number is 04210000526 upcLength: equ $-upc
The .text section
The .text section contains the executable code for the program. Within the program, labels are used to indicate which part of the code is responsible for what. In the end, where the complete code is given, each instruction is annotated with a comment.
Even and odd indexes
The terms even and odd will cause some confusion within this program. The reason for this is the location of the first digit: index zero. As such, the second digit is stored at index one, and so on. This means that every even index contains a digit that is located at an odd index in the UPC. This could be avoided within the code, but it has not been altered because it forces the reader to understand what is going on.
String to integer conversion
Since a digit is only one character in size, a simple technique can be used to convert the value from a string into an integer. Note that the types are written in cursive because there are no types in assembly language. Data is interpreted in one way or another, depending on the expected input.
The value of the selected digit is not the the literal digit that was declared in the .data section. It actually is the hexadecimal representation of the human readable characters. The number 0 is equal to 0x30, as can be seen in the ASCII table (see man ascii or use this website). Note that input validation is out-of-scope for this program.
As such, the assumption can be made that any digit that is selected, ranges between 0 and 9. In hexadecimal notation, the range is between 0x30 and 0x39. To obtain the value of the digit (between 0 and 9), one needs to subtract 0x30 from each digit before adding it to the total.
The odd indexes
To get all numbers at odd indexes, meaning the numbers at even indexes when going through the digits, a loop is required. In a lot of languages, a for-loop is defined using an integer to keep track of the count (often named i), the array to loop through (with a given length) and the amount with which the count should be altered. An example (in C) is given below.
int i; int myList[100]; int size = sizeof(myList) / sizeof(int); for( i = 0; i < size; i = i + 1 ){ //execute this code each iteration }
Note that the size of the array is calculated by the size of the array as a whole, divided by the size of each element.
Using the example above in assembly language, the check if i exceeds the amount of elements that the array contains, should be done directly after setting the variables that are required (both i and myList). This way, an unconditional jump can be made after the code that should be executed during each iteration, is executed.
At first, the variables should be set and a label should be created. This label is used to jump back to at the end of the loop. The code is given below.
section .text global _start _start: mov ecx, 0 mov eax, 0 getOddIndex: cmp ecx, upcLength jge processOddIndex ;the code to execute each loop is placed here add ecx, 2 jmp getOddIndex processOddIndex: ;the next step starts here
The register ecx is used to store the count in, similar to the integer i in the example above. The register eax is used to store the sum of all iterated integers.
The count is then compared to the length of the given Universal Product Code. In the next instruction, a jump is made to the processOddIndex label if the ecx is greater then (or equal to) the length of the Universal Product Code. Then the code that is executed during each iteration, is executed. To advance to the next odd indexed number (even index count within the program), the count should be incremented with two. Lastly, an unconditional jump is taken to the start of the loop. If no more iterations are required, the compare instruction will cause the conditional jump to end the loop and continue the execution afterwards.
With the loop’s skeleton in place, one can focus on the code that is required for this loop. A single byte is required, which is provided in a hexadecimal format. This should be reduced to the value it has when it is human readable. This value should be added to the total.
The lowest 8 bits of ebx are stored in a register named bl, where the l stands for lower. This register will be used to store the current digit in, before it gets added to the total (which resides within eax).
To move a value into this register, one can use the mov instruction. One needs to specify the target, the size to move from the source and the source itself. In this case, the source is located at the address of upc (declared in the .data section).
Note that the compiler often deduces the size of the source operand that needs to be moved, but a mistaken can be made here. Using the correct size will eradicate such a mistake.
However, simply stating mov bl, upc would be incorrect. This would try to move the address of upc into the bl register. Using brackets, one can access the value that is located at an address (i.e. a pointer in C).
Currently, there is one thing still wrong: each iteration of the loop will now access the same digit. The offset within the digit (for which ecx is used) needs to be added to the address of upc. The assembly instruction is given below.
mov bl, byte [upc+ecx]
After this, ebx contains the digit in hexadecimal format. To reduce the value with 0x30, the sub instruction is used. Lastly, the digit that resides in ebx needs to be added to the total, which is located in eax. The instructions are given below.
sub ebx, 0x30 add eax, ebx
Processing the odd indexes
The sum of the values that reside at the odd indexes need to be multiplied by three. After that, the value should be stored somewhere in order to free a register. As such, the imul instruction can be used. It multiplies a signed integer by a given value and stores the result in the first operand.
A safe place to store a variable is the stack, thus the push instruction is used to store the eax register on the stack. The code is given below.
processOddIndex: imul eax, 3 push eax
The even indexes
To loop through the even indexes, a new loop has to be created. After storing the previously calcualted odd index value total on the stack, the registers eax and ecx can be used again. In this loop, their function has not changed. First, the variables need to be set. This time, the count is set to start at 1, instead of 0. This way, the loop iterates through the odd indexes (meaning the even indexes in the provided 11 digits).
prepareEvenIndex: mov ecx, 1 mov eax, 0
The actual loop itself does not differ from the previous loop. The two loops could be combined, but this would increase the complexity of the code, which is something that was avoided in this article. The code is given below.
getEvenIndex: cmp ecx, upcLength jge processEvenIndex mov bl, byte [upc+ecx] sub ebx, 0x30 add eax, ebx add ecx, 2 jmp getEvenIndex
Recalculating the total
The sum of odd indexes (multiplied by three) is stored on the stack whilst the register eax currently holds the value of the sum of the even indexes. To add these two together, the value from the stack needs to be retrieved (using the pop instruction) and the two values should be added together (using the add instruction).
processEvenIndex: pop ebx add eax, ebx
Calculating the modulo
To divide the total by 10, the div instruction is used. The value to divide with is stored in two registers: edx and eax. Since the required value to divide with, is equal to 0x0a, only a single register is required. As such, edx can be set to zero and the ecx register is used to store the value 0x0a. The remainder of the division is stored in edx. The code is given below.
calculateModulo: xor edx, edx ;eax is already set mov ecx, dword 0xa div ecx
Note that xor edx, edx is equal to mov edx, 0 as the result of a bitwise exclusive OR
with the the same values is always zero.
Comparing the modulo
First, one needs to check if the remainder is equal to zero. If this is true, the conditional jump to the next step will be taken. If it is false, the execution continues.
At first, 0x0a (or 10 in decimal notation) is stored in eax. Afterwards, the value that resides within edx (the remainder of the division) is subtracted from eax. Lastly, the value should be stored within edx. The code is given below.
compareModule: cmp edx, 0 je addThirtyHex mov eax, 0x0a sub eax, edx mov edx, eax addThirtyHex: ;the next step starts here
Integer to string conversion
Similar to subtracting the base value of 0x30 during the sum calculation, the opposite should be done for the calculated digit. If this part of the code is omitted, a non human readable character is printed. Adding 0x30 only works if the returned value is a single digit, but in this case, that assumption is always true.
Additionally, the string should contain a new line character. To achieve this, the newline character 0x0a should be appended to the digit. The digit resides in the lowest 8 bits of the edx register: dl. The 8 bits after that are accessible via the register dh. Writing the newline character into this register appends the newline to the digit when printing it.
Lastly, the contents of the edx register should be saved, as a pointer towards the value is required during the writing. The value is stored on the stack using the push instruction.
addThirtyHex: add edx, 0x30 mov dh, 0x0a push edx
Print the result
To print data, one requires the sys_write system call, which equals number 4. The system call number is stored in the eax register. The selected output is the standard output (stdout), which is selected usign the value 1 in the ebx register. The data to print is located at the stack pointer (esp), as it was pushed prior to this instruction.
The amount of characters (two in this case) that should be printed, is stored in the edx register. Raising the interrupt 80h enables the interrupt handler (in this case the kernel) to perform the requested action.
printValue: mov eax, 4 mov ebx, 1 mov ecx, esp mov edx, 2 int 80h
A clean shutdown
From this point onwards, none of the registers need to be preserved and the only action that remains, is the shutdown of the program. The system call number to exit is 1 and the exit code for a clean exit equals 0. Raising interrupt 80h, the interrupt is handled by its handler: the kernel. As such, the requested system call is executed and the program exits cleanly.
shutdown: mov eax,1 mov ebx,0 int 80h
The complete code
The complete code for the program is given here.
;author: Max 'Libra' Kersten ;Twitter contact information: @Libranalysis section .data upc: db '03600029145' ;the first 11 digits of the UPC-A, of which the 12th digit will be calculated ;another number is 04210000526 upcLength: equ $-upc ;the length of the given number section .text global _start _start: mov ecx, 0 ;set the counter to 0, as it should loop through the odd numbers (even indexes) mov eax, 0 ;set the accumulating register to 0, as it will contain the sum of the digits getOddIndex: cmp ecx, upcLength ;compare the count value to the length of the UPC-A jge processOddIndex ;jump if ecx is greater or equal than the length of the given UPC-A mov bl, byte [upc+ecx] ;move the current character into the lower 8 bits of ebx: bl sub ebx, 0x30 ;convert the "character" into an "integer" with the same value (0x30 equals 0 in the ascii table) add eax, ebx ;add the digit to the total, which is stored in eax add ecx, 2 ;increment the counter to get the next odd index jmp getOddIndex ;jump back to check if another iteration is required processOddIndex: imul eax, 3 ;multiply the total by three and store the result in eax push eax ;save the odd total on the stack prepareEvenIndex: mov ecx, 1 ;set the counter to 1, as it should iterate the even numbers (odd indexes) mov eax, 0 ;set the total for the even indices to 0 getEvenIndex: cmp ecx, upcLength ;compare the count value to the length of the UPC-A jge processEvenIndex ;jump if ecx is greater or equal than the length of the given UPC-A mov bl, byte [upc+ecx] ;move the current character into the lower 8 bits of ebx: bl sub ebx, 0x30 ;convert the "character" into an "integer" with the same value (0x30 equals 0 in the ascii table) add eax, ebx ;add the digit to the total, which is stored in eax add ecx, 2 ;increment the counter to get the next odd index jmp getEvenIndex ;jump back to check if another iteration is required processEvenIndex: pop ebx ;restore the odd number value in ebx add eax, ebx ;add the odd number value and the even number value together in eax calculateModulo: xor edx, edx ;zero edx since dx:ax is used to divide ;eax is already set mov ecx, dword 0xa ;set the value to divide with div ecx ;divide ecx by 10, the remainder is stored in edx compareModule: cmp edx, 0 ;compare the remainder to zero je addThirtyHex ;if the remainder is zero, jump forward in the code to the "integer" conversion mov eax, 0x0a ;moves 0x0a (10d) into eax sub eax, edx ;subtracts the remainder from 10d mov edx, eax ;stores the result of the subtraction in edx addThirtyHex: add edx, 0x30 ;add 0x30 to convert the "integer" to a "character" mov dh, 0x0a ;adds a newline to second byte of eax push edx ;stores the future output on the stack printValue: mov eax, 4 ;system call 4 is used to write data mov ebx, 1 ;the selected output channel, where 1 corresponds with the standard output (stdout) mov ecx, esp ;gets a pointer to the future output that is stored on the stack mov edx, 2 ;sets the amount of bytes that should be read from the given source in ecx int 80h ;raises the interrupt which the kernel handles shutdown: mov eax,1 ;system call 1 is used to exit the program mov ebx,0 ;the value 0 indicates that the program shut down without errors int 80h ;raises the interrupt which the kernel handles
To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit, or DM me on Twitter @Libranalysis.