Universal Product Code calculator

This article was published on the 7th of May 2019. This article was updated on the 30th of April 2020, as well as on the 11th of May 2020.

The Universal Product Code (UPC) might not ring a bell, but the term barcode most likely does. Each barcode is represented using 12 digits, out of which each digit is also displayed as a stripe which is partially black and partially white. The used UPC variant in this article, UPC-A, is chosen as one can calculate the last digit based upon all previous ones.

In this article, the creation of the calculation method using x86 assembly for Linux distributions will be described in great detail. Similar to the Hello World article, NASM will be used to assemble the code and the GNU linker will be used to link the object file into an x86 ELF file.

Table of contents

The UPC-A calculation method

The idea to write this program, comes from /r/dailyprogrammer. In short, several steps are required to calculate the twelfth digit of the code, when the first eleven are known.

  1. Calculate the sum of the digits at odd indexes
  2. Multiply the sum of the digits at indexes by three
  3. Add the sum of the digits at even indexes to this total
  4. Divide the total sum by 10 and see if the remainder equals zero
  5. If the remainder does not equal zero, subtract the remainder from ten
  6. The final result is equal to either zero or to 10 minus the remainder

Note that the final result can never be bigger than a single digit, this is important later on.

Compiling the code

To compile the assembly code with NASM and the GNU linker, one can use the script below.

nasm -f elf32 ./program.asm -o program.o
ld -m elf_i386 ./program.o -o ./program

The .data section

Before starting, it is useful to think of all the data that needs to be declared. The db directive is used to declare the 11 digit UPC-A number. The semicolon marks a comment, similar to // in C or Java.

Additionally, the length of the number is declared as well. In this example, the hard coded length of 0x0b (11 decimal) would suffice, but it is a best practice to avoid hard coding values into any program. Below, the assembly code for this section is given.

section .data
	upc:        db  '03600029145' ;another number is 04210000526
	upcLength:  equ $-upc

The .text section

The .text section contains the executable code for the program. Within the program, labels are used to indicate which part of the code is responsible for what. In the end, where the complete code is given, each instruction is annotated with a comment.

Even and odd indexes

The terms even and odd will cause some confusion within this program. The reason for this is the location of the first digit: index zero. As such, the second digit is stored at index one, and so on. This means that every even index contains a digit that is located at an odd index in the UPC. This could be avoided within the code, but it has not been altered because it forces the reader to understand what is going on.

String to integer conversion

Since a digit is only one character in size, a simple technique can be used to convert the value from a string into an integer. Note that the types are written in cursive because there are no types in assembly language. Data is interpreted in one way or another, depending on the expected input.

The value of the selected digit is not the the literal digit that was declared in the .data section. It actually is the hexadecimal representation of the human readable characters. The number 0 is equal to 0x30, as can be seen in the ASCII table (see man ascii or use this website). Note that input validation is out-of-scope for this program.

As such, the assumption can be made that any digit that is selected, ranges between 0 and 9. In hexadecimal notation, the range is between 0x30 and 0x39. To obtain the value of the digit (between 0 and 9), one needs to subtract 0x30 from each digit before adding it to the total.

The odd indexes

To get all numbers at odd indexes, meaning the numbers at even indexes when going through the digits, a loop is required. In a lot of languages, a for-loop is defined using an integer to keep track of the count (often named i), the array to loop through (with a given length) and the amount with which the count should be altered. An example (in C) is given below.

   int i;
   int myList[100];
   int size = sizeof(myList) / sizeof(int);
   for( i = 0; i < size; i = i + 1 ){
      //execute this code each iteration

Note that the size of the array is calculated by the size of the array as a whole, divided by the size of each element.

Using the example above in assembly language, the check if i exceeds the amount of elements that the array contains, should be done directly after setting the variables that are required (both i and myList). This way, an unconditional jump can be made after the code that should be executed during each iteration, is executed.

At first, the variables should be set and a label should be created. This label is used to jump back to at the end of the loop. The code is given below.

section .text
	global _start
		mov ecx, 0
		mov eax, 0
		cmp ecx, upcLength
		jge processOddIndex
		;the code to execute each loop is placed here
		add ecx, 2
		jmp getOddIndex
		;the next step starts here

The register ecx is used to store the count in, similar to the integer i in the example above. The register eax is used to store the sum of all iterated integers.

The count is then compared to the length of the given Universal Product Code. In the next instruction, a jump is made to the processOddIndex label if the ecx is greater then (or equal to) the length of the Universal Product Code. Then the code that is executed during each iteration, is executed. To advance to the next odd indexed number (even index count within the program), the count should be incremented with two. Lastly, an unconditional jump is taken to the start of the loop. If no more iterations are required, the compare instruction will cause the conditional jump to end the loop and continue the execution afterwards.

With the loop’s skeleton in place, one can focus on the code that is required for this loop. A single byte is required, which is provided in a hexadecimal format. This should be reduced to the value it has when it is human readable. This value should be added to the total.

The lowest 8 bits of ebx are stored in a register named bl, where the l stands for lower. This register will be used to store the current digit in, before it gets added to the total (which resides within eax).

To move a value into this register, one can use the mov instruction. One needs to specify the target, the size to move from the source and the source itself. In this case, the source is located at the address of upc (declared in the .data section).
Note that the compiler often deduces the size of the source operand that needs to be moved, but a mistaken can be made here. Using the correct size will eradicate such a mistake.

However, simply stating mov bl, upc would be incorrect. This would try to move the address of upc into the bl register. Using brackets, one can access the value that is located at an address (i.e. a pointer in C).
Currently, there is one thing still wrong: each iteration of the loop will now access the same digit. The offset within the digit (for which ecx is used) needs to be added to the address of upc. The assembly instruction is given below.

		mov bl, byte [upc+ecx]

After this, ebx contains the digit in hexadecimal format. To reduce the value with 0x30, the sub instruction is used. Lastly, the digit that resides in ebx needs to be added to the total, which is located in eax. The instructions are given below.

		sub ebx, 0x30
		add eax, ebx

Processing the odd indexes

The sum of the values that reside at the odd indexes need to be multiplied by three. After that, the value should be stored somewhere in order to free a register. As such, the imul instruction can be used. It multiplies a signed integer by a given value and stores the result in the first operand.

A safe place to store a variable is the stack, thus the push instruction is used to store the eax register on the stack. The code is given below.

		imul eax, 3
		push eax

The even indexes

To loop through the even indexes, a new loop has to be created. After storing the previously calcualted odd index value total on the stack, the registers eax and ecx can be used again. In this loop, their function has not changed. First, the variables need to be set. This time, the count is set to start at 1, instead of 0. This way, the loop iterates through the odd indexes (meaning the even indexes in the provided 11 digits).

		mov ecx, 1
		mov eax, 0

The actual loop itself does not differ from the previous loop. The two loops could be combined, but this would increase the complexity of the code, which is something that was avoided in this article. The code is given below.

		cmp ecx,  upcLength
		jge processEvenIndex
		mov bl, byte [upc+ecx]
		sub ebx, 0x30
		add eax, ebx
		add ecx, 2
		jmp getEvenIndex

Recalculating the total

The sum of odd indexes (multiplied by three) is stored on the stack whilst the register eax currently holds the value of the sum of the even indexes. To add these two together, the value from the stack needs to be retrieved (using the pop instruction) and the two values should be added together (using the add instruction).

		pop ebx
		add eax, ebx

Calculating the modulo

To divide the total by 10, the div instruction is used. The value to divide with is stored in two registers: edx and eax. Since the required value to divide with, is equal to 0x0a, only a single register is required. As such, edx can be set to zero and the ecx register is used to store the value 0x0a. The remainder of the division is stored in edx. The code is given below.

		xor edx, edx
		;eax is already set
		mov ecx, dword 0xa
		div ecx

Note that xor edx, edx is equal to mov edx, 0 as the result of a bitwise exclusive OR
with the the same values is always zero.

Comparing the modulo

First, one needs to check if the remainder is equal to zero. If this is true, the conditional jump to the next step will be taken. If it is false, the execution continues.

At first, 0x0a (or 10 in decimal notation) is stored in eax. Afterwards, the value that resides within edx (the remainder of the division) is subtracted from eax. Lastly, the value should be stored within edx. The code is given below.

		cmp edx, 0
		je addThirtyHex
		mov eax, 0x0a
		sub eax, edx
		mov edx, eax
	        ;the next step starts here

Integer to string conversion

Similar to subtracting the base value of 0x30 during the sum calculation, the opposite should be done for the calculated digit. If this part of the code is omitted, a non human readable character is printed. Adding 0x30 only works if the returned value is a single digit, but in this case, that assumption is always true.

Additionally, the string should contain a new line character. To achieve this, the newline character 0x0a should be appended to the digit. The digit resides in the lowest 8 bits of the edx register: dl. The 8 bits after that are accessible via the register dh. Writing the newline character into this register appends the newline to the digit when printing it.

Lastly, the contents of the edx register should be saved, as a pointer towards the value is required during the writing. The value is stored on the stack using the push instruction.

		add edx, 0x30
		mov dh, 0x0a
		push edx

Print the result

To print data, one requires the sys_write system call, which equals number 4. The system call number is stored in the eax register. The selected output is the standard output (stdout), which is selected usign the value 1 in the ebx register. The data to print is located at the stack pointer (esp), as it was pushed prior to this instruction.

The amount of characters (two in this case) that should be printed, is stored in the edx register. Raising the interrupt 80h enables the interrupt handler (in this case the kernel) to perform the requested action.

		mov eax, 4
                mov ebx, 1
		mov ecx, esp
                mov edx, 2
                int 80h

A clean shutdown

From this point onwards, none of the registers need to be preserved and the only action that remains, is the shutdown of the program. The system call number to exit is 1 and the exit code for a clean exit equals 0. Raising interrupt 80h, the interrupt is handled by its handler: the kernel. As such, the requested system call is executed and the program exits cleanly.

		mov eax,1
		mov ebx,0
		int 80h

The complete code

The complete code for the program is given here.

;author:                                     Max 'Libra' Kersten
;Twitter contact information:                @Libranalysis
section .data
    upc:             db  '03600029145'      ;the first 11 digits of the UPC-A, of which the 12th digit will be calculated
                                            ;another number is 04210000526
    upcLength:  equ $-upc                   ;the length of the given number
section .text
    global _start
        mov ecx, 0                          ;set the counter to 0, as it should loop through the odd numbers (even indexes)
        mov eax, 0                          ;set the accumulating register to 0, as it will contain the sum of the digits
        cmp ecx, upcLength                  ;compare the count value to the length of the UPC-A
        jge processOddIndex                 ;jump if ecx is greater or equal than the length of the given UPC-A
        mov bl, byte [upc+ecx]              ;move the current character into the lower 8 bits of ebx: bl
        sub ebx, 0x30                       ;convert the "character" into an "integer" with the same value (0x30 equals 0 in the ascii table)
        add eax, ebx                        ;add the digit to the total, which is stored in eax
        add ecx, 2                          ;increment the counter to get the next odd index
        jmp getOddIndex                     ;jump back to check if another iteration is required
        imul eax, 3                         ;multiply the total by three and store the result in eax
        push eax                            ;save the odd total on the stack
        mov ecx, 1                          ;set the counter to 1, as it should iterate the even numbers (odd indexes)
        mov eax, 0                          ;set the total for the even indices to 0
        cmp ecx,  upcLength                 ;compare the count value to the length of the UPC-A
        jge processEvenIndex                ;jump if ecx is greater or equal than the length of the given UPC-A
        mov bl, byte [upc+ecx]              ;move the current character into the lower 8 bits of ebx: bl
        sub ebx, 0x30                       ;convert the "character" into an "integer" with the same value (0x30 equals 0 in the ascii table)
        add eax, ebx                        ;add the digit to the total, which is stored in eax
        add ecx, 2                          ;increment the counter to get the next odd index
        jmp getEvenIndex                    ;jump back to check if another iteration is required
        pop ebx                             ;restore the odd number value in ebx
        add eax, ebx                        ;add the odd number value and the even number value together in eax
        xor edx, edx                        ;zero edx since dx:ax is used to divide
                                            ;eax is already set
        mov ecx, dword 0xa                  ;set the value to divide with
        div ecx                             ;divide ecx by 10, the remainder is stored in edx
        cmp edx, 0                          ;compare the remainder to zero
        je addThirtyHex                     ;if the remainder is zero, jump forward in the code to the "integer" conversion
        mov eax, 0x0a                       ;moves 0x0a (10d) into eax
        sub eax, edx                        ;subtracts the remainder from 10d
        mov edx, eax                        ;stores the result of the subtraction in edx
        add edx, 0x30                       ;add 0x30 to convert the "integer" to a "character"
        mov dh, 0x0a                        ;adds a newline to second byte of eax
        push edx                            ;stores the future output on the stack
        mov eax, 4                          ;system call 4 is used to write data
        mov ebx, 1                          ;the selected output channel, where 1 corresponds with the standard output (stdout)
        mov ecx, esp                        ;gets a pointer to the future output that is stored on the stack
        mov edx, 2                          ;sets the amount of bytes that should be read from the given source in ecx
        int 80h                             ;raises the interrupt which the kernel handles
        mov eax,1                           ;system call 1 is used to exit the program
        mov ebx,0                           ;the value 0 indicates that the program shut down without errors
        int 80h                             ;raises the interrupt which the kernel handles

To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit, or DM me on Twitter @Libranalysis.