General techniques

Some techniques are platform independent: their usage is widespread and not restricted to a specific platform. This article discusses a number of these core concepts, divided into two categories: data related and bot related. Each category starts with a general explanation.

Data related techniques

Data is to malware what oil is to a motor: it is essential, it keeps everything running smoothly, and it needs to be supplied. That malware contains data is a given, but how can one recognise the technique that was used to store it? Below, examples are given for several techniques that are all related to data.

Encoding

Data within the binary is often encoded to avoid static detection. Encoding also allows data to be stored in a variable that normally could not hold it. A good example is the Base64 encoding scheme: one can encode an image into a string, which is then decoded at runtime. An example in the wild of Base64 encoded strings can be found in the Magecart analysis of this course.
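To make the Base64 round trip concrete, below is a minimal sketch that uses the java.util.Base64 class from the Java standard library. The class name Base64Example and the encoded value are assumptions for illustration only, they are not taken from any sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Example {
    // Encodes arbitrary bytes (a string, an image, a complete file) into printable text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decodes the printable text back into the original bytes, as malware would do at runtime.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Hypothetical embedded value; a real sample could embed anything.
        String encoded = encode("example.invalid".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);
        System.out.println(new String(decode(encoded), StandardCharsets.UTF_8));
    }
}
```

When looking for encoded files, it helps to know what the start of a file looks like once it is encoded. The bytes 4D 5A 90, with which many PE files start, encode to TVqQ.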

Encodings like these are either standardised or home brew. The latter is a reference to moonshining, in which one brews one's own alcoholic beverage. Similar to moonshine, the quality of these self-made encodings is often lower, but they do not rely on standard libraries to encode or decode the data. The lack of these standard libraries can make the data harder to find: a cross reference to a library function cannot be followed if no such library is used.
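One common shape of a home brew encoding is a standard scheme with a small twist, such as Base64 with a shuffled alphabet. The sketch below shows how an analyst could undo such a twist by translating the custom alphabet back to the standard one. The custom alphabet (upper and lower case swapped) and all names are hypothetical.

```java
import java.util.Base64;

class CustomAlphabetBase64 {
    static final String STANDARD =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Hypothetical home brew alphabet: the same characters, with the letter cases swapped.
    static final String CUSTOM =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/";

    // Replaces each character by the character at the same index in the other alphabet.
    // Characters that are not in the alphabet, such as the padding '=', are kept as-is.
    static String translate(String input, String from, String to) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            int index = from.indexOf(c);
            out.append(index >= 0 ? to.charAt(index) : c);
        }
        return out.toString();
    }

    static String encode(byte[] data) {
        return translate(Base64.getEncoder().encodeToString(data), STANDARD, CUSTOM);
    }

    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(translate(encoded, CUSTOM, STANDARD));
    }
}
```

A standard decoder would produce garbage for the custom output, which is exactly why the alphabet that is embedded in the sample is worth looking for.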

Encoded content can be anything, ranging from a string to a complete file of any type. Examples of data that is often encoded are the Command & Control (C&C) server address that is embedded within the malware, the targeted file types or the ransom note in ransomware, and the embedded commands within a Remote Access Trojan (RAT).

Steganography

In some cases, the malware contains or loads a legitimate file, which also contains data that the malware requires. The Backswap malware uses an altered image to store its payload. This can bypass antivirus checks, as the file looks like an ordinary image: when the image is opened in a viewer, nothing malicious happens. Only when the image is loaded by the malware is the required data extracted, using the algorithm with which it was embedded. This algorithm can be home brew or it can be a known standard.

An example of such a standard technique is Least Significant Bit (LSB) steganography, where data is stored in the least significant bit of the colour values of a pixel. This barely reduces the visible image quality, whilst quite some data can be stored in a medium sized photo. An implementation of this is the Python package stego-lsb, which supports multiple methods of LSB steganography.
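The idea behind LSB steganography can be sketched in a few lines. The example below is a simplified illustration and not the algorithm of stego-lsb or Backswap: it assumes the image is available as an array of packed ARGB integers (the format that BufferedImage.getRGB returns) and it stores one payload bit per pixel, in the lowest bit of the blue channel. All names are hypothetical.

```java
class LsbExample {
    // Hides the payload in the least significant bit of consecutive pixels.
    // The lowest bit of a packed ARGB integer is the lowest bit of the blue channel.
    static void embed(int[] pixels, byte[] payload) {
        if (payload.length * 8 > pixels.length) {
            throw new IllegalArgumentException("payload does not fit in the image");
        }
        for (int bit = 0; bit < payload.length * 8; bit++) {
            // Take the payload bits from the most significant to the least significant bit.
            int value = (payload[bit / 8] >> (7 - bit % 8)) & 1;
            pixels[bit] = (pixels[bit] & ~1) | value;
        }
    }

    // Reads the hidden bits back; the payload length has to be known to the extractor.
    static byte[] extract(int[] pixels, int length) {
        byte[] payload = new byte[length];
        for (int bit = 0; bit < length * 8; bit++) {
            payload[bit / 8] |= (pixels[bit] & 1) << (7 - bit % 8);
        }
        return payload;
    }
}
```

Each pixel changes by at most one shade of blue, which is why the altered image is visually indistinguishable from the original. With one bit per pixel, an image of 1000 by 1000 pixels can hold 125,000 bytes.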

Compression

A password protected compressed archive, like zip, rar or 7z, can stay under the antivirus’ radar, since the archive cannot be opened without the password. After the extraction, the antivirus gets a second chance, unless the extracted data never touches the disk and solely resides in memory, from where it can be used by the malware. An example of such data is the configuration file of a malware sample.

When exfiltrating data from the target (the private key of the ransomware or a bunch of classified files, for example), it is effective to upload as little as possible. Compressing the data decreases its size, which reduces the upload time. Additionally, it is easier to upload multiple files as a single archive than to upload them separately. An example in the wild is the BALDR stealer, which compresses the files before they are exfiltrated.
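Both points, bundling multiple files and never touching the disk, can be illustrated with the java.util.zip classes from the Java standard library. The sketch below is a generic illustration with hypothetical names, not the approach of BALDR. Note that java.util.zip does not support password protected archives, and that readAllBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveExample {
    // Packs all files (name -> content) into a single ZIP archive that only exists in memory.
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Unpacks an in-memory archive without writing any of the files to the disk.
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Reads until the end of the current entry, not the end of the archive.
                files.put(entry.getName(), zip.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

During analysis, references to archive handling classes or functions like these are a hint to look for data that is packed or unpacked in memory.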

Encryption

The aforementioned password protected archives are a form of encryption, but there are many more methods. Similar to the other techniques, encryption can either be home brew or it can be an industry standard algorithm. Do note that not every industry standard algorithm is safe.

The Data Encryption Standard (DES) is an example of an unsafe algorithm. DES was published as a proposed standard in 1975 and adopted as a US federal standard in 1977. Its 56-bit key was criticised as too short from the start: in 1977, Diffie and Hellman estimated that a specially crafted device could brute force a key in about a day. Although the estimated price of such a device was high (roughly 20 million USD), it would not deter state actors from obtaining one.

Another algorithm that is used by ransomware is RSA. A public and private key pair is generated when the ransomware starts, after which the private key is sent to the Command & Control server. The local copy of the private key is then deleted and all encountered files are encrypted with the public key. This makes decryption nearly impossible, unless the private key can be recovered from memory or from the Command & Control server. The ShiOne ransomware is an example of malware that uses RSA encryption.
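The asymmetry that this scheme relies on can be shown with the standard Java cryptography classes. The sketch below only encrypts and decrypts a short byte array; it is not the implementation of ShiOne, and the class name, key size and padding are assumptions.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

class RsaExample {
    // Assumed padding scheme; a sample can use any other scheme.
    static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair generate() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone who holds the public key can encrypt.
    static byte[] encrypt(PublicKey key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can decrypt.
    static byte[] decrypt(PrivateKey key, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            return cipher.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a 2048-bit key and this padding, a single RSA operation can only encrypt 190 bytes. That is why samples commonly use RSA to encrypt a symmetric key, which in turn encrypts the files. Finding both algorithms in one sample is therefore not unusual.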

Other methods that are used are rotN and XOR. The name rotN stands for rotation N, where N is a number between 1 and 25 (inclusive). With rot1, the A becomes a B, the B becomes a C, and so forth, until the Z wraps around to an A. XOR, which stands for the bitwise eXclusive OR, is always used with a given value: the key. Neither of these algorithms is secure, but they work without the usage of any standard libraries and their key can be randomised for each generation of the malware sample. An example in the wild is the Zeus trojan, which uses the XOR operator to obfuscate its strings.
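Both algorithms fit in a handful of lines, which is part of their appeal. Below is a minimal sketch with hypothetical names; it is not the routine of Zeus, which is merely one sample that uses XOR.

```java
class RotXorExample {
    // Rotates the letters A-Z and a-z by n positions; all other characters are left untouched.
    static String rotN(String input, int n) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + Math.floorMod(c - 'A' + n, 26)));
            } else if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + Math.floorMod(c - 'a' + n, 26)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // XORs every byte with the key, repeating the key if it is shorter than the data.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }
}
```

Decoding rotN is done by rotating the remaining 26 - N positions, whereas XOR is its own inverse: applying the same key twice returns the original data. In a disassembler, a loop with a XOR instruction on every byte of a buffer is a strong hint that this technique is used.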

Bot related techniques

Although one could argue that everything is data, the difference here lies in the scope. Whereas the data related techniques are all scoped within a single binary, the scope of these techniques is wider: it is focused on a single infection, rather than a single binary. To fully understand what the malware is doing, one should have an overview of both scopes.

Modular architecture

Malware is rarely a single executable that infects the machine from start to finish. From the start of the infection until the malware has completed its task(s), many files are used, created, altered, and deleted. Upon receiving a specific task (or as a default behaviour), an additional module may be downloaded by the malware. This module contains the code for the requested task(s).

Often, the module is deleted after it is used, which makes the analysis harder, especially if the module only resides in memory and is never written to the disk. Automated sandboxes often save all files that are created by the submitted sample, but do not store files that only reside in memory.

The system’s API calls are required to make this technique work. More specifically, the calls for the creation, alteration and deletion of files on the system, as well as for the creation of a process or the loading of a dynamic link library (or shared object). As such, the usage of these calls within the code might indicate that such an architecture is being used by the malware.
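Keeping an eye out for these calls can be partly automated. The sketch below assumes that the imported function names of a sample have already been extracted with a tool of choice, and flags the names that match a watch list. The list is illustrative and far from exhaustive, and the prefix matching is deliberately coarse so that variants such as CreateFileA and CreateFileW are both caught. All class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImportTriage {
    // Illustrative watch list: API name prefix -> reason why it is interesting.
    static final Map<String, String> WATCH_LIST = new LinkedHashMap<>();
    static {
        WATCH_LIST.put("CreateFile", "file creation or opening (Windows)");
        WATCH_LIST.put("WriteFile", "file alteration (Windows)");
        WATCH_LIST.put("DeleteFile", "file deletion (Windows)");
        WATCH_LIST.put("CreateProcess", "process creation (Windows)");
        WATCH_LIST.put("LoadLibrary", "dynamic link library loading (Windows)");
        WATCH_LIST.put("dlopen", "shared object loading (POSIX)");
        WATCH_LIST.put("unlink", "file deletion (POSIX)");
        WATCH_LIST.put("execve", "process execution (POSIX)");
    }

    // Returns the imports that match the watch list, in their original order, with the reason.
    static Map<String, String> flag(List<String> imports) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (String name : imports) {
            for (Map.Entry<String, String> entry : WATCH_LIST.entrySet()) {
                if (name.startsWith(entry.getKey())) {
                    hits.put(name, entry.getValue());
                    break;
                }
            }
        }
        return hits;
    }
}
```

A hit is only an indication: nearly every benign program imports some of these functions as well. The combination of writing a file, loading or executing it, and deleting it afterwards is what deserves a closer look.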

Emotet is known to use modules to perform tasks, as can be seen in this analysis.

Setting the stage

If no additional modules are downloaded, the final payload is often hidden behind multiple stages. Note that this does not have to be the case, as it is up to the malware author how the sample works. When a phishing attempt with a Visual Basic for Applications (VBA) macro in a Word document is successful, the macro often executes an embedded PowerShell script. This script could then call a Visual Basic Script (VBS), which then calls a Batch file, which executes the final payload: a PE file. An example of this was analysed earlier in this course: the Emotet dropper analysis.

Aside from delaying the analysis of the malware, this technique also serves to evade the antivirus software that is likely installed on the endpoint. The emulation engine of the antivirus suite cannot handle all system calls and will not actually write files to the disk: the emulated binary will receive fake handles in order to continue the execution without errors.

The fake handles are mainly used for performance reasons: it would cost too much RAM and too much time to first run every binary that is executed on a machine inside a fully functional virtual machine. This would slow the user’s computer down to a crawl, rendering the machine useless. Additionally, the antivirus engine might not be able to handle a certain file type, although modern antivirus suites should be able to handle nearly all known file formats.

Connecting the dots

Within the malware sample, there generally is a central point which handles all commands. A perfect example of this is a Remote Access Trojan (RAT). All the embedded commands are usually listed in an if-else structure. This class (or function) generally links to a lot of other classes (or functions), but is only linked to by very few. Each command is executed in another class (or function), which makes refactoring the code very useful in this case. The complete capabilities of a bot can be reconstructed based on this if-else structure.

Finding this class or function can be hard, as it doesn’t need to link to the system’s API. Additionally, the commands can be transformed using one of the aforementioned data techniques.

An example of this is hard to find online, because most blogs only show the outcome of the investigation. The if-else structure isn’t part of the outcome, as it is merely a step on the path towards it. Below, an example in pseudo code is given to give an impression of what it looks like.

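// Central dispatcher: compares the received command with each embedded value.
public void handleArgument(String argument) {
    if (argument.equals("execute")) {
        // A match instantiates the class that carries out the command.
        new ExecuteHandler(argument);
    } else if (argument.equals("screenshot")) {
        new ScreenshotHandler();
    } else if (argument.equals("self-destruct")) {
        new SelfDestructor();
    } else if (argument.equals("upload-file")) {
        new FileUploader();
    } else if (argument.equals("open-rdp")) {
        new RdpHandler();
    }
    // Unknown commands fall through and are ignored.
}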

In this example, the string argument is compared with set values. A match results in the instantiation of a new class, through which the command is executed. The handling is often done in a separate thread, so that multiple commands can be sent in succession and contact with the bot is maintained. The class in which this function resides will have references to all classes that are used, whilst the used classes will not have a reference back.
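As mentioned earlier in this section, the commands can be transformed with one of the data related techniques, which makes the structure harder to spot with a simple string search. Below is a runnable sketch of the same if-else shape in which the command names are stored Base64 encoded. It is a hypothetical illustration: the names are made up, and instead of instantiating a handler, the function returns the name of the handler that would be used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EncodedCommandHandler {
    // Decodes an embedded constant at runtime, so the plain command is not stored as a string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    // Same shape as the plain if-else structure, but without readable command names.
    static String handleArgument(String argument) {
        if (argument.equals(decode("ZXhlY3V0ZQ=="))) {              // "execute"
            return "ExecuteHandler";
        } else if (argument.equals(decode("c2NyZWVuc2hvdA=="))) {   // "screenshot"
            return "ScreenshotHandler";
        }
        return "unknown";
    }
}
```

The shape of the code does not change: one function that fans out to many others, with a decoding call in front of every comparison. Decoding the constants once and adding them as comments, as is done above, restores the readability.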

The dictionary

In some malware, the variables that are used within the bot are grouped into a single class or struct. This makes it easier to change recurring values of variables, as they are all managed in one place. It also means that one only has to deobfuscate the values once, after which the variables can be renamed to something meaningful, which is then visible throughout the whole bot.

This class will have very few references to others (often none), whilst being referred to by a lot of classes. An example can be found in the leaked code of the Android banking trojan named GMBot.
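To give an impression of what such a dictionary can look like, below is a minimal sketch. It is not taken from GMBot: the class name, the single-byte XOR key and the values are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Constants {
    // Hypothetical single-byte XOR key.
    private static final byte KEY = 0x5A;

    // Hypothetical obfuscated values, which are deobfuscated once when the class is loaded.
    static final String SERVER = decode(new byte[] {
            0x3F, 0x22, 0x3B, 0x37, 0x2A, 0x36, 0x3F, 0x74,
            0x33, 0x34, 0x2C, 0x3B, 0x36, 0x33, 0x3E});        // "example.invalid"
    static final String COMMAND_EXECUTE = decode(new byte[] {
            0x3F, 0x22, 0x3F, 0x39, 0x2F, 0x2E, 0x3F});        // "execute"

    private Constants() {
        // Only holds values; it is never instantiated and refers to no other class of the bot.
    }

    private static String decode(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ KEY);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

Every other class would refer to Constants.SERVER or Constants.COMMAND_EXECUTE. In an obfuscated sample these fields have meaningless names, but once the values are decoded, renaming the fields makes every usage throughout the bot readable at once.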


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.