This article was published on the 27th of February 2023.
Dynamically resolving functions is a known obfuscation technique. Hashing the function name makes the analysis harder, as the import is not visible statically, nor in the strings of the sample. In this article, more information regarding API hashing is given, along with scripts to automate the process with the help of OALabs’ HashDB service, or a local instance if so desired.
Imports and exports
The goal of a library is to export knowledge in a reusable manner, both in the physical and the virtual world. Dynamic Link Library (DLL) files serve this exact purpose: they contain building blocks for applications. Libraries contain functions, some of which are exposed to be called by other programs, while others are only used internally. The externally callable functions are called exported functions, and need to be marked as such. The snippet below (taken from Microsoft’s documentation) shows how to declare an external function using C linkage (thereby avoiding name mangling).
extern "C" char GetChar(void) {
    char ch;
    ch = getchar();
    return ch;
}
Note that variables can also be exposed externally, but that is out-of-scope for this article.
Applications which use exported functions from a library need to import it, allowing the library to be used during runtime (and loaded if it is not already). The Windows API functions are exported functions which can be imported by applications, allowing programmers to utilise the plethora of functions that the Windows operating system provides.
Seeing specific imported functions in malware might indicate what the sample is capable of. A prime example is the usage of cryptography-related imports by ransomware. Dynamically resolving the imports is one way to avoid showing them, but it generally comes with the downside of leaving too few imports for the sample to look benign.
Unless encrypted, the strings would show the name of the imports that are to be resolved during runtime. Hashing the API names is a common way for malware authors to hide the dynamically resolved import(s).
Hashing
Hashing algorithms come in multiple flavours. The most well-known types are cryptographic and fuzzy hashes. The purpose of a hashing algorithm is to create a fixed-size output for a given input; for cryptographic hashes, the output should be practically unique and should not be traceable back to the input. A cryptographic hash changes completely when even a single byte of the input changes. Fuzzy hashes split files into chunks, which are then hashed, and can thus be used to compare similar files. Adding a single byte at the end of a file changes the entire cryptographic hash, whereas the fuzzy hash only changes at the end.
To illustrate the differences, the hash of this blog’s slogan, “Security through explanation”, is given below in multiple formats. Additionally, the same slogan with an appended exclamation mark is hashed using the same algorithms, which clearly shows the difference between the cryptographic and fuzzy hashes, which are SHA-256 and SSDEEP respectively.
Input: Security through explanation
SHA-256: 9de361151fe4c4293d0fe648e4acb9b7d346a8b97d317b18873d8669e967d6ff
SSDEEP: HcewdFAZEwRLb:HcjAZEw9

Input: Security through explanation!
SHA-256: a05d57ef8dbedf3215f2ce5412a163c506d53f24f17a9df678219432443f018d
SSDEEP: HcewdFAZEwRLan:HcjAZEwo
As shown in the example, the cryptographic hash changed completely, whereas the fuzzy hash only changed a single character at the end, which is also where the change is located.
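The cryptographic half of this comparison can be reproduced with Python's standard library. The sketch below hashes both inputs with SHA-256 and counts how many of the 64 hexadecimal characters differ. The helper names `sha256_hex` and `differing_positions` are made up for this illustration, and UTF-8 encoding of the input is an assumption. SSDEEP is left out, as it requires a third-party library.

```python
import hashlib

def sha256_hex(text):
    # Hash the UTF-8 encoded text and return the digest as a hex string
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_positions(a, b):
    # Count the positions at which two equally long strings differ
    return sum(1 for x, y in zip(a, b) if x != y)

original = sha256_hex("Security through explanation")
modified = sha256_hex("Security through explanation!")

print(original)
print(modified)
print(differing_positions(original, modified), "of", len(original), "hex characters differ")
```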
In API hashing, exact-match hashes of the first kind are used rather than fuzzy hashes, although these are often simple checksums such as CRC32 instead of strong cryptographic algorithms. In some cases, home-brew algorithms are used and have to be recreated in full by the analyst. In other cases, known algorithms are used, which can often be identified by the magic values (constants) they contain. Searching for these values online will commonly yield open-source implementations or explanations of said algorithms.
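As an illustration of such a magic value: the widely used reflected CRC32 variant is built around the polynomial constant 0xEDB88320, so spotting that constant in a hashing routine is a strong hint. The bitwise sketch below is a generic textbook implementation, not code taken from any sample, and is compared against Python's `zlib.crc32`.

```python
import zlib

CRC32_POLYNOMIAL = 0xEDB88320  # reflected CRC32 polynomial, a typical "magic value"

def crc32_bitwise(data):
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the dropped bit was set
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLYNOMIAL
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion

# The bitwise version should agree with the library implementation
print(crc32_bitwise(b"VirtualAlloc") == zlib.crc32(b"VirtualAlloc"))
```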
API hashing
The next problem at hand is the input of the algorithm, as only the algorithm’s output is seen in the code. Hashing a list of Windows API export names and comparing the results with the hash found in the sample solves this problem, assuming the used name is within the data set. Mandiant’s shellcode_hashes provides a script to generate such a dictionary yourself.
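A minimal sketch of such a dictionary is given below. The name list is a tiny hand-picked sample rather than a complete export list, and `example_hash` is a stand-in (plain CRC32 via zlib) for whichever algorithm the sample uses. Both, along with the helper name `build_lookup`, are assumptions for illustration.

```python
import zlib

def example_hash(name):
    # Stand-in algorithm: replace with the routine recovered from the sample
    return zlib.crc32(name) & 0xFFFFFFFF

def build_lookup(names, hash_function):
    # Map each hash value to every name that produces it, as collisions are possible
    table = {}
    for name in names:
        table.setdefault(hash_function(name), []).append(name)
    return table

api_names = [b"VirtualAlloc", b"LoadLibraryA", b"GetProcAddress", b"CreateFileW"]
lookup = build_lookup(api_names, example_hash)

# Pretend this constant was observed in the disassembly
observed = example_hash(b"GetProcAddress")
print(lookup.get(observed, "unknown"))
```

Keeping a list per hash value, rather than a single name, avoids silently losing candidates when two names hash to the same value.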
In practice, API hashing often retrieves a pointer to the function, after which the function is called. The retrieved function pointer can be stored in a global variable, which would only require a single resolver call (per hashed function) to set it during runtime. Alternatively, the pointer can be obtained each time it is required, after which it is stored either in a single global variable (which is overwritten every time), or in a local variable, where it remains available until the current function returns.
A pseudo code example is given below.
int* pVirtualAlloc = findFunctionByHash(0x12345678);
int* buf = (pVirtualAlloc)(NULL, 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
API hashing example: PlugX
The sample in this blog is PlugX Talisman, which hashes the API names using the CRC32 algorithm. Noteworthy here is the fact that the hashed function name contains the ending null byte, as also mentioned by Dr. Web.
The hash function below, written in Python3, returns the CRC32 hash of a given string with a null byte at the end.
import zlib

def hash(data):
    data = data + b'\x00'
    return 0xffffffff & zlib.crc32(data)
Using a list of Windows API function names, the hashes can be compared. An easier approach is OALabs’ HashDB, as discussed in the next section.
HashDB
The HashDB service, also explained thoroughly by OALabs in this video, aims to make it easier to look for matching hashes in a database. The database is filled with hashes based on known algorithms, with the help of an extensive Windows API name list. The community can add algorithms via the project's GitHub repository.
To utilise the service, one can query the API, or use the Ghidra or IDA Pro plug-ins.
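For the first option, a lookup could look roughly like the sketch below. The base URL, the `/hash/<algorithm>/<value>` route, and the JSON response are assumptions that should be checked against the HashDB documentation. `crc32` is used as an example algorithm name, and both function names are made up for this illustration. For a local instance, only the base URL would change.

```python
import json
import urllib.request

HASHDB_URL = "https://hashdb.openanalysis.net"  # assumed base URL, adjust for a local instance

def build_lookup_url(algorithm, value, base_url=HASHDB_URL):
    # Assumed route layout: <base>/hash/<algorithm>/<hash as decimal>
    return "{}/hash/{}/{}".format(base_url.rstrip("/"), algorithm, value)

def lookup_hash(algorithm, value, base_url=HASHDB_URL):
    # Performs the HTTP request and returns the decoded JSON response as-is
    url = build_lookup_url(algorithm, value, base_url)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

# Only the URL is built here; calling lookup_hash requires network access
print(build_lookup_url("crc32", 0x12345678))
```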
Integration of this algorithm into the service requires a few minor changes to the code, as specific test values are required for HashDB. The complete script is given below.
#!/usr/bin/env python

import zlib

DESCRIPTION = "The CRC32 hash of the given string with the terminating null byte added to it"
TYPE = 'unsigned_int'
TEST_1 = 3133185683

def hash(data):
    data = data + b'\x00'
    return 0xffffffff & zlib.crc32(data)
The description should briefly describe the algorithm, whereas the type should indicate what value type is to be returned. The test variable should contain the hash, using the to-be-implemented algorithm, of ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789. This ensures the script is working correctly.
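Before submitting a module, the test value can be checked locally. The sketch below recomputes the hash of the test string and compares it with `TEST_1`. The helper `passes_self_test` is made up for this illustration and is not part of HashDB, and passing the test string as bytes is an assumption.

```python
import zlib

TEST_STRING = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
TEST_1 = 3133185683  # value taken from the script above

def hash(data):
    # CRC32 over the input with a terminating null byte appended
    data = data + b'\x00'
    return 0xffffffff & zlib.crc32(data)

def passes_self_test(hash_function, expected):
    # True when the algorithm reproduces the expected value for the test string
    return hash_function(TEST_STRING) == expected

print(passes_self_test(hash, TEST_1))
```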
Conclusion
To conclude, API hashing makes the analysis harder, but still doable. Recovering the hashed names can be done manually, or with (a custom instance of) OALabs’ HashDB service. Understanding the underlying concepts helps to recognise the technique: the implementation of the API hashing might vary, but the concept remains the same.
To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], or DM me on BlueSky @maxkersten.nl.