Analysing scripts

This article was published on the 14th of January 2021.

Malware often uses scripts to perform the first actions in the infection chain. A common example is an Office attachment that contains a malicious macro, which executes some PowerShell code to download the next stage. The malicious macro, which is usually written in Visual Basic for Applications (VBA for short), is a script, as is the mentioned PowerShell code.

First, the definition of a script is given. After that, the importance of knowing how to analyse scripts is discussed, as well as a way of working when analysing them. The latter part includes language-specific tips for common languages.

Defining terminology

For ease of use, all non-compiled programs are referred to as scripts in this blog. This crude definition is obviously not applicable to all languages, but this article is not focused on providing an exact definition. Rather, the focus is on the practical handling of non-compiled code.

A note is to be made about scripts that make extensive use of compiled code. Examples are Python and PowerShell. Python can use native libraries, or be converted into a standalone binary that contains the Python interpreter. PowerShell can use the .NET Framework's codebase to perform certain actions. The analysis of such a script is still based upon the non-compiled code that is visible to the analyst, which is why it's handled in this article within this course.

The importance of analysing scripts

As stated in the introduction, scripts are often used early on in the attack chain, although they can also be found within the later stages to persist the malware on the system. The early stages are especially interesting from a Blue Team's perspective, as the earliest opportunity to block malicious software is the best one. Obviously, this is not to say that later opportunities to block the malware should be ignored.

Scripts are often looked at for a short amount of time, whereas a later stage payload might take up much more time. The time spent on a single instance of a payload, be it a script or executable, is not the only thing to take into account, as one also has to take note of the frequency. There are usually multiple scripts that lead to the same (or a very similar) executable. As such, it's a good skill to become proficient in quickly analysing scripts.

Way of working

Before diving into language-specific tips, some more general tips related to scripts will be covered. It's important to always have easy access to an interpreter of the language you are analysing. Not all interpreters can handle all code. Using JavaScript as an example, the browser's embedded JavaScript interpreter will handle some code differently when compared to Microsoft's wscript or Node.js.

Make small proof-of-concepts to ensure your assumptions work. A mistake is easily made, and working off faulty assumptions can, and will, come back to haunt you.

Needless to say, life is easier when using syntax highlighting in your text editor. One can use any text editor that works, ranging from a console-based application such as vim or nano, to a fully fledged editor such as Notepad++ (Notepadqq on Linux distributions) or Visual Studio Code.

When refactoring code, you can always use the search and replace functionality. Note that this function often supports regular expressions! As these search and replace actions are often executed in a sequence, it's messy to keep pressing undo (or CTRL+Z) until the script is back to the way it was. Instead, it's easier to open a new tab, editor instance, or file where a temporary copy is stored before attempting to refactor the code. Also take into account that string literals might contain whatever value you aim to replace, in which case the replacement breaks the script. It helps to highlight all occurrences of a specific term before replacing it.
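
The same approach can be scripted when an editor is not at hand. The sketch below is a minimal illustration in Python, not a complete refactoring tool: the helper names count_occurrences and rename_identifier, as well as the JavaScript sample line, are made up for this example. It counts the whole-word matches first, and then performs the rename on a copy of the source.

```python
import re


def count_occurrences(source, name):
    """Count whole-word occurrences of name, to see the impact of a rename up front."""
    return len(re.findall(r"\b" + re.escape(name) + r"\b", source))


def rename_identifier(source, old, new):
    """Return a copy of source in which whole-word matches of old are replaced by new."""
    # A function is used as replacement so that backslashes in new are taken literally.
    return re.sub(r"\b" + re.escape(old) + r"\b", lambda _match: new, source)


# Hypothetical one-line sample; the original string is left untouched.
original = 'var x1 = "x1 marks the spot"; var x10 = x1 + 1;'

print(count_occurrences(original, "x1"))  # 3: x10 is skipped, the string literal is not
refactored = rename_identifier(original, "x1", "counter")
print(refactored)
```

The word boundaries prevent x10 from being touched, but the occurrence inside the string literal is still replaced. That is exactly the pitfall mentioned above, which is why checking the matches before replacing is worthwhile.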

Being on the target platform can often help when dealing with more platform-specific code. Some platform-independent code can behave differently on one platform than it does on another. Different behaviour is not necessarily common, which makes it harder to spot. As such, it is easy to dive into an unknown rabbit hole, whilst analysing the script is the actual goal.

Keep the goal of your analysis in mind. If you want to study a specific sample through and through, then you can dive into every rabbit hole you encounter. If you are, however, part of the incident response team, then your goal will be to find out what the script does in a general outline, if it can be removed, and how to look for it on other machines. For optimal results, it is wise to adjust your research methodology according to your goal.

PowerShell

As stated before, this language makes use of the .NET Framework. This provides an attacker with the tools to run (nearly) any code they want, via several methods. The interoperability classes within the .NET Framework allow a programmer to use native functions. As such, a program that is written in C# can execute shellcode by allocating a buffer, setting the memory to executable, and transferring the control to the shellcode's memory segment. It logically follows that a PowerShell script can do the same, as it can utilise the same classes and logic. This has the disadvantage that one can end up with a rather complex script, when a simple one is expected. The advantage here is that the used types and classes need to be defined one way or another. The types and classes are defined as strings when casting a variable to a specific type, or when initialising a variable. As such, it's rather helpful to decrypt the strings, since they reveal which types and classes are used.

Aside from the wide array of options one has when writing PowerShell, one must also note that the script interpreter is not case sensitive when interpreting types, variables, or functions. Additionally, backticks can be present in many places throughout the script, as these do not impact the code's execution. They do, however, make life a bit harder when trying to search and replace. The names myF`unction and my`Fu`nction are the same for the interpreter, but a search and replace command will not replace both, unless the backticks are removed before issuing the command, or unless one uses a regular expression when replacing the term. The latter is more work than simply removing all backticks, but when removing them, make sure that none of the backticks in the script are actually required.
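
The regular expression route can be sketched in a few lines of Python. This is an illustration under assumptions rather than a robust tool: the helper names and the replacement name decodeStage are hypothetical, and the pattern has no word boundaries, so it also matches when the name is part of a longer identifier.

```python
import re


def backtick_tolerant_pattern(name):
    """Build a case-insensitive regex that allows backticks between the characters of name."""
    body = "`*".join(re.escape(character) for character in name)
    return re.compile(body, re.IGNORECASE)


def replace_obfuscated_name(script, name, replacement):
    """Replace every spelling of name (any casing, any backticks) with replacement."""
    return backtick_tolerant_pattern(name).sub(lambda _match: replacement, script)


# Three spellings that the PowerShell interpreter treats as the same name.
script = "myF`unction; MY`Fu`nction; myfunction"
renamed = replace_obfuscated_name(script, "myFunction", "decodeStage")
print(renamed)  # decodeStage; decodeStage; decodeStage
```

Compared to stripping all backticks from the script, this leaves backticks that are unrelated to the searched name in place, at the cost of having to build a pattern per name.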

Another example here is the fact that variables can be written as a string: (${“myVar”}). The string that equals the variable name is not case sensitive, and can also contain backticks.

One can replace iex, which stands for Invoke-Expression, with Write-Host. This prints the code that would otherwise be executed to the standard output. Note that simply replacing these calls will not make a script safe to execute by default. Execute only the parts that you have already looked at, as the code can contain other methods of running code or programs. When used correctly and effectively, this can help to avoid manually decoding or decrypting a given variable.
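
For larger scripts, the replacement itself can be automated. The Python sketch below is a minimal example that only handles the literal spellings iex and Invoke-Expression, in any casing; the function name defang_iex and the sample script are made up for this illustration. Obfuscated spellings (with backticks or string concatenation, for example) and other ways of running code are not covered, so the warning above still applies to the output.

```python
import re

# Match the alias and the full cmdlet name as whole tokens, regardless of casing.
# The lookarounds prevent matches inside longer names, such as $iexample.
IEX_PATTERN = re.compile(r"(?<![\w-])(?:iex|Invoke-Expression)(?![\w-])", re.IGNORECASE)


def defang_iex(script):
    """Return a copy of script in which literal iex calls are replaced by Write-Host."""
    return IEX_PATTERN.sub("Write-Host", script)


# Hypothetical sample with both spellings and one look-alike variable name.
sample = "IEX $payload\n$decoded | invoke-expression\n$iexample = 1"
defanged = defang_iex(sample)
print(defanged)
```

The variable $iexample is left untouched, whereas both calls now print their argument instead of executing it.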

Commonly seen obfuscation tools are Invoke-Obfuscation and ISE Steroids.

It is helpful to turn on PowerShell logging when using a Windows machine to analyse and run code on, as this makes it easy to check older snippets at a later stage.

JavaScript

As stated before, different interpreters can produce different results. This is especially the case when using the browser's JavaScript interpreter, compared to wscript on Windows. In multiple cases, I encountered local variables that were accessible outside of their scope. An example is given below.

function foo() {
  var a = 3;
}
var b = a + 3;

The browser’s console will, rightfully, show the error: Uncaught ReferenceError: a is not defined. Loading the malware using wscript, however, states that variable b equals 6.

This specific behaviour can also be observed in a blog by the Malwarebytes Threat Intelligence team regarding a JavaScript loader that drops Gootkit or REvil on the victim's device.

When encountering code that uses eval to execute code, one can use console.log instead, as this prints the content to the console, instead of executing it. The remarks that have been made about this technique for the PowerShell language, are valid here as well.

Visual Basic for Applications

Office macros are written in this language, which is often the first code that one will encounter when chronologically following the trail of an attacker. There are quite a few ways to hide the macro code, such as (but not limited to) VBA stomping or VBA purging. Macros are mostly made to run in Microsoft Office products, but OpenOffice or LibreOffice products are also able to execute macro code, although some specific functionality might not be included.

Using any type of office product to run small segments of the code allows the analyst to fully understand what is going on, without needing to execute the whole macro at once. Similar to the other mentioned languages, one can use a message box (MsgBox) to quickly display strings.

Python

Most of the aforementioned tips are also applicable to Python, but there is a special note to make when analysing a Python script. Be aware of the Python version that the script was written for: differences exist not only between the major versions (2 and 3), but also between minor versions.
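
A small check at the start of an analysis session shows which interpreter is actually in use. The sketch below uses hypothetical helper names and picks the division operator as one well-known difference between the major versions: 3 / 2 equals 1 in Python 2 and 1.5 in Python 3, which matters when a decoding routine computes indices or keys.

```python
import sys


def interpreter_version():
    """Return the (major, minor) version of the interpreter that runs this code."""
    return sys.version_info[0], sys.version_info[1]


def true_division_enabled():
    """Python 2 evaluates 3 / 2 to 1 (integer division), Python 3 evaluates it to 1.5."""
    return 3 / 2 == 1.5


print("Running on Python %d.%d" % interpreter_version())
print("3 / 2 is true division: %s" % true_division_enabled())
```

If the sample was written for the other major version, a decoding routine that relies on such behaviour can silently produce a different result, rather than an error.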

PHP

Often used to conceal web shells or to drop a payload, PHP is frequently found on compromised and/or misconfigured servers. Aside from installing PHP locally to run code, it's important to know that the official documentation of PHP is superb. The documentation provides all you need to know about a given function, and understanding the code becomes much easier once the used functions are clear.

In the case of a web shell, a variable is usually decoded and/or decrypted multiple times in a row. This is done to avoid detection and easily create a new hash for the PHP file that is placed on the server. From an analysis point of view, this is rather repetitive. Automating this process (where possible) can save a lot of time in some cases.
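
As an illustration of such automation, the Python sketch below keeps peeling layers until none applies. It assumes that the layers consist of base64 and zlib compressed data only (the counterparts of PHP's base64_decode and gzuncompress). Real web shells can use other functions, in which case additional decoders need to be added. The function name peel_layers and the sample payload are made up for this example.

```python
import base64
import binascii
import zlib


def peel_layers(blob, max_layers=50):
    """Undo base64 and zlib layers until neither applies; return (data, applied layers)."""
    applied = []
    for _ in range(max_layers):
        if not blob:
            break  # Nothing left to decode.
        try:
            # validate=True rejects input with characters outside the base64 alphabet.
            blob = base64.b64decode(blob, validate=True)
            applied.append("base64")
            continue
        except (binascii.Error, ValueError):
            pass
        try:
            blob = zlib.decompress(blob)
            applied.append("zlib")
            continue
        except zlib.error:
            pass
        break  # Neither decoder applies, assume the final layer is reached.
    return blob, applied


# Build a hypothetical sample: a payload that is compressed and encoded three times.
payload = b"<?php echo 'stage two'; ?>"
blob = payload
for _ in range(3):
    blob = base64.b64encode(zlib.compress(blob))

data, applied = peel_layers(blob)
print(applied)
print(data)
```

The list of applied layers doubles as documentation of how the sample was packed. Note that a final payload which happens to consist solely of base64 characters would be decoded one layer too far, so the output should always be checked manually.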

Dropping malware is often done in a similar fashion, but can include a bit more logic. One example is keeping track of who downloads the payload, as can be read in this analysis of Emotet's dropper code.

Conclusion

In short, it's worth the time you need to invest to become well versed in analysing scripts. Scripts come in numerous formats and sizes, but the essence of this article provides a guideline for nearly all scripting languages, including those which are not described here.


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], or DM me on Twitter @Libranalysis.