MalPull

Obtaining malware samples to analyse can be difficult at times, depending on the type of access that one has. Several (free) online malware sample databases exist, but visiting them one by one to check whether a specific sample is present is a tedious and time-consuming task. MalPull was created to automate the search across multiple platforms and to download each sample from a database that contains it. The program’s source code and a precompiled Java Archive (JAR) can be found on GitHub. The latest release is also available on GitHub.


Features

MalPull uses the APIs of MalShare, Malware Bazaar, Koodous, and VirusTotal to search for a sample based on a given MD5, SHA-1, or SHA-256 hash. Both MalShare and Koodous require an API key that can be obtained by creating a free account. The API key of VirusTotal is only usable if one has a paid account. Note that MalPull does not require all services to be enabled.
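Since the three supported hash types differ in length, the type of a hash can be derived from the input alone. The snippet below is a minimal sketch of such a check, assuming hashes are given as hexadecimal strings. The class name `HashKind` and the method `detect` are hypothetical and are not taken from MalPull's source, and whether MalPull validates hashes this way is not stated here.

```java
import java.util.Locale;

// Hypothetical helper, not part of MalPull: guesses the hash type from its length.
class HashKind {

    // Returns "MD5", "SHA-1", or "SHA-256", or null if the input is not a
    // hexadecimal digest of one of the three supported lengths.
    static String detect(String hash) {
        if (hash == null) {
            return null;
        }
        String normalised = hash.trim().toLowerCase(Locale.ROOT);
        if (!normalised.matches("[0-9a-f]+")) {
            return null; // contains characters that cannot occur in a hex digest
        }
        switch (normalised.length()) {
            case 32:
                return "MD5";     // 128 bits
            case 40:
                return "SHA-1";   // 160 bits
            case 64:
                return "SHA-256"; // 256 bits
            default:
                return null;
        }
    }
}
```

A check like this could be used to skip malformed lines before any API request is made, which saves requests on services with a query limit.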

To make optimal use of the query limits one has with these services, MalPull queries free services without an API request limit first. After that, free services with an API request limit are queried, and paid services are queried last. The order, if all services are in use, is given below.

  1. Malware Bazaar
  2. MalShare
  3. Koodous
  4. VirusTotal
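The ordering above can be sketched as a fixed priority list that is filtered by the set of enabled services. This is only an illustration under that assumption, not MalPull's actual implementation. The class `ServiceOrder` and its method are hypothetical; the service names are the keys used in the API key file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of MalPull: derives the query order.
class ServiceOrder {

    // Unlimited free services first, rate-limited free services next, paid last.
    static final List<String> PRIORITY =
            Arrays.asList("malwarebazaar", "malshare", "koodous", "virustotal");

    // Returns the enabled services in the order in which they would be queried.
    static List<String> queryOrder(Set<String> enabled) {
        List<String> order = new ArrayList<>();
        for (String service : PRIORITY) {
            if (enabled.contains(service)) {
                order.add(service);
            }
        }
        return order;
    }
}
```

With only VirusTotal and Malware Bazaar enabled, `queryOrder` returns Malware Bazaar followed by VirusTotal, so the paid service is only consulted when the free one does not have the sample.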

Installation

To run the program, one needs a recent version of the Java Runtime Environment. It has been tested with OpenJRE 8, but the code is not dependent on a specific Java version. MalPull requires no further installation, as the dependencies are embedded within the JAR. The required command-line arguments provide MalPull with the API keys, hashes, and the location to save all downloads to.

Compilation

The compilation for this project is done using Maven. To compile the Java code with its dependencies, one can use the command that is given below. Note that the current working directory needs to be the MalPull folder for the exact command to work.

mvn clean compile assembly:single

After the compilation, the compiled JAR is placed inside the target folder.

Usage

To use MalPull, one has to provide four command-line arguments. The first argument is the number of threads that MalPull can use when downloading samples. The minimum is one, and the maximum is left up to the user. Using more threads than you have (virtual) cores is unlikely to give an advantage due to the way threads are scheduled.
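To illustrate what a configurable thread count means in practice, the sketch below runs one task per hash on a fixed-size thread pool and collects the hashes for which the task reported no result. It is a minimal example, assuming the download logic is passed in as a function. `DownloadPool`, `run`, and the `download` parameter are hypothetical names, and MalPull's own threading code may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper, not part of MalPull: bounded parallel downloads.
class DownloadPool {

    // Runs download.test(hash) for every hash on at most threadCount threads.
    // Returns the hashes for which the download reported false (not found).
    static List<String> run(int threadCount, List<String> hashes, Predicate<String> download) {
        if (threadCount < 1) {
            throw new IllegalArgumentException("thread count must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String hash : hashes) {
                Callable<Boolean> task = () -> download.test(hash);
                results.add(pool.submit(task));
            }
            List<String> missing = new ArrayList<>();
            for (int i = 0; i < hashes.size(); i++) {
                if (!results.get(i).get()) { // blocks until task i has finished
                    missing.add(hashes.get(i));
                }
            }
            return missing;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for downloads", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a download task failed", e.getCause());
        } finally {
            pool.shutdown(); // no new tasks are accepted; worker threads exit when idle
        }
    }
}
```

The returned list corresponds to the hashes that could not be found on any service, which can be printed once all tasks have completed.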

The second argument is the location of a file that contains the API keys for all services, of which an example is given below.

virustotal=abcd1234
malshare=abcd1234
koodous=abcd1234
malwarebazaar=enabled

The order of the services in this file does not matter. To exclude a service, remove its line from this file. Note that Malware Bazaar does not have an API key, meaning that any value can be used. Malware Bazaar is represented this way to offer the user the option to include or exclude the service.
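The key file uses a plain `name=value` layout, which the standard `java.util.Properties` class can read. The snippet below is a sketch of how such a file could be parsed, assuming this layout. `KeyFile` and `parse` are hypothetical names, and MalPull may parse the file by other means.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of MalPull: reads "service=key" lines.
class KeyFile {

    // Maps each service name (lower-cased) to its value.
    // A service without a line in the file is simply absent from the map.
    static Map<String, String> parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> keys = new HashMap<>();
        for (String name : properties.stringPropertyNames()) {
            // Properties keeps trailing whitespace in values, so trim it here.
            keys.put(name.toLowerCase(Locale.ROOT), properties.getProperty(name).trim());
        }
        return keys;
    }
}
```

A service is then considered enabled when `keys.containsKey("malshare")` (or any other service name) is true. For Malware Bazaar only the presence of the line matters, since its value is not used as a key.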

The third argument is the location of a file that contains the hashes. Each hash needs to be on its own line. The hashes are deduplicated by MalPull, meaning that a hash which occurs more than once is only downloaded once.
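Deduplication of this kind can be sketched with an insertion-ordered set. The example below assumes that blank lines are ignored and that two hashes are duplicates only if they match exactly after trimming whitespace. Neither assumption is confirmed by MalPull's documentation, and the names `HashList` and `dedupe` are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of MalPull: removes duplicate hashes.
class HashList {

    // Returns the distinct, non-empty lines in their original order.
    static Set<String> dedupe(List<String> lines) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : lines) {
            String hash = line.trim();
            if (!hash.isEmpty()) {
                unique.add(hash); // add() is a no-op if the hash is already present
            }
        }
        return unique;
    }
}
```

The lines themselves could be obtained with `Files.readAllLines(Paths.get(path))`. Filtering before any request is made avoids spending the query limit of a service on the same hash twice.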

The fourth argument is the folder to store all downloaded samples in. The file name of each sample is equal to the file’s hash, as given in the list of hashes. If a file with the same name in the given location already exists, it is overwritten without warning. If the output folder does not exist, it is created, including any of the missing parent directories.
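The described behaviour (create missing folders, name the file after the hash, overwrite silently) maps onto two calls from `java.nio.file.Files`. The following is a minimal sketch of that behaviour rather than MalPull's actual code. `SampleWriter` and `save` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of MalPull: stores a sample on disk.
class SampleWriter {

    // Writes the sample to <outputDir>/<hash> and returns the resulting path.
    static Path save(Path outputDir, String hash, byte[] data) {
        try {
            // Creates the folder and any missing parents; does nothing if it exists.
            Files.createDirectories(outputDir);
            // Without explicit options, Files.write creates the file or
            // truncates an existing one, so older content is replaced silently.
            return Files.write(outputDir.resolve(hash), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the hash is used directly as the file name, it is sensible to make sure that it contains only hexadecimal characters before passing it to a method like this.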

An example of how to run MalPull is given below.

java -jar /path/to/MalPull.jar 6 ~/Downloads/malpull_test/keys.txt ~/Downloads/malpull_test/hashes.txt ~/Downloads/malpull_test/output/

Hashes that cannot be found on any of the services are printed once all hashes have been iterated through.


Planned updates

Below, a list of planned updates is given in no particular order:

  • Optionally save the sample in a password-protected ZIP archive
  • Add more malware database repositories (such as URLHaus, Hybrid-Analysis, or AnyRun)

Change log

In the list below, all changes are kept together with the release date of the given version.

1.1-stable [15th of July 2020]

List of features

  • Added VirusTotal support.
  • Added multi-threading support to download multiple samples at the same time. The maximum thread count is configurable as a command-line setting.
  • The input now requires a file that contains all hashes that are to be downloaded, separated by a newline. The command-line requires an argument that specifies the location of the input file.
  • The API keys are stored in a separate file, allowing for a more efficient use of the command-line arguments.
  • If a hash cannot be found on any of the enabled services, it is added to a list of missing hashes. This list is printed once all samples have been downloaded.
  • The total time spent downloading all samples is given once all samples have been downloaded.
  • Duplicate entries in the download list are filtered prior to downloading, thus avoiding double API queries that would impact the query limit of any of the used services.
  • The output folder is given via the command-line interface, and each file is written to it. File names of samples are based upon the hash in the list of samples that are to be downloaded. Existing files are overwritten without warning.

1.0-stable [6th of April 2020]

List of features


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.