Obtaining malware samples to analyse can be difficult, depending on the type of access that one has. Several (free) online malware sample databases exist, but visiting them one by one to check whether a specific sample is present is a tedious and time-consuming task. MalPull was created to automate the search across multiple platforms, and to download the sample from a database that contains it. The program’s source code and precompiled Java Archive can be found on GitHub. The latest release is also available on GitHub.
MalPull uses the APIs of MalShare, Malware Bazaar, Koodous, VirusTotal, Triage, and VirusShare to search for a sample based on a given MD5, SHA-1, or SHA-256 hash. MalShare, Koodous, Triage, and VirusShare require an API key each, which can be obtained by creating a free account. The API keys of both VirusTotal and Koodous are only usable if one has a paid license. Note that not all services have to be used when using MalPull.
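Since the three supported hash types differ in length (32, 40, and 64 hexadecimal characters respectively), a given argument can be checked before any API request is spent on it. The sketch below is an illustration only: the class and method names are hypothetical, and it is not taken from MalPull's source code.

```java
import java.util.Locale;

// Hypothetical helper, not part of MalPull: classifies a hash by its length.
final class HashKind {

    /** Returns "MD5", "SHA-1", "SHA-256", or "unknown" for the given string. */
    static String classify(String hash) {
        if (hash == null) {
            return "unknown";
        }
        String normalised = hash.trim().toLowerCase(Locale.ROOT);
        // A hash consists solely of hexadecimal characters
        if (!normalised.matches("[0-9a-f]+")) {
            return "unknown";
        }
        switch (normalised.length()) {
            case 32:
                return "MD5";
            case 40:
                return "SHA-1";
            case 64:
                return "SHA-256";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        // The MD5 hash of an empty file
        System.out.println(classify("d41d8cd98f00b204e9800998ecf8427e")); // prints "MD5"
    }
}
```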
To make optimal use of the query limits of these services, MalPull first queries free services without an API request limit. After that, free services with an API request limit are queried, and paid services are queried last. The order, if all services are in use, is given below.
- Malware Bazaar
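The ordering rule described above (unlimited free services first, rate-limited free services next, paid services last) can be sketched as a stable sort on a tier. The tier assigned to each service in this example is an assumption made for illustration, and the class and enum names are hypothetical. MalPull's actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the query order, not MalPull's own code.
final class ServiceOrder {

    // Declared in the order in which the tiers should be queried
    enum Tier { FREE_UNLIMITED, FREE_LIMITED, PAID }

    static final class Service {
        final String name;
        final Tier tier;

        Service(String name, Tier tier) {
            this.name = name;
            this.tier = tier;
        }
    }

    /** Returns the service names, sorted by tier; ties keep their input order. */
    static List<String> queryOrder(List<Service> enabled) {
        List<Service> sorted = new ArrayList<>(enabled);
        // Enums compare by declaration order, and List.sort is stable
        sorted.sort(Comparator.comparing((Service s) -> s.tier));
        List<String> names = new ArrayList<>();
        for (Service service : sorted) {
            names.add(service.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // The tiers below are assumed for the sake of the example
        List<Service> enabled = Arrays.asList(
                new Service("VirusTotal", Tier.PAID),
                new Service("MalShare", Tier.FREE_LIMITED),
                new Service("Malware Bazaar", Tier.FREE_UNLIMITED));
        System.out.println(queryOrder(enabled)); // prints "[Malware Bazaar, MalShare, VirusTotal]"
    }
}
```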
To run the program, one needs a recent version of the Java Runtime Environment. It has been tested with OpenJRE 8, but the code is not dependent on a specific Java version. MalPull requires no further installation, as the dependencies are embedded within the JAR. The required command-line arguments provide MalPull with the API keys, hashes, and the location to save all downloads to.
The compilation for this project is done using Maven. To compile the Java code with its dependencies, one can use the command that is given below. Note that the current working directory needs to be the MalPull folder for the exact command to work.
mvn clean compile assembly:single
After the compilation, the compiled JAR is placed inside the target folder.
The connections towards Triage, MalShare, and Malware Bazaar are done via their Java API client libraries, which I have published over time. Note that all of these libraries are completely open-source. Make sure to install these, as explained on the Maven website. The excerpt below, taken from the linked Maven website, shows how to install the libraries on your system, which needs to be done prior to building MalPull!
mvn install:install-file -Dfile=<path-to-file> -DpomFile=<path-to-pomfile>
Not doing this will lead to build failure errors!
To use MalPull, one has to provide two or more arguments to the command-line interface. The first argument is the (relative) location to which the requested samples are downloaded. The second argument, and any subsequent one, should be the hash of a sample that is to be downloaded.
A file named keys.txt with all API keys for the to-be used services needs to be in the same folder as MalPull’s JAR. This file also contains a value called threads, which requires a valid integer, to specify the number of simultaneous threads that are to be run. Using more threads than available (virtual) cores is unlikely to give an advantage, as the thread scheduling is inefficient in those situations. The layout of keys.txt is given below.
threads=N
virustotal=abcd1234
malshare=abcd1234
koodous=abcd1234
malwarebazaar=enabled
triage=abcd1234
virusshare=abcd1234
The order of the services in this file does not matter. If you wish to not use one or more of the available services, simply remove them from this file. Note that Malware Bazaar does not have an API key, meaning that any value can be used. Malware Bazaar is represented this way to offer the user the option to include or exclude the service.
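To illustrate how such a file can be interpreted, the sketch below reads key=value pairs, skips blank lines, and treats a missing service name as a disabled service. It is a minimal sketch under assumptions: the class and method names are hypothetical, and lines without an equals sign are silently ignored here, which need not match what MalPull does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for the keys.txt layout, not MalPull's own code.
final class KeyFileSketch {

    /** Maps each name (lower-cased) to its value; blank lines are skipped. */
    static Map<String, String> parse(Iterable<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int separator = trimmed.indexOf('=');
            if (separator <= 0) {
                continue; // Assumption: ignore lines without a name
            }
            String name = trimmed.substring(0, separator).trim().toLowerCase(Locale.ROOT);
            String value = trimmed.substring(separator + 1).trim();
            entries.put(name, value);
        }
        return entries;
    }

    /** Reads the threads value, rejecting missing or non-positive numbers. */
    static int threadCount(Map<String, String> entries) {
        String raw = entries.get("threads");
        if (raw == null) {
            throw new IllegalArgumentException("No threads value present");
        }
        int count = Integer.parseInt(raw); // Throws NumberFormatException on invalid input
        if (count < 1) {
            throw new IllegalArgumentException("The thread count must be at least 1");
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("threads=4", "", "malwarebazaar=enabled");
        Map<String, String> entries = parse(lines);
        System.out.println(threadCount(entries));            // prints "4"
        System.out.println(entries.containsKey("koodous"));  // prints "false"
    }
}
```

To read an actual file, the lines can be obtained with `Files.readAllLines(Paths.get("keys.txt"))` from `java.nio.file`.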
An example of how to run MalPull is given below.
java -jar /path/to/MalPull.jar /sample/destination/ hash1 hash2 hashN
Hashes that cannot be found on any of the services are printed once all hashes have been iterated through.
The modularisation of MalPull
Since version 1.3-stable, MalPull consists of two main modules, both of which are located within the same project. The main module, located at malpull.MalPull, is the multi-threaded downloader, which downloads the files for the given hashes. The other module is the command-line interface, which is located at malpull.cli. This module handles the interpretation of the given command-line arguments, and the file handling for the API keys and hashes files.
One can use the main module in any project, be it to create a new command-line or graphical user interface, or to use the downloader functionality within a project as an object. To do so, create an instance of malpull.MalPull, which contains the logic for the downloader, and can log its output to any given PrintWriter. To easily use the standard output, one can use System.out as the writer, or one can create a custom object to log the output to a different stream or file.
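Note that System.out is a PrintStream, so it needs to be wrapped when a PrintWriter is expected. The sketch below shows only standard Java I/O: a writer for the standard output, and a custom writer that collects the log in memory. It assumes that the downloader indeed accepts a java.io.PrintWriter, and it deliberately does not show the constructor of malpull.MalPull, since its exact signature is not given here. The class and method names are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper that prepares writers; it does not call MalPull itself.
final class WriterSketch {

    /** Wraps the standard output in a PrintWriter that flushes on println. */
    static PrintWriter stdoutWriter() {
        return new PrintWriter(System.out, true);
    }

    /** Writes a single log line and flushes, so no output is left buffered. */
    static void log(PrintWriter writer, String message) {
        writer.println(message);
        writer.flush();
    }

    public static void main(String[] args) {
        // Log to the standard output
        log(stdoutWriter(), "Logging to the standard output");

        // Log to an in-memory buffer instead, for example to show it in a GUI
        StringWriter buffer = new StringWriter();
        log(new PrintWriter(buffer), "Logging to a custom target");
        System.out.print(buffer);
    }
}
```

A file can be used as a target in the same way, via `new PrintWriter("malpull.log")`, where the file name is only an example.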
Below, a list of planned updates is given in no particular order:
- Optionally save the sample in a password protected ZIP archive
- Add more malware database repositories (such as URLHaus, Hybrid-Analysis, or AnyRun)
In the list below, all changes are given per version, together with the release date of that version.
List of features
- Updated the CLI, as explained in the linked patch notes.
- Added VirusShare as a source to download from.
- Updated the VirusTotal API from version 2 to version 3.
- Replaced Triage, MalShare, and Malware Bazaar code with their respective API client, as linked in the patch notes.
- Minor changes, such as the improved error handling of blank lines of text in the keys.txt file, improved documentation, and the printing of the to-be used platforms during runtime.
List of features
- Changed the project’s internals into a modular structure. The multi-threaded downloader module is now separate from the command-line interface. As such, one can easily use MalPull with a different interface (be it command-line based or a graphical user interface), or it can be used within a different project. More information can be found in the linked patch notes.
List of features
- Moved Triage to be queried as first service, rather than being last.
- Added Triage support.
- Minor fixes (removed spelling mistakes in the JavaDoc, added missing JavaDoc entries, fixed the incorrect download count)
List of features
- Added VirusTotal support.
- Added multi-threading support to download multiple samples at the same time. The maximum thread count is configurable as a command-line setting.
- The input now requires a file that contains all hashes that are to be downloaded, separated by a newline. The command-line requires an argument that specifies the location of the input file.
- The API keys are stored in a separate file, allowing for a more efficient use of the command-line arguments.
- If a hash cannot be found on any of the enabled services, it is added to a list of missing hashes. This list is printed once all samples have been downloaded.
- The total time spent on downloading all samples is given once all samples have been downloaded.
- Duplicate entries in the download list are filtered prior to downloading, thus avoiding double API queries that would impact the query limit of any of the used services.
- The output folder, to which each file is written, is given via the command-line interface. File names of samples are based upon the hash in the list of samples that are to be downloaded. Existing files will be overwritten without warning.
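Several of the features listed in this changelog (the filtering of duplicate hashes, the configurable thread count, and the list of missing hashes) fit together roughly as sketched below. This is an illustration under assumptions, not MalPull's actual code: the class name is hypothetical, and the download itself is represented by a caller-supplied Predicate that returns true when a sample was found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch of a de-duplicated, multi-threaded download loop.
final class DownloadLoopSketch {

    /** Returns the hashes for which the download predicate returned false. */
    static List<String> run(List<String> hashes, int threads, Predicate<String> download) {
        // Filter duplicates first, so no API query is spent twice on one hash
        Set<String> unique = new LinkedHashSet<>();
        for (String hash : hashes) {
            String normalised = hash.trim().toLowerCase(Locale.ROOT);
            if (!normalised.isEmpty()) {
                unique.add(normalised);
            }
        }

        // Multiple workers add to this list, hence the synchronised wrapper
        List<String> missing = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String hash : unique) {
            pool.execute(() -> {
                if (!download.test(hash)) {
                    missing.add(hash);
                }
            });
        }

        // Stop accepting new work and wait for the queued downloads
        pool.shutdown();
        try {
            // The one hour limit is an arbitrary value for this example
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        // Stand-in for a real download: only the hash "aa" is "found"
        List<String> missing = run(Arrays.asList("aa", "AA", "bb", "aa"), 2, hash -> hash.equals("aa"));
        System.out.println(missing); // prints "[bb]"
    }
}
```

With more than one missing hash, the order of the returned list depends on the thread scheduling, so it should be sorted if a stable report is desired.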