Backtracking MageCart infections

MageCart groups have been roaming around for a while, infecting web shops left and right. Researchers have looked into the technical aspects of the skimmers, as have I. The recent COVID-19 pandemic opened up the playing field even further, which both criminals and researchers noticed. Malwarebytes’ Jérôme Segura reported a 26% increase in skimming in March 2020 compared to February 2020. RiskIQ’s Yonathan Klijnsma saw a roughly 20% increase in skimming during the same time span.

Uncovering where skimmers are planted, how long they have been active, and where the stolen credentials are exfiltrated to is vital to combat these skimmers and help the victims. The term victim is ambiguous here, as there are two types of victims. The shop owner is a victim because the shop got infected, but the customer is also a victim, as their credit card credentials were stolen.

This article covers the victimised web shops in depth. First, the modus operandi is briefly discussed, after which the attribution (and lack thereof) is covered. The research methodology is then described and the results are set out. Finally, the conclusion is presented and the raw results are listed in full.

Modus operandi

MageCart skimmers are often loaded on websites via a compromised third-party JavaScript library, or via a linked JavaScript library that only contains the skimmer. In the latter case, the domain is owned by the criminal actor. The data within the payment form, which includes credit card data, is copied and sent to an attacker-controlled server just before the payment page unloads in the browser.

Neither the customer nor the web shop notices the skimming, as the transaction is completed, the product is delivered, and the web shop owner receives the money. Only later do unauthorised credit card payments appear. The victimised customer has no idea where, how, or when the card details were stolen, making it hard to find out what exactly happened.

Data set

The complete data set can be found at the bottom of this page, and contains 1236 affected web shops. The credit card skimming domains were partially obtained through my own research, and partially taken from research by others.

I have always acted based upon a single rule in these cases: the finding is attributed to the first to publish about it publicly. Private findings can only be verified once the data becomes publicly available. As I took quite some time to research this topic, most of the data I gathered was published by others in the meantime. The goal of my research is, however, slightly different. My research serves as an addition to the chronologically listed sources below.

If you feel like something is missing from this list related to this research’s data set, please get in touch with me.

Attribution

To attribute malware or campaigns to a specific group, one needs to have a high level of confidence in numerous factors, as a faulty attribution can work to the attacker’s advantage. The data set that I analysed for this blog is linked together in several ways, but this is not enough to claim that a specific group was behind it. Below, comments about the links (and lack thereof) are given, leaving the final conclusion to the reader.

Code similarity

Technically speaking, most skimmers are rather simple. Grabbing data is done by getting all forms or all input fields, getting the current domain, encoding the data, and exfiltrating the data to the attacker’s server. Due to the simplicity, it is harder to attribute certain attacks to a specific group.

The scripts that were analysed in this data set mainly used two types of obfuscation. The first one is described in the blog about the MageCart skimmer on the Olympic ticket reseller sites, and the second one uses Obfuscator.io, which is a publicly known JavaScript obfuscator. The obfuscation makes it harder to determine certain characteristics of the scripts, although some could still be identified. It is also possible to buy such a script on the underground markets, meaning that code similarity alone does not link two infections to the same actor.

Infection chain

Another way to attribute infections to a specific group is to look at the infection chain that the affected sites are part of. Once an actor infects a site with the skimmer, it starts to generate profit. When the skimmer’s domain is taken down, the source of income stops. Logically, the actor buys another domain to host the skimmer on, and alters the link on the web shop. This change is also visible when researching cases, especially when backtracking. These skimmer domain changes, or jumps, expose the actor’s campaign.
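The jumps described above can be spotted by sorting the sightings of each web shop chronologically and flagging every change of skimmer domain. The snippet below is a minimal sketch of that idea, not the tooling used for this research: the Sighting record, the find_jumps function, and the example domains are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sighting:
    """One scan in which a web shop loaded a script from a skimmer domain."""
    shop: str
    skimmer_domain: str
    seen: date


def find_jumps(sightings):
    """Return (shop, old_domain, new_domain, date) for every skimmer domain change."""
    jumps = []
    last_domain = {}  # shop -> skimmer domain seen in the previous sighting
    for s in sorted(sightings, key=lambda s: (s.shop, s.seen)):
        previous = last_domain.get(s.shop)
        if previous is not None and previous != s.skimmer_domain:
            jumps.append((s.shop, previous, s.skimmer_domain, s.seen))
        last_domain[s.shop] = s.skimmer_domain
    return jumps


# Made-up sightings: shop-a switches skimmer domain once, shop-b never does.
sightings = [
    Sighting("shop-a.example", "skimmer-one.example", date(2020, 1, 5)),
    Sighting("shop-a.example", "skimmer-one.example", date(2020, 2, 1)),
    Sighting("shop-a.example", "skimmer-two.example", date(2020, 3, 9)),
    Sighting("shop-b.example", "skimmer-two.example", date(2020, 3, 12)),
]
jumps = find_jumps(sightings)
```

Each reported jump pairs an old skimmer domain with its successor, which is the link that ties two exfiltration domains to the same chain.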

However, there is a caveat that needs to be taken into account. When excluding third-party related infections, nearly all affected web shops are small or medium enterprises. These companies likely failed to update their web shop in time. A script that automatically injects the skimmer into vulnerable web shops is, sadly, a feasible option for attackers. This means that other actors can also get access, thereby clouding the attribution. If the infection is done automatically, more skimmers will be present on a single shop, as each infection is an addition rather than a replacement. Within the data set of this research, this did not occur often.

Alleged attribution

The domains that Jacob Pimental and I originally started out with were linked to MageCart 12. Quite a few of the uncovered domains are also present in RiskIQ’s list of MageCart 12 related domains. As such, multiple sources point towards a specific group, although the accuracy of an attribution decreases as it ages. Based on that, I can attribute at least a large part of the data in this research to MageCart 12.

Research methodology

The results of this research are based on the data that is present on UrlScan. Starting off with the skimmer domain that Jacob Pimental and I wrote about, one can search for the moment that the skimmer domain switched in the infection chain. Repeating this process results in a list of all the exfiltration domains in the chain, until the chain either breaks or the search is stopped. Additionally, one can recursively query every affected domain to search for other skimmer domains. This addition is considered out of scope for this research.
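The repeated search can be sketched as a graph walk: every observed switch between two skimmer domains is an edge, and the chain consists of all domains that can be reached from the starting domain. The code below is an illustration under assumptions, not the scanner used for this research. The build_chain function, its max_domains stop condition, and the example domains are hypothetical, and the switches are assumed to have been extracted from the scan results already.

```python
from collections import deque


def build_chain(seed, jumps, max_domains=50):
    """Collect the skimmer domains reachable from `seed` through observed switches.

    `jumps` is an iterable of (old_domain, new_domain) pairs. Switches are
    followed in both directions, since backtracking moves from a newer
    domain to an older one.
    """
    neighbours = {}
    for old, new in jumps:
        neighbours.setdefault(old, set()).add(new)
        neighbours.setdefault(new, set()).add(old)

    chain = [seed]
    seen = {seed}
    queue = deque([seed])
    # The walk ends when the chain breaks (empty queue) or the limit is hit.
    while queue and len(chain) < max_domains:
        current = queue.popleft()
        for domain in sorted(neighbours.get(current, ())):
            if domain not in seen and len(chain) < max_domains:
                seen.add(domain)
                chain.append(domain)
                queue.append(domain)
    return chain


# Made-up switches: the last pair is unrelated to the seed and is never reached.
observed = [
    ("skimmer-one.example", "skimmer-two.example"),
    ("skimmer-two.example", "skimmer-three.example"),
    ("unrelated-a.example", "unrelated-b.example"),
]
chain = build_chain("skimmer-two.example", observed)
```

The max_domains limit mirrors the point at which the search is stopped manually, whereas an empty queue corresponds to a chain that breaks.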

Collecting all affected domains

Using a privately built scanner to parse and store the results from UrlScan’s API, together with a set of rules to detect malicious skimmer scripts, the chain of skimmer sites can be built. Using this list, one can find all the affected domains. Finally, all affected domains need to be verified. Some scan results redirect to legitimate websites, as they only work when accessed using specific parameters.

To prepare the list for the next step, one has to remove incorrect and duplicate entries. An incorrect entry can, for example, be an IP address, which cannot be used in the next step’s automatic notification program. The automatic scanner already removed duplicate entries, but there are some edge cases that are hard to define automatically. A particular edge case is the usage of subdomains. In some cases, these are duplicates rather than unique entries, as can be seen below.

subdomain.company.tld
company.tld

In other cases, the subdomain is the unique identifier of the company on a third-party platform, as can be seen below.

company.thirdparty.tld

Removing the first type of subdomain usage, but leaving the others within the data set, results in a list with all affected web shops.
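The clean-up can be illustrated with a short sketch. The platform list, the function names, and the naive two-label rule for finding the parent domain are assumptions made for this example. In practice the edge cases are hard to define automatically, as noted above, and suffixes such as .co.uk would need a public suffix list.

```python
import ipaddress

# Hypothetical list of platforms that host every shop on its own subdomain.
THIRD_PARTY_PLATFORMS = {"thirdparty.tld"}


def is_ip_address(host):
    """Return True if the host is an IPv4 or IPv6 address instead of a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def registered_domain(host):
    """Naive guess: the last two labels. Wrong for suffixes such as .co.uk."""
    return ".".join(host.split(".")[-2:])


def clean_hosts(hosts):
    """Drop IP addresses and collapse subdomains that duplicate their parent."""
    cleaned = set()
    for host in hosts:
        host = host.strip().lower().rstrip(".")
        if not host or is_ip_address(host):
            continue
        parent = registered_domain(host)
        if parent in THIRD_PARTY_PLATFORMS:
            cleaned.add(host)    # the subdomain identifies the shop: keep it
        else:
            cleaned.add(parent)  # the subdomain duplicates the parent: collapse it
    return sorted(cleaned)


result = clean_hosts([
    "subdomain.company.tld",
    "company.tld",
    "company.thirdparty.tld",
    "192.0.2.10",
])
```

With the input above, subdomain.company.tld and company.tld collapse into a single entry, company.thirdparty.tld is kept, and the IP address is dropped.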

Informing affected companies

To contact a company, one needs the domain together with the first and last detection time stamp of a skimmer on said domain. Contacting can be done via e-mail, or via the contact form on the website. Due to the number of affected web shops, manual work is out of the question. As such, I wrote a program that automatically sends a message to five e-mail addresses of the given domain. The five e-mail addresses are given below.

info@domain.tld
contact@domain.tld
sales@domain.tld
abuse@domain.tld
security@domain.tld
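Composing such a notification could look as follows. This is a minimal sketch that only builds the message and does not send anything. The function names, the wording of the message, and the choice to put all five addresses in a single message are assumptions for illustration, and do not reflect the program that was actually used.

```python
from email.message import EmailMessage

# The five local parts from the list above.
LOCAL_PARTS = ("info", "contact", "sales", "abuse", "security")


def recipients(domain):
    """Return the five generic addresses for the given domain."""
    return [f"{local}@{domain}" for local in LOCAL_PARTS]


def build_notification(domain, first_seen, last_seen, sender):
    """Build (but do not send) a notification for one affected web shop."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients(domain))
    message["Subject"] = f"Credit card skimmer detected on {domain}"
    # Hypothetical wording: only the domain and the two time stamps vary.
    message.set_content(
        f"A credit card skimmer was detected on {domain}.\n"
        f"First detection: {first_seen}\n"
        f"Last detection: {last_seen}\n"
    )
    return message


# Example values are made up.
note = build_notification(
    "company.tld", "2020-01-05", "2020-03-09", "researcher@example.org"
)
```

Only the domain and the two time stamps differ between messages, so the body is nearly identical for every recipient.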

Even though my intentions were good, the Gmail address that I used got temporarily suspended, as the messages were considered spam by the automatic detection. Given that the body was the same for each e-mail, aside from the domain and the two detection time stamps, this was to be expected. Alas, only 200 out of the 1236 companies could be contacted this way.

Results

The results are divided into three parts. First, the availability of the domains is discussed. Secondly, insight into the branches of the web shops is given. Finally, the geographic location of the headquarters of the web shops is shown on a world map.

Availability

The availability of the affected web shops is measured by checking if the site is still operational. Using the Wayback Machine, Google, and DuckDuckGo, the branch and country could be retrieved for some of the unreachable sites. Of the 1236 web shops in total, 70% were reachable.

A lot of the web shops were not (fully) set up, as many of the about us pages contained Lorem Ipsum text. Most, although not all, of these web shops did offer products for sale.

Note that not all infections within the data set loaded the actual skimmer, as the skimmer domain could have been either unreachable or taken down. This is favourable for the shopping customer, but the infection on the web shop was still present, as the request was recorded.

Branches

The affected web shops sell a wide variety of services and goods. The web shops have been classified into five categories, which are given below in order of magnitude.

  • Products
  • Unknown
  • Food
  • Services
  • Adult entertainment

The products category contains all shops that sell products in one form or another. A product can be used as a resource to create another product, or it can refer to the final product in the supply chain. The unknown category is given to web shops that were not accessible, nor found in other sources. The food category contains, as its name implies, food related shops. The services category contains any web shop that offers services.

The adult entertainment category is sometimes classified as a product, and sometimes as a service, which is one reason why it is listed separately. The main reason to list it separately in the results is the fact that the skimmer takes all data that is entered on the website, including the domain name. With this personally identifiable information, individuals can be blackmailed. The rules related to this branch are rather strict in some countries. As such, a victim can get into more trouble, aside from the credit card fraud, if these details were to be sold or leaked.

The pie chart below shows the division of the branches of all 1236 web shops.

Geographic location

To see which continent or region is struck the most within the given data set, one can plot the location of each web shop on a map. Matching can be done based upon the top level domain, but this would skew the results, as a lot of European and Asian web shops use the .com top level domain. Collecting the location as it is stated on the website of the web shop gives more accurate results. The results for all countries with 15 or more affected web shops are given below, in descending order. The infection count is given within brackets.

  1. US (303)
  2. Unknown (280)
  3. IN (79)
  4. UK (68)
  5. DE (50)
  6. AU (47)
  7. BR (46)
  8. FR (34)
  9. IT (31)
  10. NL (28)
  11. CA (23)
  12. ES (19)
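A ranking like the one above can be derived by counting the country of each web shop and keeping every country that meets the threshold. The snippet below is a small sketch of that step. The countries_above function and the sample input are hypothetical, and the counts in the example are not taken from the data set.

```python
from collections import Counter


def countries_above(countries, minimum=15):
    """Return (country, count) pairs with at least `minimum` shops, largest first."""
    counts = Counter(countries)
    return [(country, n) for country, n in counts.most_common() if n >= minimum]


# Made-up sample with one country code per affected web shop.
sample = ["US"] * 3 + ["IN"] * 2 + ["NL"]
ranking = countries_above(sample, minimum=2)
```

Web shops without a retrievable location can be passed in as "Unknown", so that they show up in the ranking like any other entry.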

When plotting all collected data on a map, it becomes clear that the United States of America was hit the hardest. This is logical, as the credit card is a standard means of payment there, whilst this is not the case in Europe. The map is given below.

When looking at the data set over time, two trends become visible. The first one relates to the country of the affected web shops. Generally, the same country is struck multiple times, after which it is either left alone for a while, or not touched again at all. The second trend is similar to the first one, but relates to the branch rather than the country.

This does not necessarily mean that the actor(s) follow this pattern, as the data that is provided as input to UrlScan might follow this pattern, based upon the submissions it receives.

Conclusion

It is difficult to attribute the skimmer infections to a specific group, given that the skimmers are quite generic, and easily obtainable. The trends in the data show possibly interesting approaches, assuming that the input data is not skewed.

If you have shopped at any of the shops that are in the list below between the given dates, your credit card credentials are likely to be compromised. Please request a new credit card and contact your bank accordingly. Also note that all information that was entered on the site’s payment form was stolen by the credit card skimmer, and should be considered compromised.


To contact me, you can e-mail me at [info][at][maxkersten][dot][nl], send me a PM on Reddit or DM me on Twitter @LibraAnalysis.


Raw results