Dismantling Common Internet Services for Ad-Malware Detection
Florian Nettersheim, Stephan Arlt, Michael Rademacher
TL;DR
The paper investigates who defines ad-malware on the web by comparing threat labeling across common Internet services, including filtered DNS endpoints and VirusTotal. It extends the Kattikatti crawler framework with a Threat Intel Broker and an Ad-Malware Detector to label HTTP requests from ad-related traffic, enabling automated cross-service analysis. The results reveal substantial inconsistencies: DNS providers label only a small fraction of domains as suspicious (up to 0.47%), VirusTotal flags a larger portion (up to 8.8%) but with significant disagreement among its partners, and only a small share of the flagged domains is categorized as ad-malware. The work highlights the lack of a shared definition of ad-malware and argues for standardized labeling approaches (e.g., Maat) and more nuanced detection methods to improve web safety and transparency for users and researchers alike.
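Cross-service analysis of this kind requires turning many per-partner verdicts into a single label per domain. The Python sketch below shows one simple way to do this, under stated assumptions: the verdict categories are loosely modeled on the per-engine categories VirusTotal reports, and `DomainReport`, `is_flagged`, the fixed `threshold`, and the `disagreement` measure are all hypothetical. It is neither the paper's Threat Intel Broker nor Maat, only a threshold heuristic that makes the idea of partner disagreement concrete.

```python
from dataclasses import dataclass, field

# Verdict categories that count as "flagged". Loosely modeled on the
# per-engine categories VirusTotal reports; treat this as an assumption.
FLAGGING = {"malicious", "suspicious"}


@dataclass
class DomainReport:
    """Hypothetical container: one domain plus each engine's verdict."""
    domain: str
    verdicts: dict[str, str] = field(default_factory=dict)  # engine -> category


def flag_count(report: DomainReport) -> int:
    """Number of engines whose verdict is in FLAGGING."""
    return sum(1 for v in report.verdicts.values() if v in FLAGGING)


def is_flagged(report: DomainReport, threshold: int = 1) -> bool:
    """Label the domain a threat if at least `threshold` engines flag it."""
    return flag_count(report) >= threshold


def disagreement(report: DomainReport) -> float:
    """Minority share of engines: 0.0 means unanimous, 0.5 an even split."""
    n = len(report.verdicts)
    if n == 0:
        return 0.0
    k = flag_count(report)
    return min(k, n - k) / n


# Toy data (invented for illustration, not results from the study).
reports = [
    DomainReport("ads.example", {"A": "malicious", "B": "harmless", "C": "undetected"}),
    DomainReport("cdn.example", {"A": "harmless", "B": "harmless", "C": "undetected"}),
]
for t in (1, 2):
    print(t, [r.domain for r in reports if is_flagged(r, t)])
# prints: 1 ['ads.example'] then 2 []
print(round(disagreement(reports[0]), 3))  # 0.333
```

Raising `threshold` from 1 to 2 removes `ads.example` from the flagged set, which illustrates how the labeling rule alone, independent of the underlying data, can shift the share of domains reported as malicious.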
Abstract
Online advertising is a main instrument for publishers to fund content on the World Wide Web. Unfortunately, a significant number of online advertisements contain potentially malicious content, such as cryptojacking hidden in web banners, even on reputable websites. To protect Internet users from such threats, thorough detection of ad-malware campaigns is crucial for a safe Web. Today, common Internet services like VirusTotal can label suspicious content based on feedback from contributors and from the entire Web community. However, it remains open to what extent ad-malware is actually taken into account and whether the results of these services are consistent. In this pre-study, we evaluate who defines ad-malware on the Internet. First, we crawl a vast set of websites and record all HTTP requests (particularly those to online advertisements) issued by these websites. Then, we query these requests against both popular filtered DNS providers and VirusTotal. The goal is to determine how much content is labeled as a potential threat. The results show that up to 0.47% of the domains found during crawling are labeled as suspicious by DNS providers and up to 8.8% by VirusTotal. Moreover, only about 0.7% to 3.2% of these domains are categorized as ad-malware. Overall, the responses of the Internet services paint a divergent picture: each considered service has a different understanding of what constitutes suspicious content. We therefore outline potential research directions towards the automated detection of ad-malware, and we raise the open question of a common definition of ad-malware to the Web community.
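The measurement step described above (collecting the domains behind the crawled HTTP requests, then computing which fraction a service labels as suspicious and which fraction of those it categorizes as ad-malware) can be sketched in Python as follows. `unique_domains` and `label_shares` are hypothetical helpers, and the `suspicious` and `ad_malware` sets stand in for the answers of a filtered DNS provider or VirusTotal, whose query interfaces are deliberately left out. Counting full hostnames rather than registrable domains is also an assumption of this sketch.

```python
from urllib.parse import urlsplit


def unique_domains(request_urls):
    """Collect the distinct hostnames contacted by crawled HTTP requests."""
    domains = set()
    for url in request_urls:
        host = urlsplit(url).hostname  # lower-cased, port stripped, None if absent
        if host:
            domains.add(host)
    return domains


def label_shares(domains, suspicious, ad_malware):
    """Return (share of domains flagged, share of flagged domains that are ad-malware).

    `suspicious` and `ad_malware` are sets standing in for the answers of one
    service (hypothetical inputs; the actual lookups are not shown here).
    """
    flagged = domains & suspicious
    share_flagged = len(flagged) / len(domains) if domains else 0.0
    share_ad = len(flagged & ad_malware) / len(flagged) if flagged else 0.0
    return share_flagged, share_ad


# Toy crawl (invented URLs, not data from the study).
urls = [
    "https://ads.example/banner.js",
    "https://ADS.example:443/pixel.gif",
    "https://news.example/index.html",
    "https://cdn.example/lib.js",
]
domains = unique_domains(urls)  # three hosts: ads., news., and cdn.example
print(label_shares(domains, {"ads.example"}, {"ads.example"}))
```

The shares are computed over distinct hostnames, in line with the abstract's phrasing in terms of domains, so repeated requests to the same ad host are counted once.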
