MANTIS: Detection of Zero-Day Malicious Domains Leveraging Low Reputed Hosting Infrastructure

Fatih Deniz, Mohamed Nabeel, Ting Yu, Issa Khalil

TL;DR

MANTIS detects zero-day malicious domains with a content-agnostic, graph-based approach that exploits the reuse of hosting infrastructure. It constructs a heterogeneous graph from passive DNS (PDNS) data around seed malicious domains and trains a semi-supervised graph neural network (GNN) to produce daily blocklists; for on-demand predictions, an inductive framework aggregates embeddings from multiple GNN encoders with a meta-learner. The system reports a precision of $\approx 99.7\%$ and a recall of $\approx 86.9\%$ at a false-positive rate of $0.1\%$, identifies about $19{,}000$ new malicious domains per day, and often detects them days to weeks before they appear in blocklists such as VirusTotal or Google Safe Browsing (GSB). Post-analysis indicates robustness, explainability, and practical deployment viability, though limitations remain in detecting compromised domains and in certain shared-hosting scenarios. MANTIS is thus a scalable, proactive, and interpretable framework for early malicious-domain detection, with potential for extension to additional signals such as registrations and TLS certificates.

Abstract

Internet miscreants increasingly utilize short-lived disposable domains to launch various attacks. Existing detection mechanisms are either too late to catch such malicious domains due to limited information and their short life spans or unable to catch them due to evasive techniques such as cloaking and CAPTCHA. In this work, we investigate the possibility of detecting malicious domains early in their life cycle using a content-agnostic approach. We observe that attackers often reuse or rotate hosting infrastructures to host multiple malicious domains due to increased utilization of automation and economies of scale. This gives defenders the opportunity to monitor such infrastructure to identify newly hosted malicious domains. However, such infrastructures are often shared hosting environments where benign domains are also hosted, which could result in a prohibitive number of false positives. Therefore, one needs innovative mechanisms to better distinguish malicious domains from benign ones even when they share hosting infrastructures. In this work, we build MANTIS, a highly accurate practical system that not only generates daily blocklists of malicious domains but is also able to predict malicious domains on demand. We design a network graph based on the hosting infrastructure that is accurate and generalizable over time. Our models consistently achieve a precision of 99.7% and a recall of 86.9% with a very low false positive rate (FPR) of 0.1%, and on average detect 19K new malicious domains per day, which is over 5 times the number of new malicious domains flagged daily in VirusTotal. Further, MANTIS predicts malicious domains days to weeks before they appear in popular blocklists.
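
The idea of monitoring hosting infrastructure around known malicious domains can be illustrated with a small sketch. This is not the authors' graph schema or model: it only builds a bipartite domain-to-IP map from passive DNS style records and lists domains that share an IP with a seed. The records, the seed set, and the function names (build_graph, candidates_near_seeds) are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical passive-DNS resolutions as (domain, ip) pairs. Illustrative only.
PDNS_RECORDS = [
    ("bad-login.example", "203.0.113.10"),
    ("bad-pay.example", "203.0.113.10"),
    ("new-promo.example", "203.0.113.10"),
    ("new-promo.example", "203.0.113.11"),
    ("blog.example", "198.51.100.7"),
]
# Hypothetical seed set of already-known malicious domains.
SEED_MALICIOUS = {"bad-login.example", "bad-pay.example"}


def build_graph(records):
    """Return two adjacency maps: domain -> IPs and IP -> domains."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in records:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def candidates_near_seeds(records, seeds):
    """Map each non-seed domain to the number of seeds it shares an IP with."""
    domain_to_ips, ip_to_domains = build_graph(records)
    scores = {}
    for domain, ips in domain_to_ips.items():
        if domain in seeds:
            continue
        seed_neighbours = set()
        for ip in ips:
            seed_neighbours |= ip_to_domains[ip] & seeds
        if seed_neighbours:
            scores[domain] = len(seed_neighbours)
    return scores


if __name__ == "__main__":
    print(candidates_near_seeds(PDNS_RECORDS, SEED_MALICIOUS))
```

Co-hosting alone is only a candidate signal. As the abstract notes, shared hosting also carries benign domains, which is why a learned classifier over the graph is needed rather than a simple neighbour count.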

Paper Structure

This paper contains 40 sections, 20 figures, 9 tables.

Figures (20)

  • Figure 1: MANTIS vs. Existing Approaches: MANTIS detects malicious domains much earlier, at hosting time, than many existing techniques, which often detect domains only after the web content is available.
  • Figure 2: Reuse of Hosting Infrastructure. Over 80% of IP addresses used to host malicious domains on a given day were found to be reused from the previous 7 days.
  • Figure 3: Overall pipeline for daily blocklist generation.
  • Figure 4: On-demand detection of malicious domains.
  • Figure 5: Graph schema.
  • ...and 15 more figures
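
The reuse measurement described in the caption of Figure 2 can be sketched as a set computation: the share of a day's malicious-hosting IP addresses that also appeared in a lookback window. The sketch below assumes one set of IPs per day; the function name reuse_fraction and the sample addresses are hypothetical, and the paper's exact measurement procedure may differ.

```python
def reuse_fraction(today_ips, previous_days_ips):
    """Fraction of today's IPs that appear in any day of the lookback window.

    today_ips: iterable of IP strings hosting malicious domains today.
    previous_days_ips: list of sets, one per earlier day (e.g. the last 7 days).
    """
    today = set(today_ips)
    if not today:
        return 0.0
    seen_before = set()
    for day in previous_days_ips:
        seen_before |= set(day)
    return len(today & seen_before) / len(today)


if __name__ == "__main__":
    # Illustrative addresses from documentation ranges, not real measurements.
    today = {"203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"}
    window = [{"203.0.113.1"}, {"203.0.113.2", "203.0.113.3"}]
    print(reuse_fraction(today, window))
```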