Hesperus is Phosphorus: Mapping Threat Actor Naming Taxonomies at Scale

Gonzalo Roa, Manuel Suarez-Roman, Juan Tapiador

TL;DR

This paper tackles the pervasive problem of inconsistent threat actor naming across CTI vendors by introducing HiP, a scalable framework that normalizes, integrates, and clusters TA names from multiple taxonomies. HiP constructs a Threat Actor Name Alias Graph (TANAG) from 15 sources, revealing that alias proliferation is highly concentrated on a small set of TAs and correlates strongly with reporting activity and vendor coverage. The longitudinal analysis shows alias growth over 2000–2025, with notable surges in 2012–2018 and post-2020, while highlighting numerous data-quality pitfalls that inflate clusters. The work also discusses the feasibility of a universal TA naming standard, emphasizing the central barrier of private telemetry and proposing that curated mappings and tools like HiP can still provide value if used with appropriate caveats.

Abstract

This paper studies the problem of Threat Actor (TA) naming convention inconsistency across leading Cyber Threat Intelligence (CTI) vendors. The current decentralized and proprietary nomenclature creates confusion and significant obstacles for researchers, including difficulties in integrating and correlating disparate CTI reports and TA profiles. This paper introduces HiP (Hesperus is Phosphorus, a reference to the classic question about the Morning and the Evening Star), a methodology for normalizing, integrating, and clustering TA names presumably corresponding to the same entity. Using HiP, we analyze a large dataset collected from 15 sources and spanning 13,371 CTI reports, 17 vendor taxonomies, 3,287 TA names, and 8 mappings between them. Our analysis of the resulting name graph provides insights into key features of the problem, such as the concentration of aliases on a relatively small subset of TAs, the evolution of this phenomenon over the years, and the factors that could explain TA name proliferation. We also report errors in the mappings and methodological pitfalls that contribute to making certain TA name clusters larger than they should be, including the use of temporary names for activity clusters, the existence of common tools and infrastructure, and overlapping operations. We conclude with a discussion of the inherent difficulties of adopting a TA naming standard, a quest fundamentally hampered by the need to share highly sensitive telemetry that is private to each CTI vendor.
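
The normalize, integrate, and cluster idea described in the abstract can be illustrated with a minimal sketch. This is not HiP's actual implementation: the normalization rule (lowercasing and dropping non-alphanumeric characters), the choice of connected components as the clustering criterion, and the names `normalize`, `cluster_aliases`, and `EXAMPLE_MAPPINGS` are all assumptions made for illustration, and the alias pairs are a toy input rather than data from the paper.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Assumed normalization: lowercase and drop separators,
    so 'APT-28', 'APT 28' and 'apt28' map to the same key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def cluster_aliases(mappings):
    """Group names into alias clusters, taken here to be the connected
    components of the graph whose edges are the alias pairs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in mappings:
        union(normalize(a), normalize(b))

    clusters = defaultdict(set)
    for name in list(parent):
        clusters[find(name)].add(name)

    # Largest clusters first; ties broken alphabetically for determinism.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical alias pairs, as they might appear in vendor mappings.
EXAMPLE_MAPPINGS = [
    ("APT28", "Fancy Bear"),
    ("Fancy Bear", "Sofacy"),
    ("APT-28", "Forest Blizzard"),
    ("Lazarus Group", "Hidden Cobra"),
]

clusters = cluster_aliases(EXAMPLE_MAPPINGS)
for cluster in clusters:
    print(len(cluster), cluster)
```

Under this simplified model, one erroneous alias pair is enough to merge two otherwise separate clusters, which is one way the mapping errors and shared-tooling pitfalls mentioned in the abstract could inflate cluster sizes.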

Paper Structure

This paper contains 27 sections, 1 equation, 11 figures, 4 tables.

Figures (11)

  • Figure 1: HiP architecture.
  • Figure 2: Threat Actor Name Alias Graph (TANAG) produced by HiP.
  • Figure 3: Cumulative distribution function of the TA alias cluster sizes.
  • Figure 4: Malware Intelligence Gain (MIG) vs. subcluster size.
  • Figure 5: Cumulative distribution function of the studied TA features vs. the size of the TA alias cluster.
  • ...and 6 more figures