The CTI Echo Chamber: Fragmentation, Overlap, and Vendor Specificity in Twenty Years of Cyber Threat Reporting

Manuel Suarez-Roman, Francesco Marciori, Mauro Conti, Juan Tapiador

TL;DR

This paper presents a large-scale automated analysis of open-source CTI reports spanning two decades. A high-precision, LLM-based pipeline ingests and structures 13,308 reports, extracting key entities such as attributed threat actors, motivations, victims, reporting vendors, and technical indicators.

Abstract

Despite the high volume of open-source Cyber Threat Intelligence (CTI), our understanding of long-term threat actor-victim dynamics remains fragmented due to the lack of structured datasets and inconsistent reporting standards. In this paper, we present a large-scale automated analysis of open-source CTI reports spanning two decades. We develop a high-precision, LLM-based pipeline to ingest and structure 13,308 reports, extracting key entities such as attributed threat actors, motivations, victims, reporting vendors, and technical indicators (IoCs and TTPs). Our analysis quantifies the evolution of CTI information density and specialization, characterizing patterns that relate specific threat actors to motivations and victim profiles. Furthermore, we perform a meta-analysis of the CTI industry itself. We identify a fragmented ecosystem of distinct silos where vendors demonstrate significant geographic and sectoral reporting biases. Our marginal coverage analysis reveals that intelligence overlap between vendors is typically low: while a few core providers may offer broad situational awareness, additional sources yield diminishing returns. Overall, our findings characterize the structural biases inherent in the CTI ecosystem, enabling practitioners and researchers to better evaluate the completeness of their intelligence sources.
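
The marginal coverage idea in the abstract can be illustrated with a short sketch. The abstract does not specify the authors' procedure, so this is an assumption: a greedy ordering that repeatedly picks the vendor adding the most not-yet-covered items (for example, threat actors). The function name `greedy_marginal_coverage`, the vendor names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_marginal_coverage(
    vendor_items: Dict[str, Set[str]],
) -> List[Tuple[str, int, float]]:
    """Order vendors greedily by the number of new items each one adds.

    Returns one (vendor, marginal_gain, cumulative_coverage) tuple per step,
    where cumulative_coverage is a fraction of all items seen by any vendor.
    This is an illustrative assumption, not the paper's confirmed method.
    """
    universe: Set[str] = set().union(*vendor_items.values())
    covered: Set[str] = set()
    remaining = dict(vendor_items)
    steps: List[Tuple[str, int, float]] = []

    while remaining and len(covered) < len(universe):
        # Pick the vendor with the largest marginal gain; ties go to the
        # alphabetically first name because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda v: len(remaining[v] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:
            break  # no remaining vendor adds anything new
        covered |= remaining.pop(best)
        steps.append((best, gain, len(covered) / len(universe)))

    return steps


# Hypothetical toy data: which threat actors each vendor has reported on.
toy_vendors = {
    "vendor_a": {"actor1", "actor2", "actor3", "actor4"},
    "vendor_b": {"actor3", "actor4", "actor5"},
    "vendor_c": {"actor5", "actor6"},
    "vendor_d": {"actor1"},
}

for vendor, gain, coverage in greedy_marginal_coverage(toy_vendors):
    print(f"{vendor}: +{gain} new items, cumulative coverage {coverage:.2f}")
```

In this toy data the first vendor covers four of six actors and the second adds the remaining two, so the other vendors contribute nothing new. This is the diminishing-returns pattern the abstract describes, shown here only on made-up input.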

Paper Structure (26 sections, 11 figures, 11 tables)

Figures (11)

  • Figure 1: CTIRep methodology and structure of the study.
  • Figure 2: Temporal evolution of the volume, diversity, and distribution of report types present in CTIRep.
  • Figure 3: Distribution of the top 25 geographies (bottom to top) attacked by the top 25 threat actors (left to right). Bubble size and color represent the number of reports and the number of distinct victim business sectors, respectively.
  • Figure 4: Sankey diagram of the distribution of attack motivations, victims' business sectors, and the most attacked countries. Only motivations with a count higher than 50 and sectors with a count higher than 150 are considered. Acronyms are described in the paper's taxonomies table.
  • Figure 5: Number of reports (total count, reports including TTPs, and reports including IoCs) for the top 30 vendors.
  • ...and 6 more figures