Technique Inference Engine: A Recommender Model to Support Cyber Threat Hunting

Matthew J. Turner, Mike Carenzo, Jackie Lasky, James Morris-King, James Ross

TL;DR

The paper supports proactive cyber threat hunting by framing the prediction of unobserved ATT&CK techniques as a campaign–technique recommender problem with implicit feedback. It introduces the Technique Inference Engine (TIE), compiles what the authors believe to be the largest available dataset of CTI reports labeled with ATT&CK techniques, and compares two implicit-feedback approaches, Weighted Matrix Factorization (WMF) and Bayesian Personalized Ranking (BPR), with WMF showing stronger or comparable performance on recall@K and $NDCG@K$. The authors provide a web interface in which analysts enter observed techniques and receive a ranked list of likely additional techniques, along with in-browser WMF computations and t-SNE visualizations of report embeddings. The approach achieves recall@20 of around $0.40$ and returns rapid, targeted threat-hunting suggestions, illustrating a path toward scalable, data-driven adversary characterization. The work contributes a deployable tool and a dataset that together support more proactive threat hunting through inference of related TTPs.

Abstract

Cyber threat hunting is the practice of proactively searching for latent threats in a network. Engaging in threat hunting can be difficult due to the volume of network traffic, variety of adversary techniques, and constantly evolving vulnerabilities. To aid analysts in identifying techniques which may be co-occurring as part of a campaign, we present the Technique Inference Engine, a tool to infer tactics, techniques, and procedures (TTPs) which may be related to existing observations of adversarial behavior. We compile the largest (to our knowledge) available dataset of cyber threat intelligence (CTI) reports labeled with relevant TTPs. With the knowledge that techniques are chronically under-reported in CTI, we apply several implicit feedback recommender models to the data in order to predict additional techniques which may be part of a given campaign. We evaluate the results in the context of the cyber analyst's use case and apply t-SNE to visualize the model embeddings. We provide our code and a web interface.
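The ranking metrics named in the summary, recall@K and NDCG@K, can be illustrated with a minimal sketch. It assumes the standard binary-relevance definitions, which may differ in detail from the paper's evaluation protocol. The function names, the ranked list, and the held-out set are hypothetical and used only for illustration.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of held-out relevant techniques found in the top-k predictions."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical model output: technique IDs ranked by predicted score.
ranked = ["T1059", "T1105", "T1027", "T1566", "T1071"]
# Hypothetical held-out techniques for the same report.
held_out = {"T1105", "T1566", "T1003"}

print(recall_at_k(ranked, held_out, 3))
print(ndcg_at_k(ranked, held_out, 5))
```

In this toy case, one of the three held-out techniques appears in the top 3 and two appear in the top 5, so recall rises from 1/3 to 2/3 as K grows. NDCG also rewards placing those hits nearer the top of the list.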

Paper Structure

This paper contains 15 sections, 5 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: t-SNE visualization of WMF report embeddings using the cosine distance metric and a perplexity of 30. Colors are based on clusters of the data found using MeanShift with a bandwidth of 10 [mean_shift].
  • Figure 2: Technique Inference Engine web interface. Users may add observed techniques using the "ADD TECHNIQUE" box and retrieve a list of predicted techniques, which may be sorted, filtered, and exported to an ATT&CK Navigator layer or .csv file [attack_navigator]. In this example, the Technique Inference Engine predicts six additional techniques present in the MITRE NERVE breach from an input of four techniques [nerve]. Note that not all predicted techniques are shown.