Technique Inference Engine: A Recommender Model to Support Cyber Threat Hunting
Matthew J. Turner, Mike Carenzo, Jackie Lasky, James Morris-King, James Ross
TL;DR
The paper tackles proactive cyber threat hunting by framing the prediction of unobserved ATT&CK techniques as an implicit-feedback recommender problem over a campaign–technique matrix. It introduces the Technique Inference Engine (TIE), compiles what the authors believe to be the largest available dataset of CTI reports labeled with ATT&CK techniques, and compares two implicit-feedback approaches, Weighted Matrix Factorization (WMF) and Bayesian Personalized Ranking (BPR), with WMF showing stronger or comparable performance on recall@K and NDCG@K. The authors provide a web interface where analysts enter observed techniques and obtain a ranked list of likely additional techniques, with WMF computations performed in the browser and t-SNE visualizations of report embeddings. The approach reaches recall@20 of around 0.40 and returns rapid, targeted threat-hunting suggestions, illustrating a path toward scalable, data-driven adversary characterization. The work contributes a deployable tool and a dataset that support more proactive threat hunting through inference of related TTPs.
Abstract
Cyber threat hunting is the practice of proactively searching for latent threats in a network. Engaging in threat hunting can be difficult due to the volume of network traffic, variety of adversary techniques, and constantly evolving vulnerabilities. To aid analysts in identifying techniques which may be co-occurring as part of a campaign, we present the Technique Inference Engine, a tool to infer tactics, techniques, and procedures (TTPs) which may be related to existing observations of adversarial behavior. We compile the largest (to our knowledge) available dataset of cyber threat intelligence (CTI) reports labeled with relevant TTPs. With the knowledge that techniques are chronically under-reported in CTI, we apply several implicit feedback recommender models to the data in order to predict additional techniques which may be part of a given campaign. We evaluate the results in the context of the cyber analyst's use case and apply t-SNE to visualize the model embeddings. We provide our code and a web interface.
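The ranking metrics named in the summary, recall@K and NDCG@K, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the binary-relevance form of NDCG, and the example ranking and held-out set are assumptions chosen for illustration, and the technique IDs are used only as sample labels.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of held-out relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Each hit at zero-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: techniques ranked by a model for one report,
# and the techniques held out from that report for evaluation.
ranked = ["T1059", "T1105", "T1027", "T1566", "T1082"]
held_out = {"T1105", "T1082", "T1547"}

recall = recall_at_k(ranked, held_out, 3)
ndcg = ndcg_at_k(ranked, held_out, 3)
print(f"recall@3 = {recall:.3f}, NDCG@3 = {ndcg:.3f}")
```

Recall@K counts how many held-out techniques are recovered in the top K, while NDCG@K also rewards placing them nearer the top of the list, which matters when an analyst only inspects the first few suggestions.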
