Learning the APT Kill Chain: Temporal Reasoning over Provenance Data for Attack Stage Estimation

Trung V. Phan, Thomas Bauschert

TL;DR

StageFinder is a temporal graph learning framework that infers multi-stage attack progression from fused host and network provenance data. It achieves a macro F1-score of 0.96 and reduces prediction volatility by 31 percent compared to state-of-the-art baselines.

Abstract

Advanced Persistent Threats (APTs) evolve through multiple stages, each exhibiting distinct temporal and structural behaviors. Accurate stage estimation is critical for enabling adaptive cyber defense. This paper presents StageFinder, a temporal graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31 percent compared to state-of-the-art baselines (Cyberian, NetGuardian). These results highlight the effectiveness of fused provenance and temporal learning for accurate and stable APT stage inference.
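
To make the two reported metrics concrete, the following is a minimal sketch of how they could be computed over a sequence of per-window stage predictions. The macro F1 computation follows the standard definition (unweighted mean of per-class F1). The abstract does not define prediction volatility, so the `volatility` function here is an assumption: it uses the fraction of consecutive time windows in which the predicted stage changes. The stage labels are illustrative MITRE ATT&CK tactic names, not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def volatility(y_pred):
    """ASSUMED definition: share of consecutive windows whose predicted
    stage differs. The paper's own definition may differ."""
    if len(y_pred) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(y_pred, y_pred[1:]) if a != b)
    return flips / (len(y_pred) - 1)


# Illustrative labels only; one stage prediction per time window.
true_stages = ["reconnaissance", "reconnaissance", "exfiltration", "exfiltration"]
pred_stages = ["reconnaissance", "exfiltration", "exfiltration", "exfiltration"]
print(round(macro_f1(true_stages, pred_stages), 4))
print(round(volatility(pred_stages), 4))
```

Under this assumed definition, a lower volatility means the estimator switches stages less often between adjacent windows, which is one way to read the "stable" stage inference the abstract describes.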

Paper Structure

This paper contains 26 sections, 4 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: An example of an APT attack towards an enterprise network.
  • Figure 2: Data and control flow of the StageFinder framework. Host logs and network alerts are fused into a provenance graph, encoded by a GNN, and analyzed by an LSTM-based Stage Estimator, with the Attack Stage Mapping producing discrete APT stages.
  • Figure 3: An example of a provenance graph construction with the early fusion.
  • Figure 4: Temporal attention comparison between Cyberian and StageFinder LSTM models.
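
The early fusion step named in the captions for Figures 2 and 3 can be illustrated with a toy sketch: host events and network alerts are merged into one time-ordered edge list before any graph encoding takes place. Everything about the record layout here is an assumption made for illustration (the `(timestamp, source, action, target)` tuples, the `proc:`/`file:`/`sock:` node prefixes, and the function name `build_provenance_graph`); it is not the authors' schema or implementation.

```python
from collections import defaultdict


def build_provenance_graph(host_events, network_alerts):
    """Fuse two event streams into one adjacency list.

    Each input record is a hypothetical (timestamp, source, action, target)
    tuple. Each edge is stored as (target, action, timestamp, origin),
    where origin records which stream the edge came from.
    """
    merged = [(ts, src, act, dst, "host") for ts, src, act, dst in host_events]
    merged += [(ts, src, act, dst, "network") for ts, src, act, dst in network_alerts]

    graph = defaultdict(list)
    # Sorting by timestamp first keeps the edges of each node in time order.
    for ts, src, act, dst, origin in sorted(merged):
        graph[src].append((dst, act, ts, origin))
    return dict(graph)


# Illustrative records only.
host_events = [
    (1, "proc:winword.exe", "write", "file:payload.dll"),
    (3, "proc:rundll32.exe", "read", "file:payload.dll"),
]
network_alerts = [
    (2, "proc:winword.exe", "connect", "sock:10.0.0.5:443"),
]
graph = build_provenance_graph(host_events, network_alerts)
for node, edges in graph.items():
    print(node, edges)
```

In this sketch, fusing before encoding means a single process node carries both its file activity and its network activity, so a downstream encoder would see them in one neighborhood rather than in two separate graphs.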