DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns

Trung V. Phan, Tri Gia Nguyen, Thomas Bauschert

Abstract

This paper presents DeepStage, a deep reinforcement learning (DRL) framework for adaptive, stage-aware defense against Advanced Persistent Threats (APTs). The enterprise environment is modeled as a partially observable Markov decision process (POMDP), where host provenance and network telemetry are fused into unified provenance graphs. Building on our prior work, StageFinder, a graph neural encoder and an LSTM-based stage estimator infer probabilistic attacker stages aligned with the MITRE ATT&CK framework. These stage beliefs, combined with graph embeddings, guide a hierarchical Proximal Policy Optimization (PPO) agent that selects defense actions across monitoring, access control, containment, and remediation. Evaluated in a realistic enterprise testbed using CALDERA-driven APT playbooks, DeepStage achieves a stage-weighted F1-score of 0.89, outperforming a risk-aware DRL baseline by 21.9%. The results demonstrate effective stage-aware and cost-efficient autonomous cyber defense.
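The abstract reports a stage-weighted F1-score but does not define it here. One plausible reading, offered only as an illustrative sketch, is a weighted average of per-stage F1 values, where each attacker stage carries an importance weight. The function names `f1_score` and `stage_weighted_f1`, the weighting scheme, and all numbers below are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    # By convention, return 0.0 when there are no positives at all.
    return 0.0 if denom == 0 else 2 * tp / denom


def stage_weighted_f1(per_stage_f1: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted mean of per-stage F1 scores.

    ASSUMPTION: the paper's exact definition may differ; this is one
    common way to aggregate a per-class metric with class weights.
    """
    if len(per_stage_f1) != len(weights):
        raise ValueError("need exactly one weight per stage")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(f * w for f, w in zip(per_stage_f1, weights)) / total


# Placeholder values for six stages (not results from the paper).
example_f1 = [0.9, 0.8, 0.7, 0.9, 0.8, 0.7]
uniform_weights = [1.0] * 6  # uniform weights reduce to a macro average
example_score = stage_weighted_f1(example_f1, uniform_weights)
```

With uniform weights this reduces to a plain macro-averaged F1. Non-uniform weights would let later, more damaging stages count for more, which is one way a stage-aware metric could be built.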

Paper Structure (38 sections, 7 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Data and control flow of the proposed DeepStage framework.
  • Figure 2: Per-stage defense effectiveness measured by stage-weighted F1-score across the six APT phases.
  • Figure 3: Cost–effectiveness frontiers illustrating normalized security gain versus cumulative action cost.
  • Figure 4: Training convergence of hierarchical PPO across methods.
  • Figure 5: Defense responsiveness over APT stage transitions.