DeepStage: Learning Autonomous Defense Policies Against Multi-Stage APT Campaigns

Trung V. Phan; Tri Gia Nguyen; Thomas Bauschert

Abstract

This paper presents DeepStage, a deep reinforcement learning (DRL) framework for adaptive, stage-aware defense against Advanced Persistent Threats (APTs). The enterprise environment is modeled as a partially observable Markov decision process (POMDP), where host provenance and network telemetry are fused into unified provenance graphs. Building on our prior work StageFinder, a graph neural encoder and an LSTM-based stage estimator infer probabilistic attacker stages aligned with the MITRE ATT&CK framework. These stage beliefs, combined with graph embeddings, guide a hierarchical Proximal Policy Optimization (PPO) agent that selects defense actions across monitoring, access control, containment, and remediation. Evaluated in a realistic enterprise testbed using CALDERA-driven APT playbooks, DeepStage achieves a stage-weighted F1-score of 0.89, outperforming a risk-aware DRL baseline by 21.9%. The results demonstrate effective stage-aware and cost-efficient autonomous cyber defense.

Paper Structure (38 sections, 7 equations, 5 figures, 2 tables)

Introduction
Related Work
DRL-based Autonomous Cyber Defense
Risk-Aware DRL and Attack-Graph-Based Defense
Hierarchical and Constrained Reinforcement Learning
Limitations of Existing DRL-based Defense Systems
Summary and Research Gap
Design of The DeepStage Framework
Network Environment
Operation of the DeepStage Framework
Data Acquisition
Provenance Graph Construction
Graph Embedding and Stage Estimation
Hierarchical DRL Defense
Feedback and Learning
...and 23 more sections

Figures (5)

Figure 1: Data and control flow of the proposed DeepStage framework.
Figure 2: Per-stage defense effectiveness measured by Stage-weighted F1-score across the six APT phases.
Figure 3: Cost–effectiveness frontiers illustrating normalized security gain versus cumulative action cost.
Figure 4: Training convergence of hierarchical PPO across methods.
Figure 5: Defense responsiveness over APT stage transitions.
