Policy-Value Guided MDP-MCTS Framework for Cyber Kill-Chain Inference

Chitraksh Singh, Monisha Dhanraj, Ken Huang

TL;DR

The paper addresses the challenge of reconstructing complete, ATT&CK-aligned kill chains from unstructured threat reports. It presents a unified framework that fuses Transformer-derived semantic priors with a symbolic Markov Decision Process (MDP) and an AlphaZero-style Monte Carlo Tree Search (MCTS) guided by a Policy-Value Network (PVN) to infer seven-phase kill chains. Key contributions include a multi-objective reward capturing semantic, structural, and defensive factors; a context-aware PVN; and an MCTS inference procedure that yields coherent, interpretable attack narratives. Empirical results on three real intrusions show improved semantic fidelity and multi-phase coherence over Transformer baselines, with predictions closely aligning with expert-selected techniques. The approach offers a scalable, transparent path toward automated, analyst-friendly kill-chain reconstruction for cyber defense.
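
The summary names the reward's semantic, structural, and defensive factors but not how they are combined. The sketch below assumes a simple linear weighted sum of three per-step scores in [0, 1], with the semantic term taken as cosine similarity between a report embedding and a technique embedding. The names `RewardWeights`, `cosine`, and `step_reward`, the weights, and the cosine-based semantic term are all hypothetical illustrations, not the authors' definitions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the paper's actual weighting is not stated in this summary.
    semantic: float = 0.5
    structural: float = 0.3
    defensive: float = 0.2


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def step_reward(report_vec: Sequence[float],
                technique_vec: Sequence[float],
                transition_prob: float,
                defensive_score: float,
                w: RewardWeights = RewardWeights()) -> float:
    """Hypothetical reward for choosing one technique at one kill-chain phase.

    semantic:   report/technique similarity, rescaled from [-1, 1] to [0, 1]
    structural: assumed plausibility of the transition from the previous phase
    defensive:  assumed score in [0, 1] for the defensive factor
    """
    for name, v in (("transition_prob", transition_prob),
                    ("defensive_score", defensive_score)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {v}")
    semantic = (cosine(report_vec, technique_vec) + 1.0) / 2.0
    return (w.semantic * semantic
            + w.structural * transition_prob
            + w.defensive * defensive_score)


# Orthogonal embeddings give a neutral semantic score of 0.5, so this returns 0.25.
example = step_reward([1.0, 0.0], [0.0, 1.0], transition_prob=0.0, defensive_score=0.0)
```

A weighted sum keeps the trade-off between factors explicit and tunable; a real system might instead treat some factors (for example, impossible phase transitions) as hard constraints on the MDP's action set.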

Abstract

Threat analysts routinely rely on natural-language reports that describe attacker actions without enumerating the full kill chain or the dependencies between phases, making automated reconstruction of ATT&CK-consistent intrusion paths a difficult open problem. We propose a reasoning framework that infers complete seven-phase kill chains by coupling phase-conditioned semantic priors from Transformer models with a symbolic Markov Decision Process and an AlphaZero-style Monte Carlo Tree Search guided by a Policy-Value Network. The framework enforces semantic relevance, phase cohesion, and transition plausibility through a multi-objective reward function while allowing search to explore alternative interpretations of the cyber threat intelligence (CTI) narrative. Applied to three real intrusions (FIN6, APT24, and UNC1549), the approach yields kill chains that surpass Transformer baselines in semantic fidelity and operational coherence, and frequently align with expert-selected TTPs. Our results demonstrate that combining contextual embeddings with search-based decision-making offers a practical path toward automated, interpretable kill-chain reconstruction for cyber defense.
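
In AlphaZero, a policy network steers tree search through the PUCT selection rule: at each node, pick the action maximising Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)), where P is the policy prior, N the visit count, and Q the mean backed-up value. The sketch below applies that rule to choosing an ATT&CK technique at one kill-chain phase. Whether the paper uses this exact variant is an assumption; `Edge`, `puct_select`, c_puct = 1.5, and the priors are hypothetical, and the technique IDs are illustrative candidates rather than the paper's outputs.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a): policy-head probability for this technique
    visits: int = 0         # N(s, a): how often search has taken this edge
    value_sum: float = 0.0  # W(s, a): sum of backed-up returns

    @property
    def q(self) -> float:
        """Mean action value Q(s, a); 0.0 for an unvisited edge."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the action maximising Q(s, a) + U(s, a) (AlphaZero-style PUCT).

    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).
    sqrt(max(total, 1)) is used so that, before any visits, priors decide
    the choice (a common implementation tweak, assumed here).
    """
    total = sum(e.visits for e in edges.values())
    sqrt_total = math.sqrt(max(total, 1))

    def score(e: Edge) -> float:
        return e.q + c_puct * e.prior * sqrt_total / (1 + e.visits)

    return max(edges, key=lambda a: score(edges[a]))


# Illustrative candidates for a delivery-style phase (priors are made up).
candidates = {
    "T1566": Edge(prior=0.6),  # Phishing
    "T1190": Edge(prior=0.3),  # Exploit Public-Facing Application
    "T1189": Edge(prior=0.1),  # Drive-by Compromise
}
first_pick = puct_select(candidates)  # "T1566": with no visits, the highest prior wins
```

As visits accumulate, the exploration bonus shrinks for frequently chosen edges, so a technique with a modest prior but high backed-up value can overtake the prior's favourite. This is how search can revise an initial Transformer-derived guess.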

Paper Structure

This paper contains 13 sections, 26 equations, 4 figures, 1 table, and 3 algorithms.

Figures (4)

  • Figure 1: Proposed Cyber Kill-Chain Reasoning Framework.
  • Figure 2: Reasoning pipeline combining semantic encoding, multi-objective rewards, MDP rollouts, PV-Network predictions, and MCTS inference.
  • Figure 3: Illustrative MCTS expansion and backtracking trace for the FIN6 report. The diagram shows the partial search trajectory from Recon to Delivery (Iteration 1), highlighting how priors and rewards influence node selection.
  • Figure 4: Behavioral envelope visualizations for threat actors FIN6, APT24, and UNC1549.
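
Figure 3 refers to expansion and backtracking. In AlphaZero-style search, the backtracking step is typically a backup: once a leaf is evaluated, its value is propagated to every edge on the path from the root, updating visit counts and value sums. The sketch below assumes a single-agent MDP with an immediate reward per step, such as choosing one technique per phase from Recon onward, and a value-head estimate at the leaf. The paper's exact backup rule, discount, and reward placement are not given in this summary, and `Edge`, `backup`, and `gamma` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Edge:
    prior: float            # P(s, a) from the policy head (unused by the backup itself)
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # W(s, a)


def backup(path: list[Edge], rewards: list[float], leaf_value: float,
           gamma: float = 1.0) -> float:
    """Propagate a leaf evaluation back to the root along one search path.

    path[i] is the edge taken at depth i (one kill-chain phase per depth);
    rewards[i] is the immediate reward for taking path[i];
    leaf_value is the value-head estimate at the last expanded node.
    Single-agent MDP: unlike two-player AlphaZero, values are not negated per level.
    """
    if len(path) != len(rewards):
        raise ValueError("expected exactly one reward per edge on the path")
    g = leaf_value
    for edge, r in zip(reversed(path), reversed(rewards)):
        g = r + gamma * g  # discounted return from this edge onward
        edge.visits += 1
        edge.value_sum += g
    return g  # return credited to the root edge


# A three-step partial trajectory, e.g. the first three phases of the chain.
recon, weaponization, delivery = Edge(0.5), Edge(0.4), Edge(0.7)
root_return = backup([recon, weaponization, delivery], [0.2, 0.3, 0.5], leaf_value=1.0)
```

With gamma = 1.0 each edge is credited with its own reward plus everything downstream, so early-phase choices are judged by how well the rest of the chain turns out.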