Policy-Value Guided MDP-MCTS Framework for Cyber Kill-Chain Inference

Chitraksh Singh; Monisha Dhanraj; Ken Huang

TL;DR

The paper addresses the challenge of reconstructing complete, ATT&CK-aligned kill chains from unstructured threat reports. It presents a unified framework that fuses Transformer-derived semantic priors with a symbolic Markov Decision Process (MDP) and an AlphaZero-style Monte Carlo Tree Search (MCTS) guided by a Policy-Value Network (PVN) to infer seven-phase kill chains. Key contributions include a multi-objective reward capturing semantic, structural, and defensive factors; a context-aware PVN; and an MCTS inference procedure that yields coherent, interpretable attack narratives. Empirical results on three real intrusions (FIN6, APT24, and UNC1549) show improved semantic fidelity and multi-phase coherence over Transformer baselines, with predictions closely aligned with expert-selected techniques. The approach offers a scalable, transparent path toward automated, analyst-friendly kill-chain reconstruction for cyber defense.

Abstract

Threat analysts routinely rely on natural-language reports that describe attacker actions without enumerating the full kill chain or the dependencies between phases, which makes automated reconstruction of ATT&CK-consistent intrusion paths a difficult open problem. We propose a reasoning framework that infers complete seven-phase kill chains by coupling phase-conditioned semantic priors from Transformer models with a symbolic Markov Decision Process (MDP) and an AlphaZero-style Monte Carlo Tree Search (MCTS) guided by a Policy-Value Network. The framework enforces semantic relevance, phase cohesion, and transition plausibility through a multi-objective reward function while allowing the search to explore alternative interpretations of the cyber threat intelligence (CTI) narrative. Applied to three real intrusions (FIN6, APT24, and UNC1549), the approach yields kill chains that surpass Transformer baselines in semantic fidelity and operational coherence and frequently align with expert-selected TTPs. Our results demonstrate that combining contextual embeddings with search-based decision-making offers a practical path toward automated, interpretable kill-chain reconstruction for cyber defense.
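To make the search concrete, here is a minimal sketch of AlphaZero-style MCTS over a kill-chain MDP in which each action selects one technique for the next phase. It is an illustration under stated assumptions, not the authors' implementation. The three-phase toy chain (shortened from seven), the prior scores, the transition table, the reward weights, the `policy_value` stub standing in for a trained Policy-Value Network, and all function names are hypothetical. The phase-cohesion term is omitted for brevity. Selection uses the standard PUCT rule, Q + c_puct · P · sqrt(N_parent) / (1 + N_child). Complete chains are scored by the toy reward, and unfinished leaves are scored by the stub's value estimate.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy inputs (not from the paper): per-phase candidate ATT&CK
# techniques with semantic prior scores, standing in for Transformer priors.
CANDIDATES = [
    {"T1595": 0.7, "T1592": 0.3},  # Reconnaissance
    {"T1566": 0.8, "T1190": 0.2},  # Initial Access
    {"T1059": 0.6, "T1204": 0.4},  # Execution
]
# Hypothetical transition plausibility; unseen pairs default to 0.1.
TRANSITION = {("T1566", "T1204"): 0.9, ("T1566", "T1059"): 0.6}
W_SEM, W_TRANS = 0.6, 0.4  # assumed reward weights (phase cohesion omitted)


def reward(chain):
    """Weighted multi-objective reward for a complete chain (toy version)."""
    sem = sum(CANDIDATES[i][t] for i, t in enumerate(chain)) / len(chain)
    pairs = list(zip(chain, chain[1:]))
    trans = sum(TRANSITION.get(p, 0.1) for p in pairs) / len(pairs)
    return W_SEM * sem + W_TRANS * trans


def policy_value(chain):
    """Hypothetical PVN stub: priors over the next phase and a value estimate."""
    scores = CANDIDATES[len(chain)]
    total = sum(scores.values())
    priors = {t: s / total for t, s in scores.items()}
    if chain:
        value = sum(CANDIDATES[i][t] for i, t in enumerate(chain)) / len(chain)
    else:
        value = 0.5
    return priors, value


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select(node, c_puct=1.5):
    """PUCT: argmax over children of Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_n = math.sqrt(node.visits)
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits),
    )


def run_mcts(num_simulations=200, c_puct=1.5):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, chain, path = root, [], [root]
        # Selection: descend through already-expanded nodes.
        while node.children:
            action, node = select(node, c_puct)
            chain.append(action)
            path.append(node)
        if len(chain) == len(CANDIDATES):
            value = reward(chain)  # terminal: exact multi-objective reward
        else:
            priors, value = policy_value(chain)  # leaf: expand with PVN priors
            node.children = {t: Node(prior=p) for t, p in priors.items()}
        # Backup: single-agent search, so every node on the path gets the same value.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Read out the chain by following the most-visited child at each phase.
    chain, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        chain.append(action)
    return chain


if __name__ == "__main__":
    best = run_mcts()
    print(best, round(reward(best), 3))
```

The toy scores show the trade-off the multi-objective reward is meant to expose. After T1566, the transition term lets T1204 (reward 0.58) outscore T1059 (reward 0.56), even though T1204's semantic prior is lower. Whether the visit counts actually concentrate on that chain depends on `c_puct` and the simulation budget, since PUCT weights exploration by the priors.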

Figures (4)