A Factored MDP Approach To Moving Target Defense With Dynamic Threat Modeling and Cost Efficiency
Megha Bose, Praveen Paruchuri, Akshat Kumar
TL;DR
This paper addresses cyber defense under Moving Target Defense (MTD) when attacker payoffs are unknown. It introduces the Adaptive Threat-Aware Factored MDP (ATA-FMDP), which couples a dynamic Bayesian network of attacker responses with a factored MDP and solves it via Approximate Linear Programming to produce scalable, switching-cost-aware policies. A theoretical negative result shows that sublinear policy regret is impossible against adaptive adversaries when multiple configurations exist, while empirical evaluations in web-application and network domains demonstrate substantial gains over baselines. The approach adapts to evolving threat landscapes by updating attacker-type probabilities online and integrating them into defender planning, which makes it practical for real-world MTD deployments.
Abstract
Moving Target Defense (MTD) has emerged as a proactive and dynamic framework to counteract evolving cyber threats. Traditional MTD approaches often rely on assumptions about the attacker's knowledge and behavior. However, real-world scenarios are inherently more complex, with adaptive attackers and limited prior knowledge of their payoffs and intentions. This paper introduces a novel approach to MTD using a Markov Decision Process (MDP) model that does not rely on predefined attacker payoffs. Our framework integrates the attacker's real-time responses into the defender's MDP using a dynamic Bayesian network. By employing a factored MDP model, we provide a comprehensive and realistic system representation. We also incorporate incremental updates to an attack response predictor as new data emerges, which keeps the defense mechanism adaptive and robust. Additionally, we consider the costs of switching configurations in MTD, integrating them into the reward structure to balance execution and defense costs. We first highlight the difficulty of the problem through a theoretical negative result on regret. Nevertheless, empirical evaluations demonstrate the framework's effectiveness in scenarios marked by high uncertainty and dynamically changing attack landscapes.
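The two ingredients named above, an incrementally updated belief over attacker types and a reward that charges for switching configurations, can be illustrated with a minimal sketch. This is not the paper's ATA-FMDP implementation: the function names, the dictionary layout, and the plain Bayes-rule update are all assumptions made for illustration, and the example omits the factored MDP and its Approximate Linear Programming solver entirely.

```python
from typing import Dict, Tuple


def update_type_belief(prior: Dict[str, float],
                       likelihood: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical Bayes-rule update of attacker-type probabilities.

    prior[t]      : current probability of attacker type t
    likelihood[t] : assumed P(observed attacker response | type t)
    """
    unnormalized = {t: prior[t] * likelihood[t] for t in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation is impossible under every type; keep the prior.
        return dict(prior)
    return {t: v / total for t, v in unnormalized.items()}


def defender_reward(belief: Dict[str, float],
                    expected_loss: Dict[str, Dict[str, float]],
                    prev_config: str,
                    next_config: str,
                    switch_cost: Dict[Tuple[str, str], float]) -> float:
    """Hypothetical one-step reward: negative of belief-weighted attack loss
    in the chosen configuration plus the cost of switching into it."""
    attack_loss = sum(belief[t] * expected_loss[t][next_config] for t in belief)
    # Pairs missing from switch_cost (e.g. staying put) are assumed free.
    move_cost = switch_cost.get((prev_config, next_config), 0.0)
    return -(attack_loss + move_cost)


# Illustrative numbers only; none of these values come from the paper.
prior = {"scanner": 0.5, "exploiter": 0.5}
likelihood = {"scanner": 0.8, "exploiter": 0.2}
posterior = update_type_belief(prior, likelihood)

expected_loss = {
    "scanner": {"c1": 10.0, "c2": 2.0},
    "exploiter": {"c1": 1.0, "c2": 5.0},
}
switch_cost = {("c1", "c2"): 1.0, ("c2", "c1"): 1.0}
reward = defender_reward(posterior, expected_loss, "c1", "c2", switch_cost)
```

Folding the switching cost into the reward, rather than treating it as a separate constraint, lets a single planner trade off defensive benefit against the operational cost of moving between configurations.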
