Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning
Diksha Goel, Kristen Moore, Mingyu Guo, Derui Wang, Minjune Kim, Seyit Camtepe
TL;DR
This paper tackles the problem of defending dynamic Active Directory (AD) graphs against autonomous attackers by formulating a Stackelberg game between a generalized RL attacker and an RL-assisted evolutionary defender. It introduces an AD graph optimization step and a training facilitator that prunes both environments and neural networks to enable scalable learning, paired with a defender that combines reinforcement learning with evolutionary diversity optimization to produce robust edge-blocking plans. The attacker is trained via Proximal Policy Optimization across multiple graph snapshots, while the defender evolves diverse defenses under a fixed budget and evaluates them using the attacker's learned critic network. Empirical results on synthetic AD graphs up to size r4000 show that the GenRL-TrnF attacker policy generalizes effectively and that the RL-EDO defender yields lower attacker success rates than the baselines, indicating potential for scalable, automated cyber defense in large, time-varying AD environments.
Abstract
This paper addresses a significant gap in the Autonomous Cyber Operations (ACO) literature: the absence of effective edge-blocking ACO strategies for dynamic, real-world networks. It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems. Unlike the existing literature on edge-blocking defenses, which treats AD systems as static entities, our study recognizes their dynamic nature and develops advanced edge-blocking defenses through a Stackelberg game model between attacker and defender. We devise a Reinforcement Learning (RL)-based attack strategy and an RL-assisted Evolutionary Diversity Optimization-based defense strategy, in which the attacker and defender improve each other's strategies via parallel gameplay. To address the computational challenges of training attacker-defender strategies on numerous dynamic AD graphs, we propose an RL Training Facilitator that prunes environments and neural networks to eliminate irrelevant elements, enabling efficient and scalable training for large graphs. We extensively train the attacker strategy, as a sophisticated attacker model is essential for a robust defense. Our empirical results demonstrate that the proposed approach enhances the defender's proficiency in hardening dynamic AD graphs while remaining scalable to large AD graphs.
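To make the edge-blocking Stackelberg setting concrete, the following toy sketch may help. It is not the paper's method: the RL attacker is replaced by plain graph reachability, the RL-assisted evolutionary defender by brute-force enumeration, and the graph is a single static snapshot. All names (`reachable_entries`, `best_block`, the node labels, and the `DA` target) are hypothetical and chosen only for illustration. The defender moves first by choosing which edges to block within a budget, and the attacker then responds by using any remaining path to the target.

```python
from collections import deque
from itertools import combinations


def reachable_entries(edges, entries, target, blocked=frozenset()):
    """Count entry nodes that can still reach `target` once `blocked` edges are removed."""
    adjacency = {}
    for u, v in edges:
        if (u, v) not in blocked:
            adjacency.setdefault(u, []).append(v)

    count = 0
    for start in entries:
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            if node == target:
                count += 1
                break
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return count


def best_block(edges, blockable, entries, target, budget):
    """Defender (leader): try every block set within budget, keep the one
    that leaves the fewest entry nodes with a path to the target."""
    best = None
    for k in range(budget + 1):
        for combo in combinations(blockable, k):
            score = reachable_entries(edges, entries, target, frozenset(combo))
            if best is None or score < best[0]:
                best = (score, combo)
    return best


# Toy graph (an assumption, not data from the paper): three entry nodes, one target.
EDGES = [("a", "x"), ("b", "x"), ("x", "DA"), ("b", "y"), ("y", "DA"), ("c", "y")]
BLOCKABLE = [("x", "DA"), ("y", "DA")]
ENTRIES = ["a", "b", "c"]

if __name__ == "__main__":
    for budget in (0, 1, 2):
        print(budget, best_block(EDGES, BLOCKABLE, ENTRIES, "DA", budget))
```

Brute-force enumeration grows combinatorially with the budget and the number of blockable edges, which gives some intuition for why the paper turns to learned attacker policies, evolutionary search, and pruning for large, changing graphs.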
