Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning

Diksha Goel; Kristen Moore; Mingyu Guo; Derui Wang; Minjune Kim; Seyit Camtepe

TL;DR

This paper tackles the problem of defending dynamic Active Directory graphs against autonomous attackers by formulating a Stackelberg game between a generalized RL attacker and an RL-assisted evolutionary defender. It introduces an AD graph optimization step and a training facilitator that prunes both environments and neural networks to enable scalable learning, paired with a reinforcement-learning defender that uses evolutionary diversity optimization to produce robust edge-blocking plans. The attacker is trained via Proximal Policy Optimization across multiple graph snapshots, while the defender evolves diverse defenses under a fixed budget and evaluates them against the learned attacker critic network. Empirical results on synthetic AD graphs, up to the r4000 graph, show that the GenRL-TrnF attacker policy generalizes effectively and that the RL-EDO defender yields lower attacker success rates than baselines, indicating strong potential for scalable, automated cyber defense in large, time-varying AD environments.

Abstract

This paper addresses a significant gap in the Autonomous Cyber Operations (ACO) literature: the absence of effective edge-blocking ACO strategies for dynamic, real-world networks. It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems. Whereas the existing literature on edge-blocking defenses treats AD systems as static entities, our study recognizes their dynamic nature and develops advanced edge-blocking defenses through a Stackelberg game model between attacker and defender. We devise a Reinforcement Learning (RL)-based attack strategy and an RL-assisted Evolutionary Diversity Optimization-based defense strategy, in which the attacker and defender improve each other's strategies via parallel gameplay. To address the computational challenges of training attacker-defender strategies on numerous dynamic AD graphs, we propose an RL Training Facilitator that prunes environments and neural networks to eliminate irrelevant elements, enabling efficient and scalable training for large graphs. We extensively train the attacker strategy, as a sophisticated attacker model is essential for a robust defense. Our empirical results demonstrate that the proposed approach enhances the defender's proficiency in hardening dynamic AD graphs while ensuring scalability for large-scale AD systems.
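
To make the edge-blocking Stackelberg setting concrete, the following is a minimal sketch on a toy attack graph, not the paper's method: the defender (leader) commits to blocking at most `budget` edges, and the attacker (follower) best-responds by searching for any remaining path to the Domain Admin node. The graph, the node names, the `blockable` set, and the helper functions are all hypothetical, and exhaustive enumeration stands in for the RL attacker and the evolutionary defender, which is only feasible at this tiny scale.

```python
from collections import deque
from itertools import combinations


def reaches(edges, blocked, start, target):
    """Attacker best response: BFS from start to target, skipping blocked edges."""
    adj = {}
    for u, v in edges:
        if (u, v) not in blocked:
            adj.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def success_rate(edges, blocked, entries, target):
    """Fraction of entry nodes from which the attacker can still reach the target."""
    return sum(reaches(edges, blocked, e, target) for e in entries) / len(entries)


def best_blocking_plan(edges, blockable, entries, target, budget):
    """Defender (leader): try every plan of at most `budget` blockable edges."""
    best_plan, best_rate = (), success_rate(edges, set(), entries, target)
    for k in range(1, budget + 1):
        for plan in combinations(blockable, k):
            rate = success_rate(edges, set(plan), entries, target)
            if rate < best_rate:  # keep the first plan that strictly improves
                best_plan, best_rate = plan, rate
    return best_plan, best_rate


# Hypothetical toy graph: three entry users, two workstations, one server, "DA" target.
edges = [("u1", "ws1"), ("u2", "ws1"), ("u2", "ws2"), ("ws1", "srv"),
         ("ws2", "srv"), ("srv", "DA"), ("u3", "DA")]
# Assumption: the final hop ("srv", "DA") cannot be blocked.
blockable = [("ws1", "srv"), ("ws2", "srv"), ("u3", "DA"), ("u1", "ws1")]
entries = ["u1", "u2", "u3"]

plan, rate = best_blocking_plan(edges, blockable, entries, "DA", budget=2)
print(plan, rate)
```

In a dynamic setting, the same evaluation would be repeated over several graph snapshots, and the number of candidate plans grows combinatorially with the budget, which is the scaling problem that learned attacker and defender strategies are meant to address.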

Paper Structure (17 sections, 7 equations, 3 figures, 3 tables)

This paper contains 17 sections, 7 equations, 3 figures, 3 tables.

Introduction
Related Work
Problem Description
Proposed Attacker-Defender Approach
Proposed AD Graph Optimization Technique
Attacker Approach: Reinforcement Learning
RL Training Facilitator: Pruning Approaches
Defender's Approach: Reinforcement Learning Assisted Evolutionary Diversity Optimization
Overall Attacker-Defender Approach
Experimental Results
Synthetic AD Graph Dataset
Training Parameters
Attacker-Defender Policy Training
Evaluating Attacker's Policy
Evaluating Defender's Policy
...and 2 more sections

Figures (3)

Figure 1: AD attack graph containing 500 computers.
Figure 2: Proposed RL-based Attacker-Defender Approach for Dynamic Networks.
Figure 3: Comparison of deviation from 50 specialized agents across various attacker policies (smaller deviations indicate superior performance).

Theorems & Definitions (4)

Definition 1
Definition 2
Definition 3
Definition 4
