The Path To Autonomous Cyber Defense

Sean Oesch; Phillipe Austria; Amul Chaulagain; Brian Weber; Cory Watson; Matthew Dixson; Amir Sadovnik

TL;DR

Defenders are overwhelmed by attack volume, and AI-enabled attackers threaten to outpace humans. The paper outlines a path to autonomous cyber defense using multi-agent reinforcement learning, where specialized agents automate stages of the cyber defense life cycle. It discusses critical design choices, including playing the right game (observation, rewards, actions), enabling adaptability to changing networks and adversaries, and building high-fidelity, reusable training environments that combine simulation and emulation. The findings argue that modular, detector-based observation signals and dynamic reward shaping can improve transferability to real networks, and that standardized training platforms are essential for progress.

Abstract

Defenders are overwhelmed by the number and scale of attacks against their networks. This problem will only be exacerbated as attackers leverage artificial intelligence to automate their workflows. We propose a path to autonomous cyber agents able to augment defenders by automating critical steps in the cyber defense life cycle.

Paper Structure (6 sections, 3 figures)

This paper contains 6 sections, 3 figures.

THE CYBER DEFENSE LIFE CYCLE
PLAYING THE RIGHT GAME
ADAPTABILITY IS KEY
BETTER TRAINING ENVIRONMENTS
CONCLUSION
ACKNOWLEDGMENTS

Figures (3)

Figure 1: When training an autonomous agent, the probabilities of observing different attacker behaviors need to match real-life probabilities, or else the agent may fail in a real environment. The dotted blue line uses what we considered realistic detection probabilities for specific red agent actions (50% for adding a new user, 15% for adding a process, 5% for adding a new session, etc.). Defining observation probabilities is only a small part of creating the appropriate game environment.
Figure 2: The new detector-based observation implementation with a perfect detector outperforms the baseline (original CybORG observation space) over 100M training steps with 10 subnets, demonstrating the feasibility of using an alternate observation space that avoids replicating existing tools and is easier to use in existing SOCs. This new observation space represents a more usable action space both in terms of integration for SOCs and from the perspective of the reinforcement learning algorithm. When training autonomous cyber agents, every component of the game, from the observation space to the reward and actions, must be appropriately defined for the agent to be usable after training.
Figure 3: The emulation environment consists of three main components: the Action Controller, the Observation Converter, and the Emulator. The Action Controller receives actions from the RL Agent and transmits commands to the Emulator, while the Observation Converter converts the state of the Emulator to an observation space vector to be fed back to the RL Agent. The Emulator manages virtual machines that represent hosts and other entities in a network. Any change to the set of available actions for either offense or defense, the observation space, or the emulation environment requires significant time investment, and maintaining all of the tools necessary to run each of the components is also labor-intensive.
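
The component split described in the Figure 3 caption, combined with the per-action detection probabilities quoted in the Figure 1 caption, can be illustrated with a toy Python sketch. This is not the authors' implementation and does not use the CybORG API. The class names mirror the caption, but every method name, the action labels (`add_user`, `add_process`, `add_session`), the `restore` command, and the action-index encoding are assumptions made purely for illustration. The emulator here is an in-memory stand-in rather than a set of virtual machines.

```python
import random
from dataclasses import dataclass, field

# Detection probabilities quoted in the Figure 1 caption.
# The dictionary keys are hypothetical labels for the red agent actions.
DETECTION_PROB = {"add_user": 0.50, "add_process": 0.15, "add_session": 0.05}
ALERT_TYPES = list(DETECTION_PROB)


@dataclass
class ToyEmulator:
    """In-memory stand-in for the Emulator (no real virtual machines)."""
    hosts: list
    rng: random.Random
    alerts: list = field(default_factory=list)

    def red_step(self, host, action):
        # A red action only produces an alert with its detection probability.
        if self.rng.random() < DETECTION_PROB[action]:
            self.alerts.append((host, action))

    def run_command(self, command, host):
        # Single hypothetical blue command: restoring a host clears its alerts.
        if command == "restore":
            self.alerts = [(h, a) for h, a in self.alerts if h != host]


class ActionController:
    """Translates an agent action index into an emulator command."""

    def __init__(self, emulator):
        self.emulator = emulator

    def apply(self, action_index):
        # Assumed encoding: 0 is a no-op, i > 0 restores host number i - 1.
        if action_index > 0:
            host = self.emulator.hosts[action_index - 1]
            self.emulator.run_command("restore", host)


class ObservationConverter:
    """Flattens emulator alerts into a fixed-length 0/1 observation vector."""

    def __init__(self, emulator):
        self.emulator = emulator

    def observe(self):
        seen = set(self.emulator.alerts)
        return [
            1 if (host, alert) in seen else 0
            for host in self.emulator.hosts
            for alert in ALERT_TYPES
        ]


def run_episode(policy, steps=5, seed=0):
    """Run a short loop: red acts, blue observes, blue acts."""
    rng = random.Random(seed)
    emulator = ToyEmulator(hosts=["h0", "h1"], rng=rng)
    controller = ActionController(emulator)
    converter = ObservationConverter(emulator)
    history = []
    for _ in range(steps):
        emulator.red_step(rng.choice(emulator.hosts), rng.choice(ALERT_TYPES))
        observation = converter.observe()
        controller.apply(policy(observation))
        history.append(observation)
    return history
```

A call such as `run_episode(lambda obs: 0)` runs the loop with a do-nothing defender. Keeping the agent behind the two adapter classes means that, in this sketch, changing the detection probabilities or the observation encoding touches only one component, which is the kind of modularity the captions argue for; a real emulation environment would carry the maintenance costs the Figure 3 caption describes.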
