Towards Production-Worthy Simulation for Autonomous Cyber Operations

Konur Tholl, Mariam El Mezouar, Adrian Taylor, Ranwa Al Mallah

TL;DR

This study demonstrates that CybORG can be extended with additional realistic functionality while maintaining its ability to generate informative training signals for RL agents.

Abstract

Simulated environments have proven invaluable in Autonomous Cyber Operations (ACO), where Reinforcement Learning (RL) agents can be trained without the computational overhead of emulation. These environments must accurately represent cybersecurity scenarios while producing the signals needed to support RL training. In this study, we present a framework in which we first extend CybORG's Cage Challenge 2 environment with three new actions (Patch, Isolate, and Unisolate) to better represent the capabilities available to human operators in real-world settings. We then propose a design for agent development in which we modify the reward signals and the agent's feature space to enhance training performance. To validate these modifications, we train DQN and PPO agents in the updated environment. Our study demonstrates that CybORG can be extended with additional realistic functionality while maintaining its ability to generate informative training signals for RL agents.

Paper Structure

This paper contains 22 sections, 4 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Diagram illustrating the functionality of the Patch action. The blue agent patches HostB (left), increasing its patch score. The red agent attempts to exploit HostB (middle); the exploit fails but decreases the patch score. The red agent attempts to exploit HostB again (right), this time successfully establishing a session.
  • Figure 2: Illustration of the Isolate action's functionality. The blue agent isolates HostA (left), disconnecting it from the network. The red agent attempts to exploit HostB from HostA (middle), but cannot connect since it is isolated. The blue agent unisolates HostA (right), but the red agent's session remains, as it was not explicitly removed.
  • Figure 3: Modified feature space mapping used in this study.
  • Figure 4: The DQN implementation used for this research. The data collection process is shown at the top, where the agent interacts with the environment to gather training samples. The training process is shown at the bottom, where the agent uses the collected data to compute the MSE loss between its prediction and the actual returns. In reality, the training is done in batches; however, this is omitted from the diagram for readability.
  • Figure 5: The training process for the PPO implementation. This illustrates how the critic loss and actor loss are computed using the sampled data.
  • ...and 2 more figures
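
The behaviour described in the captions of Figures 1 and 2 can be sketched as a small toy model. This is only an illustration under stated assumptions and does not use CybORG's API: the `Host` class, its field names, and the rule that each Patch adds one point to the patch score while each blocked exploit removes one point are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy host state. Field names are hypothetical, not CybORG's."""
    name: str
    patch_score: int = 0       # assumed: integer counter of remaining patch protection
    isolated: bool = False     # True while the host is cut off from the network
    red_session: bool = False  # True once the red agent holds a session here


def patch(host: Host) -> None:
    """Blue action: raise the host's patch score (assumed increment of 1)."""
    host.patch_score += 1


def isolate(host: Host) -> None:
    """Blue action: disconnect the host from the network."""
    host.isolated = True


def unisolate(host: Host) -> None:
    """Blue action: reconnect the host. Any red session is left untouched."""
    host.isolated = False


def exploit(source: Host, target: Host) -> bool:
    """Red action: try to open a session on target from source."""
    if source.isolated or target.isolated:
        return False             # no connectivity, nothing else changes
    if target.patch_score > 0:
        target.patch_score -= 1  # blocked, but the patch protection erodes
        return False
    target.red_session = True
    return True


# Figure 1 scenario: patch, one blocked exploit, then a successful one.
host_a = Host("HostA", red_session=True)
host_b = Host("HostB")
patch(host_b)
first_attempt = exploit(host_a, host_b)
second_attempt = exploit(host_a, host_b)
```

In this sketch `first_attempt` is blocked and `second_attempt` succeeds, mirroring the three panels of Figure 1. Calling `isolate(host_a)` before an exploit makes the attempt fail for lack of connectivity, and `unisolate(host_a)` restores connectivity without clearing `host_a.red_session`, as in Figure 2.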