Towards a Generalisable Cyber Defence Agent for Real-World Computer Networks

Tim Dudman, Martyn Bull

TL;DR

TERLA introduces topological extensions for reinforcement learning in cyber defence. It uses heterogeneous graph transformers to produce fixed-size latent embeddings of observed networks, together with a reduced, interpretable action space. This enables PPO-based agents to generalise across networks of different topology and size without retraining, while maintaining defensive performance and improving action efficiency. Evaluation in the CAGE Challenge 4 (CC4) environment shows defensive performance comparable to vanilla PPO and highlights the benefits of deploying a single TERLA agent across multiple network segments. The work outlines practical steps toward real-world applicability, including action-waiting and reward shaping to address CC4 dynamics, and makes recommendations for future maturation covering larger action spaces and human–AI teaming.

Abstract

Recent advances in deep reinforcement learning for autonomous cyber defence have resulted in agents that can successfully defend simulated computer networks against cyber-attacks. However, many of these agents would need retraining to defend networks with differing topology or size, making them poorly suited to real-world networks, where topology and size can vary over time. In this research we introduce a novel set of Topological Extensions for Reinforcement Learning Agents (TERLA) that provide generalisability for the defence of networks with differing topology and size, without the need for retraining. Our approach uses heterogeneous graph neural network layers to produce a fixed-size latent embedding representing the observed network state. This representation learning stage is coupled with a reduced, fixed-size, semantically meaningful and interpretable action space. We apply TERLA to a standard deep reinforcement learning Proximal Policy Optimisation (PPO) agent model and, to reduce the sim-to-real gap, conduct our research using Cyber Autonomy Gym for Experimentation (CAGE) Challenge 4. This Cyber Operations Research Gym environment has many of the features of a real-world network, such as realistic Intrusion Detection System (IDS) events and multiple agents defending network segments of differing topology and size. TERLA agents retain the defensive performance of vanilla PPO agents whilst showing improved action efficiency. Generalisability has been demonstrated by showing that all TERLA agents share the same network-agnostic neural network architecture, and by deploying a single TERLA agent multiple times to defend network segments with differing topology and size, which improved defensive performance and efficiency.
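The central mechanism in the abstract is that a network graph of any size is encoded into a latent vector whose length does not depend on the number of hosts. A minimal sketch of that idea follows, assuming a deliberately simplified encoder: one weight matrix per node type, a single round of mean-neighbour message passing, then mean pooling per node type concatenated in a fixed order. This is a stand-in for illustration, not the heterogeneous graph transformer layers TERLA uses; the `HeteroGraphEncoder` class, the `toy_segment` helper, the node types (`host`, `subnet`) and the feature sizes are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical graph format: node id -> (node type, feature vector); edges are undirected id pairs.
Nodes = Dict[str, Tuple[str, Vector]]
Edges = List[Tuple[str, str]]


def matvec(w: List[List[float]], x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class HeteroGraphEncoder:
    """Hypothetical type-aware encoder whose weights do not depend on graph size."""

    def __init__(self, in_dims: Dict[str, int], hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.node_types = sorted(in_dims)  # fixed order for the readout
        # One input projection per node type (hidden x in_dim).
        self.w_type = {
            t: [[rng.uniform(-1, 1) for _ in range(in_dims[t])] for _ in range(hidden)]
            for t in self.node_types
        }
        # One shared projection for aggregated neighbour messages (hidden x hidden).
        self.w_msg = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]

    def encode(self, nodes: Nodes, edges: Edges) -> Vector:
        # 1. Project each node's features with the weights for its type.
        h = {n: relu(matvec(self.w_type[t], x)) for n, (t, x) in nodes.items()}
        # 2. One round of message passing: add the projected mean of neighbour states.
        neighbours: Dict[str, List[str]] = {n: [] for n in nodes}
        for a, b in edges:
            neighbours[a].append(b)
            neighbours[b].append(a)
        h2 = {}
        for n in nodes:
            msg = matvec(self.w_msg, mean([h[m] for m in neighbours[n]], self.hidden))
            h2[n] = relu([hs + hm for hs, hm in zip(h[n], msg)])
        # 3. Readout: mean-pool per node type, concatenated in a fixed type order,
        #    so the output length is len(node_types) * hidden for any graph.
        out: Vector = []
        for t in self.node_types:
            out.extend(mean([h2[n] for n, (nt, _) in nodes.items() if nt == t], self.hidden))
        return out


def toy_segment(n_hosts: int, n_subnets: int) -> Tuple[Nodes, Edges]:
    """Hypothetical segment: each host attaches to one subnet; subnets form a chain."""
    nodes: Nodes = {f"s{i}": ("subnet", [1.0, float(i)]) for i in range(n_subnets)}
    edges: Edges = [(f"s{i}", f"s{i + 1}") for i in range(n_subnets - 1)]
    for j in range(n_hosts):
        nodes[f"h{j}"] = ("host", [float(j % 2), 0.5, 1.0])  # toy alert/status features
        edges.append((f"h{j}", f"s{j % n_subnets}"))
    return nodes, edges


encoder = HeteroGraphEncoder({"host": 3, "subnet": 2}, hidden=8)
small_embedding = encoder.encode(*toy_segment(n_hosts=3, n_subnets=1))
large_embedding = encoder.encode(*toy_segment(n_hosts=10, n_subnets=2))
print(len(small_embedding), len(large_embedding))  # 16 16
```

Because the weights are indexed by node type rather than by individual node, the same encoder instance accepts a 3-host and a 10-host segment and returns vectors of the same length. That property is what allows a single downstream policy to serve segments of differing topology and size.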

Paper Structure

This paper contains 14 sections, 2 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: An example of a DRL agent interacting with a cyber environment, exchanging observations, actions and rewards.
  • Figure 2: An example of a typical DRL agent model, sized to the observation and action spaces of the environment in which it is trained.
  • Figure 3: The CC4 network laydown (© 2024 TTCP) showing the varying topology and size of the five network segments being defended.
  • Figure 4: The TERLA architecture applied to a standard PPO model.
  • Figure 5: The TERLA heterogeneous graph (schema on the left and an example instance as seen by an agent on the right).
  • ...and 6 more figures
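The captions of Figures 2 and 4 contrast a conventional DRL model, whose layers are sized to one environment's observation and action spaces, with TERLA's network-agnostic architecture. The sketch below illustrates only the shape argument. It assumes a single linear actor layer with a softmax over a fixed number of actions; the `LinearPolicyHead` class, the flattened observation of 4 features per host and the 5-action set are all hypothetical. It shows why a model sized to one segment's observation cannot be reused on a segment with more hosts, whereas a head sized to a fixed-length embedding can.

```python
import math
import random
from typing import List


class LinearPolicyHead:
    """Hypothetical actor layer: input vector -> probabilities over a fixed action set."""

    def __init__(self, in_dim: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(n_actions)]

    def probabilities(self, x: List[float]) -> List[float]:
        # The weight shape fixes the input length; any other length is rejected.
        if len(x) != self.in_dim:
            raise ValueError(f"expected input of length {self.in_dim}, got {len(x)}")
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


FEATURES_PER_HOST = 4  # hypothetical flattened-observation layout
N_ACTIONS = 5          # hypothetical fixed action set

# Conventional model: input layer sized to a 5-host segment's flattened observation.
vanilla = LinearPolicyHead(in_dim=5 * FEATURES_PER_HOST, n_actions=N_ACTIONS)
vanilla.probabilities([0.1] * (5 * FEATURES_PER_HOST))  # works on the training segment
try:
    vanilla.probabilities([0.1] * (8 * FEATURES_PER_HOST))  # an 8-host segment
    vanilla_reusable = True
except ValueError:
    vanilla_reusable = False  # a differently sized model would be needed, i.e. retraining

# TERLA-style: input sized to a fixed-length embedding, whatever the segment size.
EMBEDDING_DIM = 16
terla_like = LinearPolicyHead(in_dim=EMBEDDING_DIM, n_actions=N_ACTIONS)
probs = terla_like.probabilities([0.1] * EMBEDDING_DIM)
print(vanilla_reusable, len(probs), round(sum(probs), 6))  # False 5 1.0
```

In the TERLA setting the embedding would come from the graph representation layers, so one head, and hence one trained agent, can be deployed on each of the differently sized CC4 segments. The sketch says nothing about how TERLA's reduced action space is actually defined.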