Structural Generalization in Autonomous Cyber Incident Response with Message-Passing Neural Networks and Reinforcement Learning

Jakob Nyberg; Pontus Johnson

TL;DR

The paper tackles automated cyber incident response under changing network structure by using the Symbolic Relational Deep Reinforcement Learning framework, with a message-passing neural network (MPNN) encoding relational graph states. It reshapes CAGE 2 observations into graphs, applies a two-step policy over nodes and actions, and trains end-to-end with PPO, comparing local and global graph representations. Results show zero-shot generalization to unseen network variants, with local MPNN configurations performing best among the generalized agents, although multilayer perceptron (MLP) policies trained specifically on each variant outperform them on those variants. The work demonstrates a practical trade-off between generalization and specialization and argues that relational structure can yield reusable agents for structurally varied networks, reducing retraining costs in real-world deployments.

Abstract

We believe that agents for automated incident response based on machine learning need to handle changes in network structure. Computer networks are dynamic, and can naturally change in structure over time. Retraining agents for small network changes costs time and energy. We attempt to address this issue with an existing method of relational agent learning, where the relations between objects are assumed to remain consistent across problem instances. The state of the computer network is represented as a relational graph and encoded through a message-passing neural network. The message-passing neural network and an agent policy using the encoding are optimized end-to-end using reinforcement learning. We evaluate the approach on the second instance of the Cyber Autonomy Gym for Experimentation (CAGE 2), a cyber incident simulator that simulates attacks on an enterprise network. We create variants of the original network with different numbers of hosts, and agents are tested on them without additional training. Our results show that agents using relational information are able to find solutions despite changes to the network, and can perform optimally in some instances. Agents using the default vector state representation perform better, but need to be specially trained on each network variant, demonstrating a trade-off between specialization and generalization.
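The idea that one encoder and one policy can be reused across networks of different sizes can be illustrated with a minimal sketch. It is not the authors' implementation: the mean aggregation, the fixed averaging update (standing in for a learned update function), the node-first ordering of the two-step policy, and all names such as `message_passing_round`, `two_step_policy`, and `action_weights` are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing_round(features, edges):
    """One round of mean-aggregation message passing.

    features: dict mapping node name -> list of floats (same length for all).
    edges: list of directed (source, destination) pairs.
    """
    dim = len(next(iter(features.values())))
    incoming = {node: [] for node in features}
    for src, dst in edges:
        incoming[dst].append(features[src])

    updated = {}
    for node, own in features.items():
        msgs = incoming[node]
        if msgs:
            agg = [sum(m[i] for m in msgs) / len(msgs) for i in range(dim)]
        else:
            agg = [0.0] * dim
        # Hypothetical stand-in for a learned update: average the node's own
        # state with the aggregated neighbour messages.
        updated[node] = [0.5 * o + 0.5 * a for o, a in zip(own, agg)]
    return updated


def two_step_policy(embeddings, action_weights):
    """Return P(node) and P(action | node) from node embeddings.

    action_weights: dict mapping action name -> weight vector (hypothetical).
    The same weights are shared by every node, so the number of parameters
    does not depend on the number of hosts.
    """
    nodes = sorted(embeddings)
    node_logits = [sum(embeddings[n]) for n in nodes]
    node_probs = dict(zip(nodes, softmax(node_logits)))

    action_probs = {}
    for n in nodes:
        logits = [
            sum(w * x for w, x in zip(weights, embeddings[n]))
            for weights in action_weights.values()
        ]
        action_probs[n] = dict(zip(action_weights, softmax(logits)))
    return node_probs, action_probs


# Illustrative action names and weights; not taken from the paper.
ACTION_WEIGHTS = {"analyse": [1.0, 0.0], "restore": [0.0, 1.0]}

small = {"user0": [1.0, 0.0], "server0": [0.0, 1.0]}
small_edges = [("user0", "server0"), ("server0", "user0")]

large = dict(small, user1=[1.0, 1.0])
large_edges = small_edges + [("user1", "server0"), ("server0", "user1")]

for feats, edges in ((small, small_edges), (large, large_edges)):
    emb = message_passing_round(feats, edges)
    node_probs, action_probs = two_step_policy(emb, ACTION_WEIGHTS)
    print(len(node_probs), "nodes, joint distribution over",
          sum(len(a) for a in action_probs.values()), "node-action pairs")
```

Because the update and the action weights are shared by all nodes, the same functions run unchanged on the two-host and three-host graphs; a policy over a fixed-length vector would need a new input layer for each size.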

Paper Structure (17 sections, 10 equations, 5 figures, 2 tables)

Introduction
Background
CAGE and CybORG
Learning from Graphs
Reinforcement Learning
Symbolic Relational Deep Reinforcement Learning
Implementation of MPNN Agents on CAGE 2
Local Message-Passing Scheme
Policy Decomposition
Changes to CAGE 2
Evaluation
Evaluation Results
Related Work
Discussion
Limits of Relational Learning
...and 2 more sections

Figures (5)

Figure 1: Image of the computer network simulated in CAGE 2, as presented in the developers' GitHub repository. During a match, the red team starts from a user host and attempts to reach the operational server. This can be achieved by first compromising one of the enterprise servers, which can communicate with the operational machines.
Figure 2: Graph representation of network structure used in CAGE 2.
Figure 3: Schematic showing the computation paths for action probabilities and state value on a graph with two nodes and two edges.
Figure 4: A state graph at a given point of the CAGE 2 simulation. Each node has two categorical attributes, each encoded in binary using two bits. The first attribute indicates the observed activity, and the second the access privilege of the red team agent on that host.
Figure 5: Bar chart showing rewards for trained and untrained agents averaged over 1000 episodes. MLP agents were specially trained on each network variant. MPNN agents were only trained on the network variant with 13 hosts, and evaluated on others. Agents were trained and evaluated against the Meander red team policy, with episode lengths of 50 steps.
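
The node encoding described in the caption of Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the category names, their ordering, and the bit assignment are hypothetical and are not confirmed by the text above, which only states that each of the two attributes uses two bits.

```python
# Hypothetical category lists; the names and their order are assumptions.
ACTIVITY = ["none", "scan", "exploit"]
ACCESS = ["none", "unknown", "user", "privileged"]


def two_bits(index):
    """Encode an integer in the range 0..3 as two bits, high bit first."""
    if not 0 <= index < 4:
        raise ValueError("two bits can only encode values 0..3")
    return [(index >> 1) & 1, index & 1]


def encode_node(activity, access):
    """Build a 4-element feature vector: 2 activity bits, then 2 access bits."""
    return two_bits(ACTIVITY.index(activity)) + two_bits(ACCESS.index(access))


# A state graph is then a mapping from host to feature vector, plus edges.
state = {
    "user0": encode_node("scan", "user"),
    "enterprise0": encode_node("none", "none"),
}
print(state)
```

With this layout every host carries a feature vector of the same length regardless of how many hosts the network contains, which is what allows a per-node encoder to be applied to each variant.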
