Evaluating Generalization Mechanisms in Autonomous Cyber Attack Agents

Ondřej Lukáš, Jihoon Shin, Emilia Rivas, Diego Forni, Maria Rigaki, Carlos Catania, Aritran Piplai, Christopher Kiekintveld, Sebastian Garcia

TL;DR

Under the evaluation protocol and agent-specific assumptions, prompt-driven pretrained LLM agents achieve the highest success on the held-out reassignment, but at the cost of increased inference-time compute, reduced transparency, and practical failure modes such as repetition/invalid-action loops.

Abstract

Autonomous offensive agents often fail to transfer beyond the networks on which they are trained. We isolate a minimal but fundamental shift -- unseen host/subnet IP reassignment in an otherwise fixed enterprise scenario -- and evaluate attacker generalization in the NetSecGame environment. Agents are trained on five IP-range variants and tested on a sixth unseen variant; only the meta-learning agent may adapt at test time. We compare three agent families (traditional RL, adaptation agents, and LLM-based agents) and use action-distribution-based behavioral/XAI analyses to localize failure modes. Some adaptation methods show partial transfer but significant degradation under unseen reassignment, indicating that even address-space changes can break long-horizon attack policies. Under our evaluation protocol and agent-specific assumptions, prompt-driven pretrained LLM agents achieve the highest success on the held-out reassignment, but at the cost of increased inference-time compute, reduced transparency, and practical failure modes such as repetition/invalid-action loops.

Paper Structure

This paper contains 115 sections, 16 equations, 19 figures, and 15 tables.

Figures (19)

  • Figure 1: DARLA (Higgins et al., 2017) intuition for generalization: A robot trained in one visual domain (e.g., a red room with a red fruit) can fail under appearance shifts (e.g., a blue room with an orange fruit) if it overfits to surface cues.
  • Figure 2: The network topology used in the Data exfiltration scenario. There are two local sub-networks, each consisting of 5 hosts. Hosts are reachable from each other within the same sub-networks. Access from the other sub-network is determined by firewall rules (shown as dotted lines).
  • Figure 3: DDQN agent architecture: JSON observations are embedded, mapped to a Q-value vector over the full action set, and the greedy action is selected after masking to currently valid actions (with a delayed target network for stability).
  • Figure 4: Dataset distribution across topologies. Win values represent successful episodes, Progress actions are valid state-changing actions, and Winning Progress actions are progress actions from successful episodes.
  • Figure 5: Diagram of Operation of the Conceptual Agent. The translations from state to concept-state and from concept-action to action are done as a wrapper. The agent operates on the new state without information about the original state.
  • ...and 14 more figures