Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games

Yikuan Yan; Yaolun Zhang; Keman Huang

TL;DR

This work tackles the challenge of combining large language models with reinforcement learning agents to perform cybersecurity operations more effectively. It introduces SecurityBot, an LLM-based agent augmented with profile, memory, reflection, and action modules and supported by three collaboration mechanisms (cursor, aggregator, caller) that let it draw on pre-trained RL mentors. Across CybORG-based red-team and blue-team tasks, SecurityBot achieves complementary gains over standalone LLM or RL approaches, with single-mentor configurations often outperforming multi-mentor setups due to noise. The study highlights the practical potential of LLM-RL collaboration for autonomous cybersecurity and points to future work on fine-tuning models and developing more robust mentor aggregation strategies.

Abstract

Effectively integrating LLM and reinforcement learning (RL) agents to achieve complementary performance is critical in high-stakes tasks such as cybersecurity operations. In this study, we introduce SecurityBot, an LLM agent mentored by pre-trained RL agents, to support cybersecurity operations. In particular, the LLM agent is equipped with a profile module to generate behavior guidelines, a memory module to accumulate local experiences, a reflection module to re-evaluate choices, and an action module to reduce the action space. Additionally, it adopts a collaboration mechanism to take suggestions from pre-trained RL agents, including a cursor for dynamically deciding when to take suggestions, an aggregator for ranking suggestions from multiple mentors, and a caller for proactively asking for suggestions. Building on the CybORG experiment framework, our experiments show that SecurityBot achieves significant performance improvement over a standalone LLM or RL agent, attaining complementary performance in the cybersecurity games.
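
The three collaboration mechanisms named in the abstract can be pictured with a minimal Python sketch. The abstract does not specify the decision rules, so everything below is an assumption made for illustration: the cursor follows mentors while the agent's own recent average reward is below a threshold, the aggregator ranks suggestions by how many mentors proposed them, and the caller asks for help after the agent repeats the same action several times. The names `Cursor`, `aggregate`, `should_call`, and `choose_action`, and the action strings in the usage note, are hypothetical and are not taken from the paper or from CybORG.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Observation = Dict[str, str]
# Hypothetical mentor interface: a pre-trained RL policy mapping an observation to an action name.
Mentor = Callable[[Observation], str]


def aggregate(suggestions: List[str]) -> List[str]:
    """Aggregator (assumed rule): rank suggestions by the number of mentors proposing them."""
    counts = Counter(suggestions)
    # most_common() sorts by count, keeping first-seen order among ties.
    return [action for action, _ in counts.most_common()]


@dataclass
class Cursor:
    """Cursor (assumed rule): follow mentors while the agent's own recent reward is low."""
    threshold: float = 0.0

    def follow_mentor(self, recent_rewards: List[float]) -> bool:
        if not recent_rewards:
            return True  # no local experience yet, so lean on the mentors
        return sum(recent_rewards) / len(recent_rewards) < self.threshold


def should_call(recent_actions: List[str], patience: int = 3) -> bool:
    """Caller (assumed rule): ask for help after `patience` identical actions in a row."""
    tail = recent_actions[-patience:]
    return len(tail) == patience and len(set(tail)) == 1


def choose_action(
    obs: Observation,
    llm_action: str,
    mentors: List[Mentor],
    cursor: Cursor,
    recent_rewards: List[float],
    recent_actions: List[str],
) -> str:
    """Return the top-ranked mentor suggestion when the cursor or caller fires, else the LLM's own choice."""
    ranked = aggregate([mentor(obs) for mentor in mentors]) if mentors else []
    if ranked and (cursor.follow_mentor(recent_rewards) or should_call(recent_actions)):
        return ranked[0]
    return llm_action
```

With three stub mentors that return "Analyse", "Restore", and "Restore", `choose_action` keeps the LLM's own action while recent rewards are positive and its actions vary, and switches to "Restore" once rewards drop below the threshold or the agent repeats itself. In this sketch the "depend on yourself when you should" idea lives entirely in the cursor threshold and the caller's patience, both of which are placeholders for whatever criteria the paper actually uses.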

Paper Structure (33 sections, 4 equations, 8 figures, 3 tables)

Introduction
Related Work
LLMs for cybersecurity operations
LLM to enhance cybersecurity
LLMs' double-edged sword role for cybersecurity
Collaboration mechanisms to improve LLMs
Role-based multi-LLM-agent collaboration
Dual-process-based LLM-RL collaboration
LLM setting guidance to support RL
RL acting as expert to guide LLM's decision
Cybersecurity Adversarial Game and Pre-trained RL Agents
Cybersecurity Adversarial Games
Pre-trained RL Agents
SecurityBot: an LLM-based agent mentored by RL agents
LLM Agent Design
...and 18 more sections

Figures (8)

Figure 1: A POMDP cybersecurity adversarial game. The red host in User Subnet represents the foot node of the red team. The blue host in Enterprise Subnet represents the defender host of the blue team.
Figure 2: Action-Status Transition. Red text represents red team actions, blue text represents blue team actions.
Figure 3: The Framework of SecurityBot: LLM-based RLs-mentoring Agent for Cybersecurity Operation
Figure 4: The illustration of profile module, including the example of roles, goals, actions, environment format and the generated behavior guidance (the bottom part) as well as the process to generate the behavior guidance (the upper part).
Figure 5: The prompt for Red Agent from the reflection module to motivate the LLM to choose other attack actions.
...and 3 more figures
