Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games

Yikuan Yan, Yaolun Zhang, Keman Huang

TL;DR

This work tackles the challenge of combining large language models with reinforcement learning agents to perform cybersecurity operations more effectively. It introduces SecurityBot, an LLM-based agent augmented with profile, memory, reflection, and action modules and supported by three collaboration mechanisms (cursor, aggregator, caller) to leverage pre-trained RL mentors. Across CybORG-based red-team and blue-team tasks, SecurityBot achieves complementary gains over standalone LLM or RL approaches, with single-mentor configurations often outperforming multi-mentor setups due to noise. The study highlights the practical potential of LLM-RL collaboration for autonomous cybersecurity and points to future work on fine-tuning models and developing more robust mentor aggregation strategies.

Abstract

Effectively integrating LLM and reinforcement learning (RL) agents to achieve complementary performance is critical in high-stakes tasks such as cybersecurity operations. In this study, we introduce SecurityBot, an LLM agent mentored by pre-trained RL agents, to support cybersecurity operations. In particular, the LLM agent is supported by a profile module to generate behavior guidelines, a memory module to accumulate local experiences, a reflection module to re-evaluate choices, and an action module to reduce the action space. Additionally, it adopts a collaboration mechanism to take suggestions from pre-trained RL agents, comprising a cursor for dynamically deciding when to take suggestions, an aggregator for ranking suggestions from multiple mentors, and a caller for proactively asking for suggestions. Building on the CybORG experiment framework, our experiments show that SecurityBot achieves significant performance improvement over a standalone LLM or standalone RL agent, achieving complementary performance in the cybersecurity games.
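
The three collaboration mechanisms named in the abstract can be pictured with a minimal sketch. The abstract does not specify how the cursor, aggregator, or caller make their decisions, so every rule below is an assumption chosen for illustration: the confidence threshold, the vote-count ranking, the reward-based trigger, and all function and parameter names are hypothetical and are not taken from the paper or from CybORG.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type: a pre-trained RL mentor maps an observation to a suggested action.
Mentor = Callable[[str], str]


def aggregate(suggestions: Sequence[str]) -> List[str]:
    """Aggregator (assumed rule): rank actions by how many mentors proposed them."""
    counts = Counter(suggestions)
    return [action for action, _ in counts.most_common()]


def cursor(llm_confidence: float, threshold: float = 0.5) -> bool:
    """Cursor (assumed rule): take a mentor suggestion when the LLM's own
    confidence is below a threshold; otherwise the agent relies on itself."""
    return llm_confidence < threshold


def caller(recent_rewards: Sequence[float], patience: int = 3) -> bool:
    """Caller (assumed rule): proactively ask the mentors for help when the
    last `patience` rewards were all non-positive."""
    tail = list(recent_rewards)[-patience:]
    return len(tail) == patience and all(r <= 0 for r in tail)


def choose_action(
    observation: str,
    llm_action: str,
    llm_confidence: float,
    mentors: Sequence[Mentor],
    recent_rewards: Sequence[float],
) -> str:
    """Combine the three mechanisms into a single decision step."""
    if not mentors:
        return llm_action
    if not (caller(recent_rewards) or cursor(llm_confidence)):
        return llm_action  # depend on the LLM's own choice
    ranked = aggregate([mentor(observation) for mentor in mentors])
    return ranked[0]


if __name__ == "__main__":
    # Toy mentors with fixed suggestions; action names are placeholders.
    mentors = [lambda obs: "scan", lambda obs: "exploit", lambda obs: "scan"]
    print(choose_action("obs", "sleep", 0.9, mentors, [1.0, 0.5, 0.2]))
    print(choose_action("obs", "sleep", 0.2, mentors, [1.0, 0.5, 0.2]))
```

In this sketch the agent keeps its own action while it is confident and recent rewards are acceptable, and defers to the top-ranked mentor suggestion otherwise. That split is one possible reading of "depending on yourself when you should", not a description of the authors' implementation.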

Paper Structure (33 sections, 4 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: A POMDP cybersecurity adversarial game. The red host in User Subnet represents the foot node of the red team. The blue host in Enterprise Subnet represents the defender host of the blue team.
  • Figure 2: Action-Status Transition. Red text represents red team actions, blue text represents blue team actions.
  • Figure 3: The Framework of SecurityBot: LLM-based RLs-mentoring Agent for Cybersecurity Operation
  • Figure 4: Illustration of the profile module, including examples of roles, goals, actions, and environment format, the generated behavior guidance (bottom part), and the process used to generate the behavior guidance (upper part).
  • Figure 5: The prompt for Red Agent from the reflection module to motivate the LLM to choose other attack actions.
  • ...and 3 more figures