Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games
Yikuan Yan, Yaolun Zhang, Keman Huang
TL;DR
This work tackles the challenge of combining large language models with reinforcement learning agents to perform cybersecurity operations more effectively. It introduces SecurityBot, an LLM-based agent augmented with profile, memory, reflection, and action modules and supported by three collaboration mechanisms (cursor, aggregator, caller) to leverage pre-trained RL mentors. Across CybORG-based red-team and blue-team tasks, SecurityBot achieves complementary gains over standalone LLM or RL approaches, with single-mentor configurations often outperforming multi-mentor setups due to noise. The study highlights the practical potential of LLM-RL collaboration for autonomous cybersecurity and points to future work on fine-tuning models and developing more robust mentor aggregation strategies.
Abstract
Effectively integrating LLM and reinforcement learning (RL) agents to achieve complementary performance is critical in high-stakes tasks such as cybersecurity operations. In this study, we introduce SecurityBot, an LLM agent mentored by pre-trained RL agents, to support cybersecurity operations. In particular, the LLM agent is equipped with a profile module to generate behavior guidelines, a memory module to accumulate local experiences, a reflection module to re-evaluate choices, and an action module to reduce the action space. Additionally, it adopts a collaboration mechanism to take suggestions from pre-trained RL agents, consisting of a cursor for dynamically deciding whether to take suggestions, an aggregator for ranking suggestions from multiple mentors, and a caller for proactively asking for suggestions. Building on the CybORG experiment framework, our experiments show that SecurityBot achieves significant performance improvements over standalone LLM or RL agents, attaining complementary performance in the cybersecurity games.
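To make the three collaboration mechanisms concrete, the following is a minimal sketch of how a cursor, an aggregator, and a caller could fit together in one decision step. It is an illustration under stated assumptions, not the implementation from the paper: the function names (`aggregate`, `should_call`, `choose_action`, `step`), the majority-vote ranking, the reward-threshold trigger, and the example action names are all hypothetical.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type: a mentor maps an observation to a suggested action name.
Mentor = Callable[[str], str]


def aggregate(suggestions: List[str]) -> List[str]:
    """Aggregator (assumed rule): rank actions by how many mentors proposed them."""
    counts = Counter(suggestions)
    # Sort by vote count descending, then by name for a deterministic order.
    return sorted(counts, key=lambda a: (-counts[a], a))


def should_call(recent_rewards: List[float], threshold: float = 0.0) -> bool:
    """Caller (assumed rule): ask mentors when the recent mean reward is below a threshold."""
    if not recent_rewards:
        return True  # no experience yet, so ask for help
    return sum(recent_rewards) / len(recent_rewards) < threshold


def choose_action(llm_action: str, ranked: List[str], follow_mentor: bool) -> str:
    """Cursor (assumed rule): take the top-ranked suggestion or keep the LLM's own choice."""
    if follow_mentor and ranked:
        return ranked[0]
    return llm_action


def step(observation: str, llm_action: str,
         mentors: List[Mentor], recent_rewards: List[float]) -> str:
    """One decision step combining caller, aggregator, and cursor."""
    if not should_call(recent_rewards):
        return llm_action
    ranked = aggregate([mentor(observation) for mentor in mentors])
    # In a real agent the LLM would decide follow_mentor; it is fixed here for brevity.
    return choose_action(llm_action, ranked, follow_mentor=True)
```

In this sketch the agent consults its mentors only when recent rewards are poor, and otherwise relies on its own choice. In an actual system the cursor decision would presumably come from the LLM's own reasoning rather than a fixed flag.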
