AutoBnB-RAG: Enhancing Multi-Agent Incident Response with Retrieval-Augmented Generation

Zefang Liu, Arman Anwar

TL;DR

AutoBnB-RAG integrates retrieval-augmented generation into a multi-agent incident response simulation built on Backdoors & Breaches. By adding a retrieval agent and two knowledge sources (RAG-Wiki and RAG-News), the framework surfaces external evidence after failed reasoning steps, improving decision quality and success rates across eight team structures, including argumentative configurations. Real-world breach simulations show AutoBnB-RAG can reconstruct complex multi-stage attacks using retrieved content, underscoring the value of grounding AI-driven IR in both curated documentation and narrative evidence. The work demonstrates the practical potential of combining structured collaborative reasoning with targeted knowledge access to enhance cyber defense capabilities.

Abstract

Incident response (IR) requires fast, coordinated, and well-informed decision-making to contain and mitigate cyber threats. While large language models (LLMs) have shown promise as autonomous agents in simulated IR settings, their reasoning is often limited by a lack of access to external knowledge. In this work, we present AutoBnB-RAG, an extension of the AutoBnB framework that incorporates retrieval-augmented generation (RAG) into multi-agent incident response simulations. Built on the Backdoors & Breaches (B&B) tabletop game environment, AutoBnB-RAG enables agents to issue retrieval queries and incorporate external evidence during collaborative investigations. We introduce two retrieval settings: one grounded in curated technical documentation (RAG-Wiki), and another using narrative-style incident reports (RAG-News). We evaluate performance across eight team structures, including newly introduced argumentative configurations designed to promote critical reasoning. To validate practical utility, we also simulate real-world cyber incidents based on public breach reports, demonstrating AutoBnB-RAG's ability to reconstruct complex multi-stage attacks. Our results show that retrieval augmentation improves decision quality and success rates across diverse organizational models. This work demonstrates the value of integrating retrieval mechanisms into LLM-based multi-agent systems for cybersecurity decision-making.
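The retrieval behavior summarized above (agents issue a query and receive external evidence after a failed step) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Document` type, the `retrieve` and `attempt_procedure` functions, the word-overlap scoring, and the `roll`/`threshold` parameters are all hypothetical stand-ins chosen for illustration, and a real system would presumably use a proper retriever over the RAG-Wiki or RAG-News collections.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical label, e.g. "RAG-Wiki" or "RAG-News"
    text: str


def retrieve(query: str, corpus: list[Document], k: int = 1) -> list[Document]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.text.lower().split())), doc) for doc in corpus]
    # Drop documents with no overlap, then sort by score (stable for ties).
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def attempt_procedure(
    procedure: str, roll: int, threshold: int, corpus: list[Document]
) -> tuple[bool, list[Document]]:
    """Return (success, evidence). Retrieval is triggered only when the attempt fails.

    `roll` and `threshold` are assumed parameters modelling a dice-based success check.
    """
    if roll >= threshold:
        return True, []
    # Failed step: query the knowledge source so the team can reason over new evidence.
    return False, retrieve(procedure, corpus)


if __name__ == "__main__":
    corpus = [
        Document("RAG-Wiki", "endpoint log analysis detects credential dumping"),
        Document("RAG-News", "attackers used phishing emails for initial access"),
    ]
    success, evidence = attempt_procedure("endpoint log analysis", 5, 11, corpus)
    print(success, [doc.source for doc in evidence])
```

The point of the design is that retrieval is conditional: a successful attempt returns no evidence, while a failed one hands the team documents to consider before the next turn.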

Paper Structure

This paper contains 30 sections, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Team structures evaluated in LLM-driven incident response simulations using the Backdoors & Breaches framework.
  • Figure 2: Examples of Backdoors & Breaches cards used in this study. Image source: Black Hills Information Security.
  • Figure 3: Gameplay flow of AutoBnB-RAG, illustrating the interaction loop between defenders, retrieval, and success conditions.