MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning

Thang Nguyen; Peter Chin; Yu-Wing Tai

TL;DR

This paper addresses the challenges of retrieval-augmented generation (RAG) in complex, ambiguous, and multi-hop information-seeking tasks by introducing MA-RAG, a training-free, modular multi-agent framework. MA-RAG delegates distinct subtasks to four specialized agents (Planner, Step Definer, Extractor, and QA), which collaborate via chain-of-thought prompting to perform stepwise, interpretable reasoning and on-demand retrieval. The approach yields strong open-domain QA performance, generalizes to medical-domain QA without domain-specific fine-tuning, and is supported by ablations showing that planning and extraction are critical for multi-hop reasoning. By giving fine-grained control over information flow through modular reasoning, MA-RAG offers a scalable, interpretable path to better grounding and reliability in retrieval-augmented systems.

Abstract

We present MA-RAG, a Multi-Agent framework for Retrieval-Augmented Generation (RAG) that addresses the inherent ambiguities and reasoning challenges in complex information-seeking tasks. Unlike conventional RAG methods that rely on end-to-end fine-tuning or isolated component enhancements, MA-RAG orchestrates a collaborative set of specialized AI agents: Planner, Step Definer, Extractor, and QA Agents, each responsible for a distinct stage of the RAG pipeline. By decomposing tasks into subtasks such as query disambiguation, evidence extraction, and answer synthesis, and enabling agents to communicate intermediate reasoning via chain-of-thought prompting, MA-RAG progressively refines retrieval and synthesis while maintaining modular interpretability. Extensive experiments on multi-hop and ambiguous QA benchmarks, including NQ, HotpotQA, 2WikimQA, and TriviaQA, demonstrate that MA-RAG significantly outperforms standalone LLMs and existing RAG methods across all model scales. Notably, even a small LLaMA3-8B model equipped with MA-RAG surpasses larger standalone LLMs, while larger variants (LLaMA3-70B and GPT-4o-mini) set new state-of-the-art results on challenging multi-hop datasets. Ablation studies reveal that both the planner and extractor agents are critical for multi-hop reasoning, and that high-capacity models are especially important for the QA agent to synthesize answers effectively. Beyond general-domain QA, MA-RAG generalizes to specialized domains such as medical QA, achieving competitive performance against domain-specific models without any domain-specific fine-tuning. Our results highlight the effectiveness of collaborative, modular reasoning in retrieval-augmented systems: MA-RAG not only improves answer accuracy and robustness but also provides interpretable intermediate reasoning steps, establishing a new paradigm for efficient and reliable multi-agent RAG.
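
The division of labor described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ma_rag_sketch`, the `StepRecord` type, the prompt formats, and the choice to return the last step's answer as the final answer are all assumptions made for illustration. The "LLM" and retriever are hard-coded toy stand-ins so that the control flow (plan, define a query, retrieve on demand, extract, answer) can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: every agent is the same model called with a role-specific input.
LLM = Callable[[str, str], str]          # (role, text) -> output
Retriever = Callable[[str], List[str]]   # query -> passages


@dataclass
class StepRecord:
    step: str            # abstract step produced by the Planner
    query: str           # concrete query produced by the Step Definer
    evidence: List[str]  # snippets kept by the Extractor
    answer: str          # step answer produced by the QA agent


def ma_rag_sketch(question: str, llm: LLM, retrieve: Retriever) -> Tuple[str, List[StepRecord]]:
    # Planner: split the question into ordered reasoning steps, one per line.
    plan = [s.strip() for s in llm("planner", question).splitlines() if s.strip()]
    trace: List[StepRecord] = []
    for step in plan:
        # Step Definer: ground the abstract step in what earlier steps found.
        known = "; ".join(r.answer for r in trace)
        query = llm("step_definer", f"{step}\nKNOWN: {known}")
        # Retrieval is triggered only once a concrete query exists.
        passages = retrieve(query)
        # Extractor: keep only query-relevant text and drop empty results.
        evidence = [e for e in (llm("extractor", f"{query}\nPASSAGE: {p}") for p in passages) if e]
        # QA agent: answer the step from the filtered evidence.
        answer = llm("qa", f"{query}\nEVIDENCE: {' '.join(evidence)}")
        trace.append(StepRecord(step, query, evidence, answer))
    # Assumption: the last step's answer is treated as the final answer.
    final = trace[-1].answer if trace else ""
    return final, trace


# Toy stand-ins below (hypothetical): a two-passage corpus and a rule-based "LLM".
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]


def toy_retrieve(query: str) -> List[str]:
    # Returns every passage; filtering is left to the Extractor.
    return list(CORPUS)


def toy_llm(role: str, text: str) -> str:
    # Hard-coded behavior that only handles the demo question.
    if role == "planner":
        return "Who directed Inception?\nWhere was that person born?"
    head, _, tail = text.partition("\n")
    payload = tail.split(": ", 1)[1] if ": " in tail else ""
    if role == "step_definer":
        return head.replace("that person", payload) if payload else head
    if role == "extractor":
        keyword = "directed" if "directed" in head else "born"
        return payload if keyword in payload else ""
    # role == "qa": read the answer off the end of the evidence sentence.
    for marker in ("directed by ", "born in "):
        if marker in payload:
            return payload.split(marker, 1)[1].rstrip(".")
    return ""
```

Calling `ma_rag_sketch("Where was the director of Inception born?", toy_llm, toy_retrieve)` returns the final answer together with a per-step trace of queries, evidence, and answers. That trace is the kind of interpretable intermediate reasoning the abstract refers to. A real system would replace `toy_llm` with prompted calls to an actual model and `toy_retrieve` with a real retriever.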
