MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

Chia-Yuan Chang; Zhimeng Jiang; Vineeth Rakesh; Menghai Pan; Chin-Chia Michael Yeh; Guanchu Wang; Mingzhi Hu; Zhichao Xu; Yan Zheng; Mahashweta Das; Na Zou

TL;DR

MAIN-RAG is a training-free, multi-agent framework that mitigates noisy retrieval in retrieval-augmented generation. It uses three LLM agents (Predictor, Judge, and Final-Predictor) and an adaptive judge bar $\tau_q$, derived from the relevance score distribution, to filter and rank retrieved documents without fine-tuning. Empirical results across four QA benchmarks show 2–11% gains in answer accuracy over baselines, with no training required, along with reduced noise and improved response consistency, particularly on open-domain QA tasks that rely on external knowledge. The approach offers a scalable, plug-and-play improvement to the reliability and efficiency of RAG systems.

Abstract

Large Language Models (LLMs) are becoming essential tools for various natural language processing tasks but often generate outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by incorporating external, real-time information retrieval to ground LLM responses. However, existing RAG systems frequently struggle with the quality of retrieved documents, as irrelevant or noisy documents degrade performance, increase computational overhead, and undermine response reliability. To tackle this problem, we propose Multi-Agent Filtering Retrieval-Augmented Generation (MAIN-RAG), a training-free RAG framework that leverages multiple LLM agents to collaboratively filter and score retrieved documents. Specifically, MAIN-RAG introduces an adaptive filtering mechanism that dynamically adjusts the relevance filtering threshold based on score distributions, effectively minimizing noise while maintaining high recall of relevant documents. The proposed approach leverages inter-agent consensus to ensure robust document selection without requiring additional training data or fine-tuning. Experimental results across four QA benchmarks demonstrate that MAIN-RAG consistently outperforms traditional RAG approaches, achieving a 2–11% improvement in answer accuracy while reducing the number of irrelevant retrieved documents. Quantitative analysis further reveals that our approach achieves superior response consistency and answer accuracy over baseline methods, offering a competitive and practical alternative to training-based solutions.
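The adaptive filtering idea in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the per-query bar is the mean of the relevance scores, optionally lowered by `n` standard deviations; the function names and the parameter `n` are hypothetical and do not come from the text above.

```python
from statistics import mean, pstdev


def adaptive_judge_bar(scores, n=0.0):
    """Return a per-query threshold derived from the score distribution.

    Assumption: the bar is the mean score minus n population standard
    deviations. n = 0 gives the plain mean; larger n keeps more documents.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return mean(scores) - n * pstdev(scores)


def filter_and_rank(docs, scores, n=0.0):
    """Keep documents scoring at or above the bar, highest score first."""
    bar = adaptive_judge_bar(scores, n)
    kept = [(s, d) for s, d in zip(scores, docs) if s >= bar]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in kept]


# Hypothetical relevance scores for four retrieved documents.
docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
scores = [2.0, -1.0, 0.5, -3.5]
selected = filter_and_rank(docs, scores)
print(selected)
```

Because the bar is recomputed from each query's own scores, a query with uniformly weak documents still keeps its relatively best ones, while a query with a few clearly relevant documents drops the rest.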

Paper Structure (22 sections, 15 figures, 3 tables)

This paper contains 22 sections, 15 figures, 3 tables.

Introduction
Preliminaries
Notations and Objectives
Impact of Noisy Retrieval Documents
Related Works
Multi-Agent Filtering RAG (MAIN-RAG)
Definition of LLM Agents in MAIN-RAG
Relevance Judgment Quantification
Adaptive Judge Bar $\tau_q$
Experiments
Tasks and Datasets
Baselines
Experimental Settings
Quantitative Analysis (RQ1)
Ablation Studies on Adaptive Judge Bar $\tau_q$ for Filtering and Ranking (RQ2)
...and 7 more sections

Figures (15)

Figure 1: An overview of the proposed framework MAIN-RAG, consisting of three LLM agents to identify noisy retrieved documents for filtering (see Section \ref{sec:define_agents}). After the retrieval, Agent-1 "Predictor" infers answers for each query; then, Agent-2 "Judge" takes Doc-Q-A Triplet to judge if a document is supportive for LLMs to answer the query. "Judge" provides relevant scores for each document for filtering and ordering. Finally, Agent-3 "Final-Predictor" answers the query with the given document list.
Figure 2: Quantification of document relevance score.
Figure 3: Impacts of document ordering on variance in RAG performance, where Noise Docs $t/u$ means $t$ noisy documents out of $u$ retrieved documents.
Figure 4: Examples of Optimal Judge Bar (OJB).
Figure 5: Optimal judge bars for different noise ratios in different queries, where Noise Docs $t/u$ means $t$ noisy documents out of $u$ retrieved documents.
...and 10 more figures
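The three-agent flow described in the Figure 1 caption can be sketched as follows. This is a toy illustration under stated assumptions: the three agents are passed in as plain callables, the stand-in functions below replace real LLM calls, and the filtering step uses the mean score as the bar. All names here are hypothetical.

```python
def main_rag_pipeline(query, docs, predictor, judge, final_predictor):
    """Run the Predictor -> Judge -> Final-Predictor flow on one query."""
    if not docs:
        return final_predictor(query, []), []
    # Agent-1 "Predictor": one candidate answer per retrieved document.
    answers = [predictor(query, d) for d in docs]
    # Agent-2 "Judge": one relevance score per (doc, query, answer) triplet.
    scores = [judge(d, query, a) for d, a in zip(docs, answers)]
    # Filtering and ordering (assumption: the bar is the mean score).
    bar = sum(scores) / len(scores)
    ranked = sorted(
        (pair for pair in zip(scores, docs) if pair[0] >= bar),
        key=lambda pair: pair[0],
        reverse=True,
    )
    kept = [d for _, d in ranked]
    # Agent-3 "Final-Predictor": answer from the filtered, ordered list.
    return final_predictor(query, kept), kept


# Toy stand-ins for the LLM agents; not real model calls.
def toy_predictor(query, doc):
    return "Paris" if "Paris" in doc else "unknown"


def toy_judge(doc, query, answer):
    # Word overlap between query and document as a crude relevance score.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_final_predictor(query, docs):
    return "Paris" if any("Paris" in d for d in docs) else "unknown"


query = "capital of France"
docs = [
    "Paris is the capital of France",
    "Bananas are yellow",
    "France has its capital in Paris",
]
answer, kept = main_rag_pipeline(
    query, docs, toy_predictor, toy_judge, toy_final_predictor
)
print(answer, kept)
```

Passing the agents as callables keeps the control flow separate from any particular model, so the same skeleton could wrap whichever LLM backend is in use.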
