Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation

Song Wang, Zihan Chen, Peng Wang, Zhepei Wei, Zhen Tan, Yu Meng, Cong Shen, Jundong Li

TL;DR

WinnowRAG tackles the noise problem in retrieval-augmented generation with a two-stage, training-free framework that first clusters retrieved documents by query relevance and then uses a critic-guided, multi-agent winnowing process to merge and filter their content. Stage I assigns each cluster to its own LLM agent; Stage II employs embedding-space Ellipse and Hyperbola merging to retain useful documents while discarding noise, all without task-specific fine-tuning. Empirical results across knowledge-intensive benchmarks show that WinnowRAG consistently outperforms training-free baselines, and ablations confirm the critical roles of clustering, merging, and iterative critique. The approach is model-agnostic, scalable, and applicable to diverse domains, offering a practical path to more reliable RAG systems without costly fine-tuning.
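The summary names Ellipse and Hyperbola merging but does not define them. Purely as a hypothetical illustration of the geometry those names suggest, the sketch below treats two agents' embedding centroids as foci: the ellipse rule keeps documents inside an ellipse around both foci (small summed distance to the two agents), and the hyperbola rule keeps documents clearly on the retained agent's side of a hyperbola branch (closer to the kept agent than to the discarded one by a margin). The function names, the `slack` and `margin` parameters, and the use of centroids as foci are all assumptions, not the paper's definitions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Mean embedding of an agent's documents (ASSUMED to serve as a focus)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ellipse_select(docs: List[Vector], focus_a: Vector, focus_b: Vector,
                   slack: float = 1.0) -> List[int]:
    """HYPOTHETICAL 'ellipse' rule, not the paper's definition.

    A point lies inside the filled ellipse with foci focus_a and focus_b when
    the sum of its distances to the foci is at most the major-axis length.
    The major axis is the focal distance plus `slack`, so the region always
    contains the segment between the two foci.
    """
    major = math.dist(focus_a, focus_b) + slack
    return [i for i, d in enumerate(docs)
            if math.dist(d, focus_a) + math.dist(d, focus_b) <= major]


def hyperbola_select(docs: List[Vector], keep_focus: Vector,
                     drop_focus: Vector, margin: float = 0.5) -> List[int]:
    """HYPOTHETICAL 'hyperbola' rule, not the paper's definition.

    Points where dist(drop) - dist(keep) == margin form one branch of a
    hyperbola with these foci; documents on the retained agent's side of
    that branch (clearly closer to the kept centroid) are kept.
    """
    return [i for i, d in enumerate(docs)
            if math.dist(d, drop_focus) - math.dist(d, keep_focus) >= margin]


# Toy 2-D "embeddings" for illustration only.
focus_a = centroid([[-1.0, 0.0], [1.0, 0.0]])   # agent A centroid: [0.0, 0.0]
focus_b = centroid([[2.0, 1.0], [2.0, -1.0]])   # agent B centroid: [2.0, 0.0]
docs = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 5.0], [-3.0, 0.0]]
print(ellipse_select(docs, focus_a, focus_b))    # [0, 1, 2]
print(hyperbola_select(docs, focus_a, focus_b))  # [0, 4]
```

The ellipse rule favors documents relevant to both agents (useful when merging them), while the hyperbola rule keeps only what leans toward the surviving agent (useful when discarding one); which behavior the paper actually uses in each case is not stated in this summary.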

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources to address their limitations in accessing up-to-date or specialized information. A natural strategy for increasing the likelihood of retrieving relevant information is to expand the number of retrieved documents. However, including more documents can introduce significant noise, since many of them may be irrelevant or misleading, which reduces the overall accuracy of the generated responses. To handle a larger number of documents, we propose WinnowRAG, a novel RAG framework designed to systematically filter out noisy documents while preserving valuable content, a process we refer to as winnowing. WinnowRAG operates in two stages. In Stage I, we perform query-aware clustering to group similar documents into distinct topic clusters, and each cluster is assigned to an LLM agent that generates its own answer. In Stage II, we perform winnowing, wherein a critic LLM evaluates the outputs of multiple agents and iteratively separates useful documents from noisy ones. To retain useful documents when discarding agents, we propose two strategic merging techniques that ensure only relevant knowledge is used to generate the final response. Crucially, WinnowRAG is model-agnostic and does not require any model fine-tuning, making it easily adaptable to various tasks. Extensive experiments on various realistic datasets demonstrate the effectiveness of WinnowRAG over state-of-the-art baselines.
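The abstract does not specify how query awareness enters the clustering step of Stage I. As a minimal sketch under that assumption, the code below appends each document's cosine similarity to the query (scaled by a hypothetical `query_weight`) to its embedding, so two documents share a cluster only if they are both semantically close and similarly relevant to the query, and then runs plain k-means with deterministic farthest-point initialization. All function names are illustrative, and the toy 2-D vectors stand in for real retriever embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def query_aware_features(docs: List[Vector], query: Vector,
                         query_weight: float = 2.0) -> List[List[float]]:
    # ASSUMPTION: "query-aware" = one extra coordinate holding the
    # document's weighted cosine similarity to the query.
    return [list(d) + [query_weight * cosine(d, query)] for d in docs]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    # Deterministic farthest-point initialization, then Lloyd updates.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def query_aware_clusters(docs: List[Vector], query: Vector, k: int,
                         query_weight: float = 2.0) -> List[List[int]]:
    labels = kmeans(query_aware_features(docs, query, query_weight), k)
    clusters: List[List[int]] = [[] for _ in range(k)]
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    # Each non-empty cluster would then be handed to one LLM agent (not shown).
    return [c for c in clusters if c]


query = [1.0, 0.0]
docs = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
print(query_aware_clusters(docs, query, k=2))  # [[0, 1], [2, 3]]
```

Raising `query_weight` makes relevance to the query dominate the grouping; setting it to zero reduces the sketch to ordinary embedding clustering.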

Paper Structure

This paper contains 33 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Accuracy of the recall (i.e., the upper bound), direct input, and WinnowRAG on the NaturalQ dataset (kwiatkowski2019natural) with different numbers of retrieved documents.
  • Figure 2: The overall process of the WinnowRAG framework. In Stage I, we perform query-aware clustering to group documents with similar semantic meanings with respect to the query. In Stage II, we first perform agent initialization to form multiple super-agents that are used in the subsequent winnowing steps. During multi-agent winnowing, guided by the critic LLM, we gradually discard agents with incorrect answers while retaining useful documents.
  • Figure 3: Ablation study results of WinnowRAG on five datasets.
  • Figure 4: Accuracy improvement of WinnowRAG over using a single retrieved document, with different numbers of retrieved documents.
  • Figure 5: Results of WinnowRAG on the NaturalQ dataset with varying numbers of query-aware clusters and retrieved documents.
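The Figure 2 caption describes multi-agent winnowing as repeatedly discarding agents whose answers the critic LLM judges incorrect, while keeping their useful documents. A minimal control-flow sketch, assuming the critic returns the indices of agents to discard and a separate salvage rule decides which of a discarded agent's documents move to each surviving agent, could look like this. `Agent`, `winnow`, and all three hooks are hypothetical names; the demo swaps in a simple majority-vote critic and keyword-matching stubs in place of LLM calls, and super-agent initialization is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Agent:
    docs: List[str]  # documents this agent answers from
    answer: str      # the agent's current answer


def winnow(agents: List[Agent],
           critic: Callable[[List[Agent]], Set[int]],
           answer_fn: Callable[[List[str]], str],
           salvage_fn: Callable[[Agent, Agent], List[str]],
           max_rounds: int = 5) -> str:
    """Iteratively discard critic-flagged agents, salvaging useful documents.

    Expects a non-empty list of agents. critic, answer_fn and salvage_fn are
    HYPOTHETICAL hooks; in a real system critic and answer_fn would be LLM calls.
    """
    agents = list(agents)
    for _ in range(max_rounds):
        if len({a.answer for a in agents}) <= 1:
            break  # survivors agree (or only one is left)
        rejected = critic(agents)
        if not rejected or len(rejected) >= len(agents):
            break  # nothing to discard, or refuse to discard every agent
        kept = [a for i, a in enumerate(agents) if i not in rejected]
        for i in sorted(rejected):
            for target in kept:
                # Move documents judged useful out of the discarded agent.
                for doc in salvage_fn(agents[i], target):
                    if doc not in target.docs:
                        target.docs.append(doc)
        for a in kept:
            a.answer = answer_fn(a.docs)  # re-answer with merged documents
        agents = kept
    # Majority vote over survivors (ties go to the first answer seen).
    return Counter(a.answer for a in agents).most_common(1)[0][0]


def minority_critic(agents: List[Agent]) -> Set[int]:
    # Stand-in for the critic LLM: majority vote, NOT the paper's critic.
    top = Counter(a.answer for a in agents).most_common(1)[0][0]
    return {i for i, a in enumerate(agents) if a.answer != top}


def keyword_answer(docs: List[str]) -> str:
    # Toy "reader": answers Paris if any document mentions it.
    return "Paris" if any("Paris" in d for d in docs) else "unknown"


def mentions_answer(dropped: Agent, target: Agent) -> List[str]:
    # Toy salvage rule: keep dropped documents that mention the target's answer.
    return [d for d in dropped.docs if target.answer in d]


a = Agent(["Paris is the capital of France."], "Paris")
b = Agent(["Lyon is in France.", "France's capital is Paris."], "Lyon")
c = Agent(["Paris hosts the Louvre."], "Paris")
result = winnow([a, b, c], minority_critic, keyword_answer, mentions_answer)
print(result)  # Paris
print(a.docs)  # ['Paris is the capital of France.', "France's capital is Paris."]
```

Agent `b` is discarded, but its document that supports the surviving answer is carried over to `a` and `c`, which mirrors the caption's point that discarding an agent need not discard its useful evidence.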