Removal of Hallucination on Hallucination: Debate-Augmented RAG

Wentao Hu, Wengyu Zhang, Yiyang Jiang, Chen Jason Zhang, Xiaoyong Wei, Qing Li

TL;DR

This work tackles Hallucination on Hallucination in Retrieval-Augmented Generation by introducing Debate-Augmented RAG (DRAG), a training-free framework that integrates Multi-Agent Debate (MAD) into both retrieval and generation. DRAG uses a Retrieval Debate to iteratively refine the query pool and retrieved evidence, and a Response Debate with asymmetric agent roles to strengthen reasoning and factual verification, producing more reliable outputs. Extensive experiments across six datasets show DRAG improves retrieval adequacy, reduces RAG-induced hallucinations, and enhances multi-hop reasoning, albeit with higher computational overhead and potential problem drift in the response stage. The approach highlights the value of structured, adversarial debates to improve factual accuracy in knowledge-intensive NLP tasks, with practical implications for more trustworthy open-domain QA systems.

Abstract

Retrieval-Augmented Generation (RAG) enhances factual accuracy by integrating external knowledge, yet it introduces a critical issue: erroneous or biased retrieval can mislead generation, compounding hallucinations, a phenomenon we term Hallucination on Hallucination. To address this, we propose Debate-Augmented RAG (DRAG), a training-free framework that integrates Multi-Agent Debate (MAD) mechanisms into both retrieval and generation stages. In retrieval, DRAG employs structured debates among proponents, opponents, and judges to refine retrieval quality and ensure factual reliability. In generation, DRAG introduces asymmetric information roles and adversarial debates, enhancing reasoning robustness and mitigating factual inconsistencies. Evaluations across multiple tasks demonstrate that DRAG improves retrieval reliability, reduces RAG-induced hallucinations, and significantly enhances overall factual accuracy. Our code is available at https://github.com/Huenao/Debate-Augmented-RAG.
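
The retrieval-stage debate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of a proponent/opponent/judge loop that refines a query pool. The function names, the agent signatures, the `max_rounds` parameter, and the toy corpus are all assumptions made for illustration, and the agents here are plain Python stubs standing in for LLM calls.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from the paper.
Retriever = Callable[[str], List[str]]
Proponent = Callable[[str, List[str], List[str]], str]
Opponent = Callable[[str, List[str], List[str]], Tuple[str, List[str]]]
Judge = Callable[[str, str, str], bool]


def gather(queries: List[str], retrieve: Retriever) -> List[str]:
    """Retrieve documents for every query, dropping duplicates in order."""
    seen: List[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.append(doc)
    return seen


def retrieval_debate(
    question: str,
    retrieve: Retriever,
    proponent: Proponent,
    opponent: Opponent,
    judge: Judge,
    max_rounds: int = 3,  # assumed stopping budget, not a value from the paper
) -> Tuple[List[str], List[str]]:
    """Iteratively refine the query pool until the judge accepts it."""
    queries = [question]
    for _ in range(max_rounds):
        evidence = gather(queries, retrieve)
        # The proponent defends the current pool; the opponent critiques it
        # and proposes a revised pool.
        defence = proponent(question, queries, evidence)
        critique, proposed = opponent(question, queries, evidence)
        # The judge returns True when the current pool is deemed adequate.
        if judge(question, defence, critique):
            return queries, evidence
        queries = proposed
    return queries, gather(queries, retrieve)


# Toy stand-ins for the retriever and the three agents (illustrative only).
TOY_CORPUS = {
    "Who sang Yesterday?": ["Yesterday is a 2019 film."],
    "Yesterday song Beatles singer": ["Paul McCartney sang Yesterday."],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_CORPUS.get(query, [])


def toy_proponent(question: str, queries: List[str], evidence: List[str]) -> str:
    return "The current queries are sufficient."


def toy_opponent(
    question: str, queries: List[str], evidence: List[str]
) -> Tuple[str, List[str]]:
    if any("McCartney" in doc for doc in evidence):
        return "No objection.", queries
    return "The evidence is about a film.", ["Yesterday song Beatles singer"]


def toy_judge(question: str, defence: str, critique: str) -> bool:
    return critique == "No objection."
```

Calling `retrieval_debate("Who sang Yesterday?", toy_retrieve, toy_proponent, toy_opponent, toy_judge)` runs two rounds with these stubs: the first pool retrieves a document about a same-named film, the opponent objects and proposes a new query, and the judge accepts the revised pool in the second round. In a real system each stub would be replaced by a prompted LLM call, and the judge's decision rule would be considerably richer.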

Paper Structure

This paper contains 26 sections, 13 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Demonstration of Hallucination in Retrieval-Augmented Generation. In the first example, the error stems from retrieving information about a movie with the same name instead of the correct entity (a band). In the second, despite retrieving accurate information, retrieval noise still leads to an incorrect response.
  • Figure 2: An overview of our Debate-Augmented RAG (DRAG) framework. It iteratively refines the retrieval strategy and enhances factual consistency.
  • Figure 3: Case study of the response debate.
  • Figure 4: Average number of LLM and retriever calls for DRAG and baseline methods on StrategyQA.
  • Figure 5: Case study of the Retrieval Debate.
  • ...and 1 more figure