DMQR-RAG: Diverse Multi-Query Rewriting for RAG

Zhicong Li, Jiahao Wang, Zhishu Jiang, Hangyu Mao, Zhongxia Chen, Jiazhen Du, Yuanxing Zhang, Fuzheng Zhang, Di Zhang, Yong Liu

Abstract

Large language models often encounter challenges with static knowledge and hallucinations, which undermine their reliability. Retrieval-augmented generation (RAG) mitigates these issues by incorporating external information. However, user queries frequently contain noise and intent deviations, necessitating query rewriting to improve the relevance of retrieved documents. In this paper, we introduce DMQR-RAG, a Diverse Multi-Query Rewriting framework designed to improve the performance of both document retrieval and final responses in RAG. Specifically, we investigate how queries with varying information quantities can retrieve a diverse array of documents, presenting four rewriting strategies that operate at different levels of information to enhance the performance of baseline approaches. Additionally, we propose an adaptive strategy selection method that minimizes the number of rewrites while optimizing overall performance. Our methods have been rigorously validated through extensive experiments conducted in both academic and industry settings.
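
The core idea above, that several differently phrased rewrites of one query can together reach documents no single query retrieves, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the token-overlap retriever, the hand-written rewrites, and the function names `retrieve` and `multi_query_retrieve` are hypothetical and are not the paper's rewriting strategies, retriever, or adaptive selection method.

```python
from typing import Dict, List, Set

# Hypothetical toy corpus; document IDs and texts are made up for illustration.
CORPUS: Dict[str, str] = {
    "d1": "python list sort method sorts in place",
    "d2": "sorted builtin returns a new list",
    "d3": "key function argument controls ordering",
    "d4": "bananas are rich in potassium",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a stand-in for a real analyzer)."""
    return set(text.lower().split())


def retrieve(query: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by token overlap with the query.

    Documents with zero overlap are dropped; ties are broken by document ID.
    """
    q_tokens = tokenize(query)
    scored = [(len(q_tokens & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def multi_query_retrieve(queries: List[str], corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Retrieve for each query and merge the results, keeping first-seen order."""
    seen: Set[str] = set()
    merged: List[str] = []
    for query in queries:
        for doc_id in retrieve(query, corpus, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# A vague user query plus three hand-written, deliberately different rewrites.
ORIGINAL = "how do i sort stuff"
DIVERSE = [ORIGINAL, "python list sort", "sorted builtin new list", "key function ordering"]
# Two near-identical rewrites, standing in for a low-diversity rewriter.
SIMILAR = ["python list sort", "python sort list"]

print(retrieve(ORIGINAL, CORPUS))
print(multi_query_retrieve(SIMILAR, CORPUS))
print(multi_query_retrieve(DIVERSE, CORPUS))
```

In this toy setting the near-identical rewrites return the same documents as each other, while the more varied set also reaches the third relevant document. The sketch does not attempt the adaptive selection step, which the abstract describes only at a high level.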

Paper Structure

This paper contains 28 sections, 4 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The motivation of our work. (a) Users often struggle to express their intentions accurately, which can lead to the retrieval of irrelevant documents. (b) In some cases, rewritten queries can successfully retrieve relevant documents. (c) Rewritten queries that are similar (i.e., lacking diversity) may yield similar document retrievals, potentially overlooking other relevant documents. (d) Our DMQR-RAG encourages diverse rewritten queries, resulting in a broader range of retrieved documents that encompass all relevant items. (e) Our adaptive rewriting selection eliminates unnecessary rewrites without compromising relevant document retrieval, while also reducing noise by minimizing the retrieval of irrelevant documents.
  • Figure 2: The results of adaptive rewriting selection: distribution of the number of rewrites.
  • Figure 3: The results of adaptive rewriting selection: retrieval and answer performance.
  • Figure 4: The results from real-world industry scenarios.