
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering

Zijian Hei, Weiling Liu, Wenjie Ou, Juyi Qiao, Junming Jiao, Guowen Song, Ting Tian, Yi Lin

TL;DR

Experimental results on multi-hop QA datasets show that DR-RAG significantly improves answer accuracy and advances QA systems.

Abstract

Retrieval-Augmented Generation (RAG) has recently been shown to improve the performance of Large Language Models (LLMs) on knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context with external knowledge bases to improve response accuracy. However, calling the LLM multiple times for each query is inefficient, and a single query is unreliable for retrieving all the relevant documents. We have found that even when some critical documents have low relevance to the query, they can still be retrieved by combining parts of the already-retrieved documents with the query. To exploit this relevance, we propose a two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG), which improves document retrieval recall and answer accuracy while maintaining efficiency. Additionally, a compact classifier is used with two different selection strategies to determine how much each retrieved document contributes to answering the query and to retrieve the relatively relevant documents. Meanwhile, DR-RAG calls the LLM only once, which significantly improves efficiency. Experimental results on multi-hop QA datasets show that DR-RAG significantly improves answer accuracy and advances QA systems.
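The two-stage retrieval idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a bag-of-words cosine similarity with a small stopword list stands in for the real retriever, the parameters `k1` and `k2` (how many SR- and DR-Documents to keep) and all function names are hypothetical, and each SR-Document is concatenated with the query separately, which is an assumption about how the concatenation is done.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a dense retriever would not need one.
STOPWORDS = {"a", "by", "in", "is", "of", "the", "was", "where"}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a retriever's embedding."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing keys, so absent tokens contribute nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, corpus: list, k: int, exclude: frozenset = frozenset()) -> list:
    """Indices of the k documents most similar to `query`; ties broken by index."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), i) for i, doc in enumerate(corpus) if i not in exclude]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def two_stage_retrieve(query: str, corpus: list, k1: int = 1, k2: int = 1):
    # Step 1: static-relevant documents (SR-Documents), retrieved with the query alone.
    sr = top_k(query, corpus, k1)
    dr = []
    # Step 2: concatenate each SR-Document with the query and retrieve again, so that
    # documents with low relevance to the query alone (DR-Documents) can surface.
    for i in sr:
        dr += top_k(query + " " + corpus[i], corpus, k2, exclude=frozenset(sr + dr))
    return sr, dr


corpus = [
    "The designer of the Eiffel Tower was Gustave Eiffel.",
    "Gustave Eiffel was born in Dijon.",
    "A tower designer was born in Lyon in 1900.",
]
query = "Where was the designer of the Eiffel Tower born?"
print(top_k(query, corpus, 2))            # [0, 2]: the Dijon document ranks last
print(two_stage_retrieve(query, corpus))  # ([0], [1])
```

In this toy corpus the Dijon document ranks last against the query alone, but once the static-relevant document (which names Gustave Eiffel) is appended to the query, it outranks the distractor. The LLM itself is not involved in either retrieval step.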

Paper Structure

This paper contains 22 sections, 4 equations, 3 figures, and 13 tables.

Figures (3)

  • Figure 1: An example showing that the retriever easily retrieves static-relevant documents because of their high relevance to the query (red), but struggles to retrieve dynamic-relevant documents, which have low relevance (blue) yet are critical for the answer. Stars indicate the level of retrieval difficulty.
  • Figure 2: An overview of DR-RAG. In Step 1, we retrieve static-relevant documents (SR-Documents), which are highly relevant to the query. In Step 2, we concatenate the SR-Documents with the query to retrieve multiple dynamic-relevant documents (DR-Documents). Finally, each DR-Document in turn is concatenated with the query and the SR-Documents and fed into the classifier, which selects the most relevant DR-Document.
  • Figure 3: QA performance (F1) and time for different RAG frameworks, using GPT-3.5-turbo as the base LLM on the multi-hop QA datasets (MuSiQue, HotpotQA, and 2Wiki).
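The final selection step described for Figure 2 can be illustrated with a similar sketch. It assumes the compact classifier can be modeled as a function that maps the concatenated text to a score, where higher means more useful for answering the query. `select_dr_document`, `build_prompt`, the prompt template, and the keyword-counting `toy_classifier` are all hypothetical stand-ins. Only the score-each-candidate-and-keep-the-best procedure from the caption is shown, not the paper's two selection strategies.

```python
from typing import Callable, Sequence


def select_dr_document(
    query: str,
    sr_docs: Sequence[str],
    dr_docs: Sequence[str],
    classifier: Callable[[str], float],
) -> str:
    """Return the DR-Document whose concatenation scores highest (ties: first wins)."""
    # Each candidate is scored in turn on: query + all SR-Documents + that candidate.
    # max() raises ValueError if dr_docs is empty.
    return max(dr_docs, key=lambda doc: classifier(" ".join([query, *sr_docs, doc])))


def build_prompt(query: str, sr_docs: Sequence[str], dr_doc: str) -> str:
    """Hypothetical prompt template for the single LLM call."""
    context = "\n".join([*sr_docs, dr_doc])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def toy_classifier(text: str) -> float:
    """Stand-in for the trained compact classifier: counts mentions of 'Eiffel'."""
    return float(text.count("Eiffel"))


query = "Where was the designer of the Eiffel Tower born?"
sr_docs = ["The designer of the Eiffel Tower was Gustave Eiffel."]
dr_docs = ["A tower designer was born in Lyon in 1900.", "Gustave Eiffel was born in Dijon."]

best = select_dr_document(query, sr_docs, dr_docs, toy_classifier)
print(best)  # Gustave Eiffel was born in Dijon.
print(build_prompt(query, sr_docs, best))
```

Because the classifier rather than the LLM performs the selection, the LLM only needs to be called once, on the final prompt.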