EfficientRAG: Efficient Retriever for Multi-Hop Question Answering

Ziyuan Zhuang; Zhiyang Zhang; Sitao Cheng; Fangkai Yang; Jia Liu; Shujian Huang; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang

TL;DR

EfficientRAG is an efficient retriever for multi-hop question answering. It iteratively generates new queries without an LLM call at each iteration and filters out irrelevant information, and it surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.

Abstract

Retrieval-augmented generation (RAG) methods encounter difficulties when addressing complex questions such as multi-hop queries. While iterative retrieval methods improve performance by gathering additional information, current approaches often rely on multiple calls to large language models (LLMs). In this paper, we introduce EfficientRAG, an efficient retriever for multi-hop question answering. EfficientRAG iteratively generates new queries without the need for LLM calls at each iteration and filters out irrelevant information. Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.

Paper Structure (21 sections, 4 figures, 22 tables)

This paper contains 21 sections, 4 figures, 22 tables.

Introduction
Empirical Study
Capability of LLM generator
Retrieve with Query Decomposition
Methodology
EfficientRAG Framework
Synthetic Data Construction
Training
Experiments
End2end QA performance
Results and Analysis
Retrieval Performance
Inference Efficiency
Performance with Various Generators
Out-Of-Domain Adaptation
...and 6 more sections

Figures (4)

Figure 1: Performance with varying chunk settings on the 2WikiMQA dataset with GPT-3.5/GPT-4/Llama3-8B as the generator.
Figure 2: Retrieval recall and efficiency of three retrieval strategies on the MuSiQue dataset. The x-axis is log-scaled. Each point on different lines represents the same number of retrieved chunks.
Figure 3: The EfficientRAG framework operates within an iterative RAG system. Initially, EfficientRAG retrieves relevant chunks from the knowledge base, tags each as either <Terminate> or <Continue>, and annotates preserved tokens "KGOT in the Dimond Center" from the <Continue> chunks. The Filter then processes the concatenation of the original question and the previously annotated tokens, "Q: How large is the shopping mall where KGOT radio station has its studios? Info: KGOT, in the Dimond Center", and annotates the next-hop query tokens "How large is Dimond Center?". This iterative process continues until all chunks are tagged <Terminate> or the maximum number of iterations is reached.
Figure 4: Performance with varying chunk settings on the HotpotQA, 2Wiki-Multihop and MuSiQue datasets with GPT-3.5/GPT-4/Llama-3 8B as the generator.
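
The loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name iterative_retrieve, its parameters, and the toy stubs (toy_retrieve, toy_label, toy_next_hop_query, TOY_KB) are all hypothetical, and the stubs stand in for the retriever and for the components that tag chunks and produce next-hop queries. The second chunk in the toy knowledge base is placeholder text, not real data. The sketch only records which queries were issued and how each chunk was tagged; what is ultimately passed to the generator is left out.

```python
from typing import Callable, List, Tuple

CONTINUE, TERMINATE = "<Continue>", "<Terminate>"


def iterative_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],        # query -> retrieved chunks (hypothetical)
    label: Callable[[str], Tuple[str, str]],     # chunk -> (tag, preserved tokens) (hypothetical)
    next_hop_query: Callable[[str], str],        # filter input -> next-hop query (hypothetical)
    max_iters: int = 3,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (queries issued, [(chunk, tag), ...]) for an iterative retrieval run."""
    queries = [question]
    issued: List[str] = []
    tagged: List[Tuple[str, str]] = []
    for _ in range(max_iters):
        next_queries: List[str] = []
        for query in queries:
            issued.append(query)
            for chunk in retrieve(query):
                tag, info = label(chunk)
                tagged.append((chunk, tag))
                if tag == CONTINUE:
                    # Concatenate the original question with the preserved tokens,
                    # following the "Q: ... Info: ..." format shown in the caption.
                    filter_input = f"Q: {question} Info: {info}"
                    next_queries.append(next_hop_query(filter_input))
        if not next_queries:
            # Every chunk in this round was tagged <Terminate>.
            break
        queries = next_queries
    return issued, tagged


# Toy stubs, for illustration only.
QUESTION = "How large is the shopping mall where KGOT radio station has its studios?"
TOY_KB = {
    QUESTION: ["KGOT has its studios in the Dimond Center."],
    "How large is Dimond Center?": ["Placeholder chunk about the Dimond Center."],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_KB.get(query, [])


def toy_label(chunk: str) -> Tuple[str, str]:
    if "KGOT" in chunk:
        return CONTINUE, "KGOT, in the Dimond Center"
    return TERMINATE, ""


def toy_next_hop_query(filter_input: str) -> str:
    return "How large is Dimond Center?"


issued, tagged = iterative_retrieve(
    QUESTION, toy_retrieve, toy_label, toy_next_hop_query
)
```

In this toy run the first round tags its chunk <Continue> and yields a next-hop query; the second round tags its chunk <Terminate>, so the loop stops before reaching max_iters.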
