Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach

Zhouyu Jiang, Mengshu Sun, Lei Liang, Zhiqiang Zhang

TL;DR

The paper addresses two challenges of iterative RAG for multi-hop QA: context overload and the lack of a recorded retrieval trajectory. It introduces ReSP, a framework that uses a dual-function summarizer to produce both a global evidence memory and a local pathway memory, enabling controlled iteration and more accurate answer generation. Empirical results on HotpotQA and 2WikiMultihopQA show substantial gains over state-of-the-art methods, along with improved robustness to context length and adaptability across base models. Because the memory and retrieval history are exposed, the reasoning process is also more transparent, which increases trustworthiness for deployment in practical settings.

Abstract

Multi-hop question answering is a challenging task with distinct industrial relevance, and Retrieval-Augmented Generation (RAG) methods based on large language models (LLMs) have become a popular approach to tackle this task. Owing to the potential inability to retrieve all necessary information in a single iteration, a series of iterative RAG methods has been recently developed, showing significant performance improvements. However, existing methods still face two critical challenges: context overload resulting from multiple rounds of retrieval, and over-planning and repetitive planning due to the lack of a recorded retrieval trajectory. In this paper, we propose a novel iterative RAG method called ReSP, equipped with a dual-function summarizer. This summarizer compresses information from retrieved documents, targeting both the overarching question and the current sub-question concurrently. Experimental results on the multi-hop question-answering datasets HotpotQA and 2WikiMultihopQA demonstrate that our method significantly outperforms the state-of-the-art, and exhibits excellent robustness concerning context length.

Paper Structure

This paper contains 20 sections, 3 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: RAG pipelines illustration and challenges faced.
  • Figure 2: The ReSP framework consists of four modules: Reasoner, Retriever, Summarizer, and Generator. The Reasoner makes decisions based on the current memory queues, determining whether to exit the iteration and generate a response or to produce a sub-question for further iteration. The Retriever searches the corpus based on the sub-question provided by the Reasoner (in the first iteration, the sub-question is the same as the overarching question, so the Reasoner is bypassed). The Summarizer performs dual summarization on the retrieved documents, extracting information relevant to both the overarching question Q and the current sub-question Q*, and stores the summaries in the global evidence memory and local pathway memory queues, respectively. The Generator produces the answer A based on the information in the memory queues.
  • Figure 3: Performance comparison across different base models. We report the token-level F1 score obtained from testing on the HotpotQA dataset.
  • Figure 4: Bar chart of the performance variations of different RAG methods with varying numbers of retrieved documents per iteration. We report the token-level F1 score obtained from testing on the HotpotQA dataset.
  • Figure 5: Performance with different maximum numbers of iterations.
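
The control flow described in the Figure 2 caption can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the names `resp_answer`, `Memory`, and all `toy_*` functions are hypothetical, the four modules are passed in as plain callables, and the toy versions below are hard-coded stand-ins for what would be LLM prompts and a real retriever. The `max_iterations` cap is an assumption motivated by Figure 5.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Memory:
    # Summaries targeting the overarching question Q.
    global_evidence: List[str] = field(default_factory=list)
    # (sub-question, summary targeting that sub-question) pairs.
    local_pathway: List[Tuple[str, str]] = field(default_factory=list)


def resp_answer(
    question: str,
    reason: Callable[[str, Memory], Optional[str]],   # None means "stop iterating"
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[List[str], str], str],
    generate: Callable[[str, Memory], str],
    max_iterations: int = 3,                          # assumed cap on iterations
) -> Tuple[str, Memory]:
    memory = Memory()
    sub_question = question  # first iteration: the Reasoner is bypassed
    for step in range(max_iterations):
        if step > 0:
            next_sub = reason(question, memory)
            if next_sub is None:
                break
            sub_question = next_sub
        docs = retrieve(sub_question)
        # Dual summarization: once for Q, once for the current sub-question.
        memory.global_evidence.append(summarize(docs, question))
        memory.local_pathway.append((sub_question, summarize(docs, sub_question)))
    return generate(question, memory), memory


# Toy stand-ins (hypothetical data, for illustration only).
TOY_INDEX = {
    "Where was the director of Film X born?": ["Film X was directed by Jane Doe."],
    "Where was Jane Doe born?": ["Jane Doe was born in Oslo."],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def toy_summarize(docs: List[str], target_question: str) -> str:
    # A real summarizer would condition on target_question; this one ignores it.
    return " ".join(docs)


def toy_reason(question: str, memory: Memory) -> Optional[str]:
    if any("born in" in evidence for evidence in memory.global_evidence):
        return None  # enough evidence collected, exit the loop
    return "Where was Jane Doe born?"


def toy_generate(question: str, memory: Memory) -> str:
    for evidence in reversed(memory.global_evidence):
        if "born in" in evidence:
            return evidence.rsplit(" ", 1)[-1].rstrip(".")
    return "unknown"


if __name__ == "__main__":
    answer, memory = resp_answer(
        "Where was the director of Film X born?",
        toy_reason, toy_retrieve, toy_summarize, toy_generate,
    )
    print(answer)
    print(memory.local_pathway)
```

Keeping the sub-questions in `local_pathway` is what lets a reasoner see which queries have already been issued, which is the mechanism the abstract credits with reducing over-planning and repetitive planning.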