Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering

Xiaoming Zhang, Ming Wang, Xiaocui Yang, Daling Wang, Shi Feng, Yifei Zhang

TL;DR

HiRAG tackles multi-hop QA by introducing a hierarchical retrieval-augmented generation framework that combines sparse document-level and dense chunk-level retrieval with a verify/rethink loop. The architecture comprises five modules—Decomposer, Definer, Retriever, Filter, and Summarizer—plus two knowledge corpora, Indexed Wikicorpus and Profile Wikicorpus, to keep information up-to-date and entity-centric. Experiments on HotPotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle show HiRAG achieving state-of-the-art results on most datasets, with particularly large gains on 2WikiMultiHopQA, and the Indexed Wikicorpus contributing to retrieval effectiveness. The work provides a practical, plug-in capable retrieval engine and releases the corpora and code to advance multi-hop QA research.

Abstract

Multi-hop Question Answering (QA) necessitates complex reasoning by integrating multiple pieces of information to resolve intricate questions. However, existing QA systems encounter challenges such as outdated information, context window length limitations, and an accuracy-quantity trade-off. To address these issues, we propose a novel framework, the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), comprising five key modules: Decomposer, Definer, Retriever, Filter, and Summarizer. We introduce a new hierarchical retrieval strategy that combines sparse retrieval at the document level with dense retrieval at the chunk level, effectively integrating their strengths. Additionally, we propose a single-candidate retrieval method to mitigate the limitations of multi-candidate retrieval. We also construct two new corpora, Indexed Wikicorpus and Profile Wikicorpus, to address the issues of outdated and insufficient knowledge. Our experimental results on four datasets demonstrate that HiRAG outperforms state-of-the-art models across most metrics and that our Indexed Wikicorpus improves retrieval effectiveness. The code for HiRAG is available at https://github.com/2282588541a/HiRAG.
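
The hierarchical retrieval strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the title-weighted term-overlap score used for the sparse stage, and the bag-of-words cosine similarity standing in for a dense encoder are all assumptions chosen to keep the example self-contained. A real system would likely use a proper sparse ranker and a learned embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_doc_score(query, title, chunks):
    """Stage 1 (sparse, document level): count query-term matches.

    Title matches are weighted double. This weighting is an assumption
    made for illustration, not something taken from the paper.
    """
    q_terms = set(tokenize(query))
    title_hits = sum(1 for t in tokenize(title) if t in q_terms)
    body_terms = set(tokenize(" ".join(chunks)))
    body_hits = len(q_terms & body_terms)
    return 2.0 * title_hits + body_hits


def toy_embed(text):
    """Stand-in for a dense encoder: a plain bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hierarchical_retrieve(query, corpus):
    """Pick one document sparsely, then one chunk inside it densely.

    `corpus` maps a document title (one document per entity) to its list
    of chunks. Returns a (title, chunk) pair.
    """
    # Stage 1: choose the best-matching document.
    title = max(corpus, key=lambda t: sparse_doc_score(query, t, corpus[t]))
    # Stage 2: rank that document's chunks by embedding similarity.
    q_vec = toy_embed(query)
    chunk = max(corpus[title], key=lambda c: cosine(q_vec, toy_embed(c)))
    return title, chunk
```

Restricting the second stage to chunks of the selected document is what makes the search hierarchical: the chunk-level comparison never has to consider the whole corpus.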

Paper Structure (26 sections, 7 equations, 5 figures, 7 tables)


Figures (5)

  • Figure 1: Outdated knowledge and insufficient knowledge existing in the old corpus.
  • Figure 2: Multiple candidate chunks vs. single candidate chunk. The knowledge within the solid box represents the retrieved knowledge. In subfigure (b), the knowledge within the dotted box will be considered as newly retrieved knowledge only if the answer to the sub-question is incorrect.
  • Figure 3: Old corpus vs. Indexed Wikicorpus. In the new corpus, we specialize in constructing a document for each entity and then dividing it into chunks.
  • Figure 4: The framework of the Hierarchical Retrieval-Augmented Generation Model with Rethink (HiRAG), including the Decomposer, Definer, Retriever, Filter, and Summarizer modules. The process begins with the Decomposer module, which breaks down the original question into several sub-questions. Each sub-question is then forwarded to the Retriever module to retrieve pertinent knowledge. For illustration, we consider the handling of the first sub-question as an example. The knowledge obtained is passed to the Verify module, where each sub-answer is evaluated for accuracy. If a sub-answer is verified as correct, it is stored in the sub-answer pool. Should it be incorrect, a rethinking process is initiated. Subsequent sub-questions undergo a similar sequence of decomposition, retrieval, and verification. Ultimately, all sub-answers, along with the original question, are compiled by the Summarizer module to output the final answer. Solid lines denote items that are currently being processed, while dotted lines indicate items that are pending processing or are about to be addressed. In this paper, the llama icon represents LLaMa-3-70B and the ChatGPT icon represents GPT-3.5-turbo.
  • Figure 5: Case study of HiRAG process.
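
The control flow described in the captions of Figures 2 and 4 (decompose, retrieve a single candidate, verify, rethink on failure, then summarize) can be sketched as follows. This is a hedged illustration only: every function name and parameter here is hypothetical, the language-model calls are replaced by plain callables supplied by the caller, and the Definer and Filter modules are omitted because their behaviour is not described in this section.

```python
def answer_with_rethink(sub_question, ranked_chunks, answer_fn, verify_fn, max_tries=3):
    """Single-candidate retrieval with a rethink step.

    Only one chunk is shown to the answerer at a time. The next chunk is
    consulted only if the current sub-answer fails verification.
    `max_tries` is an assumed safety limit, not a value from the paper.
    """
    for attempt, chunk in enumerate(ranked_chunks):
        if attempt >= max_tries:
            break
        candidate = answer_fn(sub_question, chunk)
        if verify_fn(sub_question, chunk, candidate):
            return candidate  # verified: goes into the sub-answer pool
        # Not verified: fall through and rethink with the next chunk.
    return None  # no verified sub-answer within the limit


def run_pipeline(question, decompose_fn, retrieve_fn, answer_fn, verify_fn, summarize_fn):
    """Decompose the question, answer each sub-question, then summarize."""
    sub_answer_pool = []
    for sub_question in decompose_fn(question):
        ranked_chunks = retrieve_fn(sub_question)
        sub_answer = answer_with_rethink(
            sub_question, ranked_chunks, answer_fn, verify_fn
        )
        sub_answer_pool.append((sub_question, sub_answer))
    # The summarizer sees the original question plus all sub-answers.
    return summarize_fn(question, sub_answer_pool)
```

Passing the modules in as callables keeps the loop independent of any particular model, so each stage can be swapped or stubbed out for testing.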