Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

Zhengliang Shi, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen, Zhaochun Ren

TL;DR

The paper targets multi-hop question answering (MHQA) and identifies two limitations of the retrieve-then-read paradigm: dependence on the retriever and noise in the retrieved documents. It introduces GenGround, a generate-then-ground framework in which an LLM alternates between answer deduction (formulating a simpler sub-question and answering it directly) and instructional grounding (checking that answer against retrieved evidence and revising it when it is wrong), with a batch grounding strategy for efficiency. To extend the approach to smaller models, the authors propose Instructional Grounding Distillation (IGD), which uses a 50k single-hop synthetic dataset to distill grounding trajectories into a student model, enabling competitive performance with fewer parameters. Experiments on four MHQA benchmarks show GenGround outperforming strong baselines, and IGD lets smaller models approach or exceed those baselines.

Abstract

Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method into smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method.
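
The two alternating phases described in the abstract can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `generate_then_ground`, the callables `deduce`, `retrieve`, and `ground`, the `max_hops` cap, and the "return `None` to stop" convention are all hypothetical, and the toy components below stand in for an LLM and a retriever.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (sub-question, answer)


def generate_then_ground(
    question: str,
    deduce: Callable[[str, List[Step]], Optional[Step]],
    retrieve: Callable[[str], List[str]],
    ground: Callable[[str, str, List[str]], str],
    max_hops: int = 5,  # hypothetical safety cap, not from the paper
) -> Tuple[Optional[str], List[Step]]:
    """Alternate answer deduction and grounding until `deduce` returns None."""
    history: List[Step] = []
    for _ in range(max_hops):
        # Phase 1: formulate a single-hop sub-question and a draft answer.
        step = deduce(question, history)
        if step is None:  # assumed stop signal: nothing left to ask
            break
        sub_question, draft = step
        # Phase 2: ground the question-answer pair in retrieved documents,
        # keeping the draft or amending it.
        docs = retrieve(sub_question)
        revised = ground(sub_question, draft, docs)
        history.append((sub_question, revised))
    final = history[-1][1] if history else None
    return final, history


# Toy stand-ins for the LLM and the retriever (illustration only).
CORPUS = {
    "Who directed Inception?": [
        "Inception is a 2010 film directed by Christopher Nolan."
    ],
    "Where was Christopher Nolan born?": [
        "Christopher Nolan was born in London."
    ],
}
KNOWN_ENTITIES = ["Christopher Nolan", "London"]


def toy_deduce(question: str, history: List[Step]) -> Optional[Step]:
    if not history:
        # Deliberately wrong draft, so that grounding has something to fix.
        return ("Who directed Inception?", "Steven Spielberg")
    if len(history) == 1:
        return (f"Where was {history[0][1]} born?", "London")
    return None


def toy_retrieve(sub_question: str) -> List[str]:
    return CORPUS.get(sub_question, [])


def toy_ground(sub_question: str, draft: str, docs: List[str]) -> str:
    if any(draft in doc for doc in docs):
        return draft  # the draft is supported by the evidence
    for entity in KNOWN_ENTITIES:
        if entity not in sub_question and any(entity in doc for doc in docs):
            return entity  # amend the draft using the evidence
    return draft  # no usable evidence: keep the draft


answer, trace = generate_then_ground(
    "Where was the director of Inception born?",
    toy_deduce,
    toy_retrieve,
    toy_ground,
)
print(answer)
print(trace)
```

In this toy run the first draft answer is wrong and is amended during grounding, and the corrected answer then feeds the second sub-question. That is the interaction between parametric knowledge and external documents that the abstract describes.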

Paper Structure (29 sections, 4 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: The top block compares our method with the commonly used retrieve-then-read paradigm in the MHQA task. The bottom block shows the performance of our method and the baselines on four MHQA benchmarks.
  • Figure 2: The architecture of the proposed generate-then-ground framework.
  • Figure 3: The instructions for the answer deduction (a) and instructional knowledge grounding (b) phases in our framework. The pink and yellow blocks indicate the input, while the gray blocks indicate the output.
  • Figure 4: Demonstration of our batch grounding strategy with a batch size of 3 and 10 retrieved documents, where the LLM grounds the input question-answer pair in the second batch.
  • Figure 5: The fine-grained correctness analysis of our answer deduction and knowledge grounding phases.
  • ...and 1 more figure
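
The batch grounding setting in the Figure 4 caption (batch size 3, 10 retrieved documents, grounding succeeding in the second batch) can be illustrated with a short sketch. The names `batch_ground` and `ground_batch` are hypothetical. Stopping at the first batch that yields evidence, and falling back to the draft answer when no batch does, are assumptions made for this illustration rather than details confirmed by the text above.

```python
from typing import Callable, List, Optional, Tuple


def batch_ground(
    sub_question: str,
    draft: str,
    docs: List[str],
    ground_batch: Callable[[str, str, List[str]], Optional[str]],
    batch_size: int = 3,
) -> Tuple[str, Optional[int]]:
    """Ground a question-answer pair batch by batch.

    Returns the grounded answer and the 1-based index of the batch that
    supplied the evidence, or (draft, None) if no batch did (assumed fallback).
    """
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        revised = ground_batch(sub_question, draft, batch)
        if revised is not None:  # evidence found: skip the remaining batches
            return revised, start // batch_size + 1
    return draft, None


# Toy setup mirroring the caption: 10 documents, evidence in the second batch.
documents = [f"Unrelated document {i}." for i in range(10)]
documents[4] = "Inception is a 2010 film directed by Christopher Nolan."

batches_seen: List[int] = []  # records the size of every batch inspected


def toy_ground_batch(sub_question: str, draft: str, batch: List[str]) -> Optional[str]:
    batches_seen.append(len(batch))
    for doc in batch:
        if "directed by " in doc:
            return doc.split("directed by ", 1)[1].rstrip(".")
    return None  # this batch holds no usable evidence


result = batch_ground(
    "Who directed Inception?", "Steven Spielberg", documents, toy_ground_batch
)
print(result)
print(batches_seen)
```

With these toy inputs only the first two batches (six of the ten documents) are inspected, which is the efficiency argument for processing documents in batches instead of all at once.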