
Momentum Posterior Regularization for Multi-hop Dense Retrieval

Zehua Xia, Yuyang Wu, Yiyun Xia, Cam-Tu Nguyen

TL;DR

The paper tackles the problem of improving multi-hop dense retrieval by distilling posterior information into the prior retriever. It introduces MoPo, a momentum-based posterior regularization framework, and defines posterior information for each hop as a query-focused summary, enabling smoother distillation through a one-stage training regime. To support this, it constructs PostSumQA via backward summary generation, providing 22,696 annotated examples for learning posterior summaries. Empirical results on HotpotQA and StrategyQA show MoPo outperforms existing baselines in retrieval and downstream tasks (reranking and QA), while offering better computational efficiency than LLM-heavy reasoning approaches. Overall, MoPo advances practical multi-hop QA by combining posterior-guided retrieval with momentum-based training to bridge the prior-posterior gap more robustly.

Abstract

Multi-hop question answering (QA) often requires sequential retrieval (multi-hop retrieval), where each hop retrieves missing knowledge based on information from previous hops. To facilitate more effective retrieval, we aim to distill knowledge from a posterior retrieval, which has access to posterior information such as the answer, into a prior retrieval used during inference when such information is unavailable. Unfortunately, current knowledge distillation methods for one-time retrieval are ineffective for multi-hop QA for two reasons: 1) posterior information is often defined as the response (i.e., the answer), which may not clearly connect to the query without intermediate retrieval; and 2) the large knowledge gap between prior and posterior retrievals makes existing distillation methods unstable, sometimes even causing performance loss. We therefore propose MoPo (Momentum Posterior Regularization) with two key innovations: 1) the posterior information of a hop is defined as a query-focused summary of the golden knowledge from the previous and current hops; and 2) an effective training strategy in which the posterior retrieval is updated along with the prior retrieval via a momentum moving average, allowing smoother and more effective distillation. Experiments on HotpotQA and StrategyQA demonstrate that MoPo outperforms existing baselines in both retrieval and downstream QA tasks.
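The one-stage training described above can be pictured with a minimal NumPy sketch. It is written under stated assumptions and is not the authors' implementation. The posterior-regularization term is assumed to be KL(p_post || p_prior) over a shared set of candidate passages. The total loss is assumed to be InfoNCE + λ·KL, which matches the loss ratio given in the Figure 2 caption. The momentum step is assumed to move the posterior retriever's parameters toward the prior retriever's as an exponential moving average, in the style of MoCo momentum encoders. The function names, the temperature `tau`, and the momentum coefficient `m` are hypothetical.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def info_nce(q_prior, passages, pos_idx, tau=1.0):
    # Contrastive loss for the prior retriever: the gold passage should score highest.
    return float(-log_softmax(passages @ q_prior / tau)[pos_idx])

def posterior_kl(q_prior, q_post, passages, tau=1.0):
    # Assumed regularizer: KL(p_post || p_prior) over the candidate passages,
    # with the posterior retrieval distribution acting as the soft target.
    log_p_prior = log_softmax(passages @ q_prior / tau)
    log_p_post = log_softmax(passages @ q_post / tau)
    return float(np.sum(np.exp(log_p_post) * (log_p_post - log_p_prior)))

def total_loss(q_prior, q_post, passages, pos_idx, lam=0.3):
    # Assumed combination: InfoNCE + lambda * posterior-regularization term.
    nce = info_nce(q_prior, passages, pos_idx)
    total = nce + lam * posterior_kl(q_prior, q_post, passages)
    return total, nce / total  # (total loss, relative loss ratio InfoNCE / Total)

def momentum_update(post_params, prior_params, m=0.999):
    # Hypothetical EMA step: posterior parameters trail the prior parameters.
    return {k: m * post_params[k] + (1.0 - m) * prior_params[k] for k in post_params}

# Usage with random toy embeddings: 4 candidate passages, gold passage at index 1.
rng = np.random.default_rng(0)
passages = rng.normal(size=(4, 8))
q_prior = rng.normal(size=8)
q_post = passages[1] + 0.1 * rng.normal(size=8)  # posterior query "sees" the gold evidence
total, ratio = total_loss(q_prior, q_post, passages, pos_idx=1)
```

When the prior and posterior queries coincide, the KL term is zero and the ratio is 1. A large `m` makes the posterior side change slowly from step to step, which is one plausible reading of why a momentum update gives smoother distillation.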

Paper Structure

This paper contains 58 sections, 12 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: An example from the HotpotQA benchmark. Given the 2-hop question, an iterative retriever is expected to retrieve the 1st and 2nd golden paragraphs sequentially. After each retrieval step, a query-focused summary combining the query and the retrieved paragraph is generated. This summary is posterior information for that step, since it would not be available before the retrieval is performed. We use it to enhance retriever training, as shown by the red arrow.
  • Figure 2: Total absolute and relative loss curves of PR and MoPo with $\lambda = 0.3$, where the relative loss ratio = InfoNCE loss / total loss. All curves are smoothed with the same smoothing factor.
  • Figure A1: Main content of the prompt used to generate the first-hop summary for a bridge-type sample
  • Figure A2: Example of reasoning steps for first-hop summary generation in bridge-type question answering
  • Figure A3: Main content of the reasoning prompt for summary generation on comparison-type question-answer pairs
  • ...and 3 more figures
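The iterative retrieval in Figure 1 can be illustrated with a toy two-hop loop. In this sketch a bag-of-words encoder stands in for the dense encoder, and the next-hop query is formed by appending the retrieved paragraph to the question, a common choice in multi-hop dense retrievers. The query format MoPo actually uses, and the query-focused summaries it uses as posterior information during training, are not modeled here. The corpus, the question, and the functions `encode` and `multi_hop_retrieve` are hypothetical.

```python
import numpy as np

def encode(text, vocab):
    # Toy L2-normalized bag-of-words vector, standing in for a dense encoder.
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def multi_hop_retrieve(question, corpus, hops=2):
    tokens = sorted({t for p in corpus for t in p.lower().split()})
    vocab = {t: i for i, t in enumerate(tokens)}
    index = np.stack([encode(p, vocab) for p in corpus])  # passage embeddings
    query, chain = question, []
    for _ in range(hops):
        scores = index @ encode(query, vocab)  # cosine similarity to every passage
        for i in chain:
            scores[i] = -np.inf  # never retrieve the same paragraph twice
        best = int(np.argmax(scores))
        chain.append(best)
        query = query + " " + corpus[best]  # next hop conditions on retrieved evidence
    return chain

corpus = [
    "inception film director christopher nolan",
    "christopher nolan birthplace london",
    "matrix film director wachowskis sisters",
    "paris capital france",
]
print(multi_hop_retrieve("inception director birthplace", corpus))  # [0, 1]
```

The second paragraph only wins at hop 2, once the bridge entity from the first paragraph ("christopher nolan") is part of the query, which is the dependency between hops that Figure 1 depicts.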