Memory-Aware and Uncertainty-Guided Retrieval for Multi-Hop Question Answering

Yuelyu Ji, Rui Meng, Zhuochun Li, Daqing He

TL;DR

The paper addresses multi-hop QA with MIND, a framework that combines memory-aware retrieval with uncertainty-guided retrieval decisions. MIND comprises prompt-based entity extraction; Retrieval-Integrated Neural Decision-making (RIND), which triggers retrieval from token-level entropy and attention signals; and memory-aware filtering under one of four strategies (No Filtering, CoT, Confidence, Hybrid). Across four datasets, MIND reduces unnecessary retrievals by about 10–15% and improves final answer accuracy (EM/F1); ablations show that the hybrid CoT+Conf filter offers the best balance. Dynamic thresholding outperforms fixed thresholds, indicating that adaptive retrieval suits questions of varying complexity. The work targets efficient, consistent multi-hop reasoning and lays groundwork for extensions to conversational and cross-domain QA tasks.

Abstract

Multi-hop question answering (QA) requires models to retrieve and reason over multiple pieces of evidence. While Retrieval-Augmented Generation (RAG) has made progress in this area, existing methods often suffer from two key limitations: (1) fixed or overly frequent retrieval steps, and (2) ineffective use of previously retrieved knowledge. We propose MIND (Memory-Informed and INteractive Dynamic RAG), a framework that addresses these challenges through: (i) prompt-based entity extraction to identify reasoning-relevant elements, (ii) dynamic retrieval triggering based on token-level entropy and attention signals, and (iii) memory-aware filtering, which stores high-confidence facts across reasoning steps to enable consistent multi-hop generation.
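To make item (ii) concrete, the following is a minimal sketch of an entropy-based retrieval trigger. It is not the authors' implementation: it uses token-level entropy only (the attention signal is omitted), and the function names and the threshold value of 1.0 are hypothetical choices made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_retrieve(step_distributions, threshold=1.0):
    """Return True if any generated token's entropy exceeds the threshold.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    return any(token_entropy(p) > threshold for p in step_distributions)


# A peaked distribution (model is confident) vs. a flat one (model is unsure).
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]

print(should_retrieve(confident))  # entropy is well below 1.0
print(should_retrieve(uncertain))  # entropy is ln(4), above 1.0
```

A fixed threshold is used here only for brevity; the summary above reports that dynamic thresholding performs better than fixed thresholds.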

Paper Structure

This paper contains 23 sections, 4 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Overview of MIND. Given a multi-hop query (e.g., "Who is Charles Bretagne Marie De La Trémoille’s paternal grandfather?"), Step 1 (prompt-based extraction) uses an LLM prompt to extract candidate entities/facts. Step 2 (RIND trigger) monitors partial generation with RIND and triggers retrieval when uncertainty rises. Step 3 (memory filtering) stores high-confidence items in a memory module while discarding low-confidence ones (using either No Filter, CoT, Conf, or CoT+Conf). Step 4 (iterative refinement) repeats sub-query refinement (e.g., "Who is Jean Bretagne Charles’s father?") until no further retrieval is needed, yielding the final answer.
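
The four filtering options named in Step 3 can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's code: the `Candidate` fields, the `min_conf` cutoff of 0.5, and the choice to implement the hybrid filter as a logical AND of the CoT and confidence checks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float    # hypothetical confidence score in [0, 1]
    cot_supported: bool  # hypothetical flag: a CoT check judged the item relevant


def filter_candidates(candidates, strategy="cot+conf", min_conf=0.5):
    """Select which retrieved items are kept in memory.

    Strategy names mirror the caption: "none", "cot", "conf", "cot+conf".
    Combining the two checks with AND is an assumption made for this sketch.
    """
    if strategy == "none":
        return list(candidates)
    if strategy == "cot":
        return [c for c in candidates if c.cot_supported]
    if strategy == "conf":
        return [c for c in candidates if c.confidence >= min_conf]
    if strategy == "cot+conf":
        return [c for c in candidates
                if c.cot_supported and c.confidence >= min_conf]
    raise ValueError(f"unknown strategy: {strategy}")


# Placeholder items; the texts and scores are made up for illustration.
items = [
    Candidate("item A", confidence=0.9, cot_supported=True),
    Candidate("item B", confidence=0.8, cot_supported=False),
    Candidate("item C", confidence=0.2, cot_supported=True),
]

memory = filter_candidates(items, strategy="cot+conf")
print([c.text for c in memory])
```

Under these assumptions the hybrid filter is the strictest of the four, since an item must pass both checks to be stored.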