InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning

Maksym Taranukhin, Shuyue Stella Li, Evangelos Milios, Geoff Pleiss, Yulia Tsvetkov, Vered Shwartz

TL;DR

InfoGatherer is a framework that gathers missing information from two complementary sources: retrieved domain documents and targeted follow-up questions to the user. It fuses incomplete and potentially contradictory evidence from both sources in a principled way, without prematurely collapsing to a definitive answer.

Abstract

LLMs are increasingly deployed in high-stakes domains such as medical triage and legal assistance, often as document-grounded QA systems in which a user provides a description, relevant sources are retrieved, and an LLM generates a prediction. In practice, initial user queries are often underspecified, and a single retrieval pass is insufficient for reliable decision-making, leading to incorrect and overly confident answers. While follow-up questioning can elicit missing information, existing methods typically depend on implicit, unstructured confidence signals from the LLM, making it difficult to determine what remains unknown, what information matters most, and when to stop asking questions. We propose InfoGatherer, a framework that gathers missing information from two complementary sources: retrieved domain documents and targeted follow-up questions to the user. InfoGatherer models uncertainty using Dempster-Shafer belief assignments over a structured evidential network, enabling principled fusion of incomplete and potentially contradictory evidence from both sources without prematurely collapsing to a definitive answer. Across legal and medical tasks, InfoGatherer outperforms strong baselines while requiring fewer turns. By grounding uncertainty in formal evidential theory rather than heuristic LLM signals, InfoGatherer moves towards trustworthy, interpretable decision support in domains where reliability is critical.
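
The evidence fusion described above can be illustrated with Dempster's rule of combination, the standard Dempster-Shafer operator for merging two belief assignments. The sketch below is illustrative only and is not the paper's implementation: the function name `combine`, the three-hypothesis frame, and all mass values are assumptions chosen for the example. It shows how mass left on a set of hypotheses (rather than on a single one) lets the fused result stay uncommitted where the evidence is incomplete.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses (a focal set)
    to its mass; the masses of each function are assumed to sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            # Agreeing evidence: the product supports the intersection.
            fused[overlap] = fused.get(overlap, 0.0) + wa * wb
        else:
            # Contradictory evidence: the product counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in fused.items()}


# Hypothetical frame of discernment and evidence (made-up numbers).
FRAME = frozenset({"flu", "cold", "allergy"})
m_docs = {frozenset({"flu", "cold"}): 0.6, FRAME: 0.4}  # from retrieved documents
m_user = {frozenset({"flu"}): 0.5, frozenset({"allergy"}): 0.2, FRAME: 0.3}  # from a user answer

fused = combine(m_docs, m_user)
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 3))
```

In this toy case the two sources partly contradict each other (the documents point to flu or cold, while the user's answer puts some mass on allergy). The conflicting mass is discarded and the rest renormalised, and some mass remains on the whole frame, which represents what is still unknown.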

Paper Structure (40 sections, 9 equations, 5 figures, 2 tables, 1 algorithm)

This paper contains 40 sections, 9 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overview of the InfoGatherer method. The agent uses user input and relevant documents to construct an evidential network representing its current belief in the true hypothesis. By identifying areas of uncertainty within the network, the agent generates targeted follow-up questions to refine its belief systematically.
  • Figure 1: Constructing the evidential network.
  • Figure 2: An overview of the InfoGatherer pipeline using a medical diagnosis example. Left: The agent uses the initial query to retrieve documents and construct an evidential network, extracting Basic Belief Assignments (BBAs) from text. Right: The interaction loop where the agent selects an informative node (e.g., 'fever') to generate a targeted question, and updates its belief distribution based on the user's answer. This cycle repeats until the pignistic probability (BetP) of a leading hypothesis meets the stopping condition.
  • Figure 3: Impact of contextual documents on dialogue performance. With gpt-5-nano, InfoGatherer's (IG) retrieval-grounded evidence improves dialogue efficiency and is especially beneficial in the legal domain, while model-generated references can improve success in the medical domain.
  • Figure 4: Average confidence in the correct hypothesis across dialogue turns for five methods, using each method’s native confidence metric. InfoGatherer's objective, which explicitly targets uncertainty reduction, yields smoother, more information-aligned increases.
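
The stopping condition mentioned in the Figure 2 caption relies on the pignistic probability (BetP), which turns a belief assignment into an ordinary probability distribution by splitting the mass of each focal set equally among its members. The following is a minimal sketch of that transformation and of a threshold-based stopping check. The function names, the 0.8 threshold, and the mass values are assumptions for illustration; the source does not state the exact stopping rule here.

```python
def pignistic(mass):
    """Pignistic transformation: BetP(h) = sum over focal sets A containing h of m(A) / |A|.

    Assumes a normalised mass function with no mass on the empty set.
    """
    betp = {}
    for focal, w in mass.items():
        share = w / len(focal)  # split the mass equally among members
        for hypothesis in focal:
            betp[hypothesis] = betp.get(hypothesis, 0.0) + share
    return betp


def should_stop(mass, threshold=0.8):
    """Return (stop?, leading hypothesis, its BetP).

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    betp = pignistic(mass)
    leader = max(betp, key=betp.get)
    return betp[leader] >= threshold, leader, betp[leader]


# Hypothetical belief states before and after a user answer (made-up numbers).
FRAME = frozenset({"flu", "cold", "allergy"})
early = {frozenset({"flu"}): 0.5, frozenset({"flu", "cold"}): 0.3, FRAME: 0.2}
late = {frozenset({"flu"}): 0.8, FRAME: 0.2}

print(should_stop(early))  # leading hypothesis still below the threshold
print(should_stop(late))   # leading hypothesis now above the threshold
```

Under these assumptions, the agent would keep asking questions in the `early` state and stop in the `late` state, once enough mass has moved from multi-hypothesis sets onto a single hypothesis.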