Knowing What's Missing: Assessing Information Sufficiency in Question Answering

Akriti Jain, Aparna Garimella

TL;DR

The paper tackles the challenge of determining information sufficiency in QA by reframing sufficiency as missing-information identification. It introduces an Identify-then-Verify framework that generates multiple hypotheses about gaps, forms a semantic consensus, and verifies the consensus against the source text to make a final sufficiency decision. Across diverse multi-hop and answerability datasets, this approach improves accuracy on complex inferential questions and provides actionable, gap-focused outputs for retrieval. The method also demonstrates adaptability along a spectrum from pragmatic inference to literal completeness by tuning the verification stage, offering a more reliable and interpretable alternative to direct sufficiency prompting.

Abstract

Determining whether a provided context contains sufficient information to answer a question is a critical challenge for building reliable question-answering systems. While simple prompting strategies have shown success on factual questions, they frequently fail on inferential ones that require reasoning beyond direct text extraction. We hypothesize that asking a model to first reason about what specific information is missing provides a more reliable, implicit signal for assessing overall sufficiency. To this end, we propose a structured Identify-then-Verify framework for robust sufficiency modeling. Our method first generates multiple hypotheses about missing information and establishes a semantic consensus. It then performs a critical verification step, forcing the model to re-examine the source text to confirm whether this information is truly absent. We evaluate our method against established baselines across diverse multi-hop and factual QA datasets. The results demonstrate that by guiding the model to justify its claims about missing information, our framework produces more accurate sufficiency judgments while clearly articulating any information gaps.
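The flow described in the abstract (sample several missing-information hypotheses, reduce them to one consensus claim, then verify that claim against the context) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The callables `propose_gap` and `verify_absent` are hypothetical stand-ins for LLM calls, and the `NO_GAP` sentinel, the majority vote over normalized strings (a crude substitute for the paper's semantic consensus), and the shortcut that skips verification when no gap is claimed are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical sentinel a model might return when it believes nothing is missing.
NO_GAP = "none"


def identify(question: str, context: str,
             propose_gap: Callable[[str, str], str], n: int = 5) -> List[str]:
    """Step 1 (Identify): query the model n times for a missing-information hypothesis."""
    return [propose_gap(question, context) for _ in range(n)]


def consensus(hypotheses: List[str]) -> str:
    """Step 2a: pick one consensus claim.

    Assumption: exact-match majority vote over lowercased, stripped strings.
    The paper describes a semantic consensus, which this only approximates.
    """
    normalized = [h.strip().lower() for h in hypotheses]
    return Counter(normalized).most_common(1)[0][0]


def is_sufficient(question: str, context: str,
                  propose_gap: Callable[[str, str], str],
                  verify_absent: Callable[[str, str], bool],
                  n: int = 5) -> bool:
    """Step 2b: verify the consensus claim against the context and decide."""
    claim = consensus(identify(question, context, propose_gap, n))
    if claim == NO_GAP:
        # Assumption: no claimed gap is treated as sufficient without verification.
        return True
    # The context is insufficient only if the claimed gap is confirmed absent.
    return not verify_absent(claim, context)
```

In practice `propose_gap` would wrap a sampled LLM call and `verify_absent` a second prompt that re-reads the source text; how strictly that second prompt judges absence is the point at which the verification stage could be tuned.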

Paper Structure

This paper contains 18 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: LLMs Excel at Factual Verification but Falter on Inferential Reasoning. (Left) For a fact-based multi-hop question, the model correctly identifies a missing factual link (the location of TCS headquarters) and accurately classifies the context as Insufficient. (Right) Conversely, for a question requiring simple inference, the model fails. Asked for the purpose of a business grant, it incorrectly predicts Insufficient because it cannot deduce that the scheme's stated function (enabling local authorities to keep part of business rates) directly serves as its purpose. This highlights a key limitation: models struggle to assess sufficiency beyond literal fact-checking.
  • Figure 2: An Overview of Our Identify-then-Verify Framework. Given a question and context, the framework first queries an LLM multiple times with non-zero temperature to generate a diverse set of hypotheses about missing information (Step 1: Identify). This captures a rich distribution of the model's reasoning. Next, a single Consensus Gap Claim is established from these hypotheses (Step 2a). This claim is then checked against the original context in a final verification step (Step 2b). This verification acts as a critical self-correction mechanism, with the final sufficiency decision based on its outcome.
  • Figure 3: Rate of Hypothesis Disagreement Across Multiple Runs. The plot shows the percentage of samples with conflicting judgments for a given number of runs. The sharpest rise occurs when moving from one to two runs, highlighting the risk of a single pass. The curve flattens significantly after four runs, motivating our choice of $N=5$ as the point of stability.
  • Figure 4: Relative Error Reduction (RER) from the verification step across various datasets. The bars show the percentage of errors from the initial consensus that were corrected.
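The two quantities plotted in Figures 3 and 4 can be computed along the following lines. This is a sketch under stated assumptions rather than the paper's evaluation code: the function names and input layouts are hypothetical, disagreement is taken to mean that the first k judgments for a sample are not all identical, and the error reduction follows the Figure 4 caption literally (the share of initial consensus errors that verification corrected), which may differ from a net definition that also counts newly introduced errors.

```python
from typing import List, Sequence


def disagreement_rate(judgments_per_sample: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of samples whose first k sufficiency judgments conflict.

    Assumption: each inner sequence holds one sample's judgments across runs.
    """
    conflicted = sum(1 for runs in judgments_per_sample if len(set(runs[:k])) > 1)
    return conflicted / len(judgments_per_sample)


def relative_error_reduction(gold: List[bool],
                             consensus_pred: List[bool],
                             verified_pred: List[bool]) -> float:
    """Share of initial consensus errors that the verification step corrected."""
    # Indices where the consensus decision disagreed with the gold label.
    initial_errors = [i for i, (g, c) in enumerate(zip(gold, consensus_pred)) if g != c]
    if not initial_errors:
        return 0.0  # nothing to correct
    corrected = sum(1 for i in initial_errors if verified_pred[i] == gold[i])
    return corrected / len(initial_errors)
```

Evaluating `disagreement_rate` for increasing k on the same set of runs yields the kind of curve Figure 3 describes, where a single pass (k = 1) can never show disagreement by construction.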