
PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

Madhav S Baidya

Abstract

Large Language Models (LLMs) have achieved strong performance in question answering and retrieval-augmented generation (RAG), yet they implicitly assume that user queries are fully specified and answerable. In real-world settings, queries are often incomplete, ambiguous, or missing critical variables, leading models to produce overconfident or hallucinated responses. In this work, we study decision-aware query resolution under incomplete information, where a model must determine whether to Answer, Ask for clarification, or Abstain. We show that standard and enhanced RAG systems do not reliably exhibit such epistemic awareness, defaulting to answer generation even when information is insufficient. To address this, we propose PassiveQA, a three-action framework that aligns model behaviour with information sufficiency through supervised finetuning. Our approach integrates structured information-state representations, knowledge graph-grounded context, and a finetuned planner that explicitly models missing variables and decision reasoning. Experiments across multiple QA datasets show that the finetuned planner achieves significant improvements in macro F1 and abstention recall while reducing hallucination rates, under a compute-constrained training regime. These results provide strong empirical evidence that epistemic decision-making must be learned during training rather than imposed at inference time.


Paper Structure

This paper comprises 98 sections, 23 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Full PassiveQA pipeline. Left to right: the four source datasets are merged into a unified 61K-sample schema with explicit variable-state fields (§\ref{sec:data}); a knowledge base of 105,420 chunks is constructed and indexed (§\ref{sec:rag}); three progressive RAG architectures are evaluated on the KB alone (§\ref{sec:rag}); the KB is simultaneously processed through a three-phase KG construction pipeline producing the decision-weighted graph $G_2$ (§\ref{sec:kg}); $G_2$ and the unified dataset jointly generate the 34K KG-grounded finetuning dataset (§\ref{sec:ft_data}), which trains the LoRA planner (§\ref{sec:finetune}); at inference the planner receives the query and KG context and routes to one of three specialised agents (§\ref{sec:agents}). The dashed feedback arrow models multi-turn state update (Eq.~\ref{eq:state_update}): a resolved variable from the Ask agent transitions from $V_{\mathrm{missing}}$ to $V_{\mathrm{known}}$ before the next planner call.
  • Figure 2: Overview of the PassiveQA pipeline. After hybrid retrieval and evidence scoring, either the hard gate (Architecture 3) or the finetuned planner routes the query to one of three specialised agents. Dashed arrows indicate the planner path used in the full three-agent architecture.
  • Figure 3: Three-phase construction of the PassiveQA knowledge graph. Phase 1 ($G_0$): dependency parsing extracts SVO triples, including low-confidence edges (e.g., pronoun-linked relations). Phase 2 ($G_1$): semantic filtering using SBERT cosine similarity removes edges below a threshold ($\tau = 0.50$), pruning noisy connections. Phase 3 ($G_2$): graph refinement via action-based reinforcement, where Answer signals strengthen edges and Abstain signals penalize them; additionally, variable nodes are introduced via requires relations to capture missing information.
  • Figure 4: Three-agent architecture. The finetuned planner parses the <decision> tag and delegates execution to one of three specialised agents. The Ask agent reuses the <clarification_question> tag directly.
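Phase 2 of the KG construction (Figure 3) prunes edges whose SBERT cosine similarity falls below $\tau = 0.50$. The caption does not specify exactly which pair of texts is compared, so the sketch below assumes the subject and object phrases of each SVO triple are embedded and compared; `embed`, `filter_edges`, and the threshold constant are illustrative names, not the paper's implementation.

```python
import math

TAU = 0.50  # Phase-2 similarity threshold from Figure 3

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def filter_edges(edges, embed, tau=TAU):
    """Keep SVO edges from G_0 whose endpoint embeddings agree semantically.

    `edges` is a list of (subject, verb, object) triples extracted in
    Phase 1; `embed` maps a phrase to a dense vector (SBERT in the
    paper; any sentence encoder works for this sketch). Edges below
    `tau` are dropped, yielding the filtered graph G_1.
    """
    return [
        (s, v, o) for (s, v, o) in edges
        if cosine(embed(s), embed(o)) >= tau
    ]
```

With toy two-dimensional embeddings, a semantically coherent edge such as ("cat", "is", "feline") survives while an unrelated pairing is pruned.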
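The routing step in Figure 4 can be sketched as a small dispatcher: parse the planner's `<decision>` tag, and for Ask decisions reuse the `<clarification_question>` tag directly. The agent functions below are stubs standing in for the paper's specialised agents, and the fallback to Abstain on a missing or malformed tag is our conservative assumption, not something the caption specifies.

```python
import re

# Stubs for the three specialised agents of Figure 4; in the real
# system each is its own prompted LLM agent.
def answer_agent(query, ctx):
    return f"ANSWER({query})"

def ask_agent(query, ctx, clarification):
    return f"ASK({clarification})"

def abstain_agent(query, ctx):
    return "ABSTAIN"

DECISION_RE = re.compile(r"<decision>\s*(Answer|Ask|Abstain)\s*</decision>", re.I)
CLARIFY_RE = re.compile(r"<clarification_question>\s*(.*?)\s*</clarification_question>", re.S)

def route(planner_output, query, ctx=None):
    """Parse the planner's <decision> tag and delegate to an agent.

    The Ask branch reuses the planner's <clarification_question> tag,
    as in Figure 4. If no valid decision tag is found, we default to
    Abstain (an assumed safe fallback).
    """
    m = DECISION_RE.search(planner_output)
    decision = m.group(1).capitalize() if m else "Abstain"
    if decision == "Answer":
        return answer_agent(query, ctx)
    if decision == "Ask":
        cm = CLARIFY_RE.search(planner_output)
        clar = cm.group(1) if cm else "Could you clarify your question?"
        return ask_agent(query, ctx, clar)
    return abstain_agent(query, ctx)
```

Keeping the parse-and-dispatch logic separate from the agents means the planner can be swapped (hard gate vs. finetuned LoRA planner, as in Figure 2) without touching agent code.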