Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis

Yuhang Huang, Zekai Lin, Fan Zhong, Lei Liu

TL;DR

This work tackles the lack of verifiable explanations in AI for high-stakes domains by proposing an interactive reasoning agent that externalizes its diagnostic process as auditable actions. The agent maintains its beliefs in a Hypothesis Box (H-Box) and grounds each hypothesis in external, calibrated visual evidence via a Probe & Ground (P&G) action that invokes the KBCS tool, producing a transparent reasoning trace. The policy is aligned with a lightweight, CISPO-style reinforcement learning objective that rewards evidence-grounding steps, yielding lower Brier scores (better calibrated accuracy) and higher evidence adoption without sacrificing efficiency. A faithfulness protocol combines occlusion-based tests with agent-level causal interventions and shows that masking the region of interest (ROI) adopted by the agent degrades performance, which supports the causal role of the grounded evidence. The framework generalizes to new domains with minimal test-time calibration and can be trained on commodity hardware.

Abstract

Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations through an auditable sequence of actions. The agent learns a policy to strategically seek external visual evidence to support its diagnostic reasoning. This policy is optimized using reinforcement learning, resulting in a model that is both efficient and generalizable. Our experiments show that this action-based reasoning process significantly improves calibrated accuracy, reducing the Brier score by 18% compared to a non-interactive baseline. To validate the faithfulness of the agent's explanations, we introduce a causal intervention method. By masking the visual evidence the agent chooses to use, we observe a measurable degradation in its performance ($\Delta$Brier = +0.029), confirming that the evidence is integral to its decision-making process. Our work provides a practical framework for building AI systems with verifiable and faithful reasoning capabilities.
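
The two quantities reported in the abstract, the Brier score and its change under evidence masking, can be illustrated with a minimal sketch. It assumes the common multi-class definition (mean squared distance between the predicted probability vector and the one-hot label, with no division by the number of classes). The paper's exact normalization is not stated in this section. The function names and the toy predictions are hypothetical and are not the authors' code or data.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean multi-class Brier score (lower is better).

    Assumed definition: for each sample, sum over classes of
    (predicted probability - one-hot target)^2, averaged over samples.
    """
    if len(probs) == 0 or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def delta_brier(probs_intact, probs_masked, labels) -> float:
    """Brier(masked) - Brier(intact); a positive value means masking hurt."""
    return brier_score(probs_masked, labels) - brier_score(probs_intact, labels)


# Toy, made-up predictions for two samples and two classes.
labels = [0, 1]
intact = [[0.9, 0.1], [0.2, 0.8]]   # predictions with the adopted evidence visible
masked = [[0.6, 0.4], [0.5, 0.5]]   # predictions after masking that evidence

print(f"Brier (intact): {brier_score(intact, labels):.3f}")
print(f"Delta Brier:    {delta_brier(intact, masked, labels):+.3f}")
```

A positive difference, as in the reported $\Delta$Brier = +0.029, indicates that predictions became less accurate or less calibrated once the adopted evidence was hidden.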

Paper Structure

This paper contains 35 sections, 2 equations, 3 figures, 6 tables, 2 algorithms.

Figures (3)

  • Figure 1: Overview of our verifiable reasoning framework. The agent iteratively refines its belief within a Hypothesis Box (H-Box) by executing a Probe & Ground (P&G) action. This action invokes an external tool (KBCS) to ground the hypothesis in visual evidence (an ROI and a score), creating an auditable, step-by-step diagnostic trace.
  • Figure 2: Bubble reliability diagrams for the Initial (left) and RL-aligned (right) policies, compared against the noP&G baseline (gray). Each bubble's position plots empirical Accuracy against model Confidence, while its size reflects the number of samples in its bin. The baseline's large, low-confidence bubbles show poor calibration. Our agent (blue and green) progressively aligns its bubbles with the ideal diagonal (dashed line) and distributes them more broadly, with the RL-aligned policy (green) achieving the best calibration.
  • Figure 3: Effect of gating threshold $\tau$ on adoption rate and performance for the uncalibrated KBCS-Gate variant. While the adoption rate is controllable, performance remains stagnant, highlighting the importance of evidence quality over quantity.
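
The bubble reliability diagram described in the Figure 2 caption can be sketched as a simple binning step: each bubble is one confidence bin, placed at (mean confidence, empirical accuracy) and sized by its sample count. The equal-width binning, the bin count, and the function name below are assumptions made for illustration. The section does not state the binning scheme the authors used.

```python
from typing import Dict, List, Sequence


def reliability_bins(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> List[Dict[str, float]]:
    """Group predictions into equal-width confidence bins.

    Returns one record per non-empty bin with its mean confidence (x),
    empirical accuracy (y), and sample count (bubble size).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must be equally long")
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = max(0, min(int(c * n_bins), n_bins - 1))
        bins[idx].append((c, ok))
    bubbles = []
    for members in bins:
        if not members:
            continue
        n = len(members)
        bubbles.append({
            "confidence": sum(c for c, _ in members) / n,
            "accuracy": sum(1 for _, ok in members if ok) / n,
            "count": n,
        })
    return bubbles


# Toy, made-up predictions.
bubbles = reliability_bins([0.95, 0.92, 0.55, 0.15], [True, True, False, False])
for b in bubbles:
    print(b)
```

A well-calibrated policy produces bubbles close to the diagonal, where accuracy equals confidence. Plotting the records with any scatter-plot tool, using `count` as the marker size, gives the bubble view.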