Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning

Jinsong Liu; Yuhang Jiang; Ramayya Krishnan; Rema Padman; Yiye Zhang; Jiang Bian

TL;DR

This work addresses the requirement that clinical decision support systems produce not only correct answers but also clinically valid reasoning. It introduces Differential Reasoning Learning (DRL), which represents reasoning as directed acyclic graphs and learns from reasoning discrepancies via a clinically weighted graph edit distance (GED) between agent and reference graphs, guided by an LLM-as-a-judge. Discrepancies are distilled into reusable natural-language instructions stored in a Differential Reasoning Knowledge Base (DR-KB) and retrieved at inference time to patch likely reasoning gaps through retrieval-augmented generation, without parameter updates. Empirical results on open medical QA benchmarks (MedQA, MedMCQA) and an RVA-QA task show that DRL achieves gains over baselines, with especially large improvements under domain shift. Ablations confirm the value of physician rationales and the top-$k$ retrieval strategy, and clinicians corroborate improved reasoning fidelity. The framework offers a practical, auditable path toward trustworthy, reasoning-aligned AI in medicine and supports deployment under limited token budgets by avoiding costly retraining while enabling continuous knowledge accumulation.

Abstract

Clinical decision support requires not only correct answers but also clinically valid reasoning. We propose Differential Reasoning Learning (DRL), a framework that improves clinical agents by learning from reasoning discrepancies. From reference reasoning rationales (e.g., physician-authored clinical rationale, clinical guidelines, or outputs from more capable models) and the agent's free-form chain-of-thought (CoT), DRL extracts reasoning graphs as directed acyclic graphs (DAGs) and performs a clinically weighted graph edit distance (GED)-based discrepancy analysis. An LLM-as-a-judge aligns semantically equivalent nodes and diagnoses discrepancies between graphs. These graph-level discrepancy diagnostics are converted into natural-language instructions and stored in a Differential Reasoning Knowledge Base (DR-KB). At inference, we retrieve top-$k$ instructions via Retrieval-Augmented Generation (RAG) to augment the agent prompt and patch likely logic gaps. Evaluation on open medical question answering (QA) benchmarks and a Return Visit Admissions (RVA) prediction task from internal clinical data demonstrates gains over baselines, improving both final-answer accuracy and reasoning fidelity. Ablation studies confirm gains from infusing reference reasoning rationales and the top-$k$ retrieval strategy. Clinicians' review of the output provides further assurance of the approach. Together, results suggest that DRL supports more reliable clinical decision-making in complex reasoning scenarios and offers a practical mechanism for deployment under limited token budgets.
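The discrepancy analysis and retrieval steps described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the node types and their weights, the set-difference approximation of graph edit distance (nodes are treated as already aligned by exact label, a step the abstract assigns to an LLM-as-a-judge), and the token-overlap retriever. The abstract does not specify the actual cost function or retriever, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical clinical weights per node type (an assumption, not from the paper).
NODE_WEIGHTS = {"finding": 1.0, "hypothesis": 2.0, "decision": 3.0}
EDGE_WEIGHT = 1.0  # hypothetical uniform cost for a missing or extra edge


@dataclass(frozen=True)
class Node:
    label: str
    kind: str  # must be a key of NODE_WEIGHTS


@dataclass
class ReasoningGraph:
    nodes: set  # set of Node
    edges: set  # set of (source_label, target_label) tuples


def weighted_discrepancy(agent, reference):
    """Approximate a weighted edit cost between two reasoning graphs.

    Assumes nodes with identical labels and kinds are already aligned, so the
    cost reduces to weighted insertions and deletions of nodes and edges.
    Returns (cost, missing_nodes, extra_nodes).
    """
    missing = reference.nodes - agent.nodes  # steps the agent skipped
    extra = agent.nodes - reference.nodes    # steps the reference lacks
    node_cost = sum(NODE_WEIGHTS[n.kind] for n in missing | extra)
    edge_cost = EDGE_WEIGHT * len(agent.edges ^ reference.edges)
    return node_cost + edge_cost, missing, extra


def retrieve_top_k(query, instructions, k=2):
    """Rank stored instructions by Jaccard token overlap with the query.

    A stand-in for whatever retriever backs the knowledge base.
    """
    q = set(query.lower().split())

    def score(text):
        t = set(text.lower().split())
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(instructions, key=score, reverse=True)[:k]
```

In a fuller pipeline, the `missing` and `extra` node sets would be the raw material from which natural-language instructions are written and stored, and the output of `retrieve_top_k` would be prepended to the agent prompt at inference.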

Paper Structure (30 sections, 2 equations, 2 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Structured Representations of Clinical Reasoning
Process-Level Reasoning Alignment and Supervision
Agentic Memory and Error-Driven Self-Improvement
Methodology
Phase 1: Reasoning Graph Extraction
Phase 2: Measuring the Reasoning Gap
Phase 3: Differential Reasoning Knowledge Base
Phase 4: Differential Knowledge-Augmented Inference
Experiments
Experimental Setup
Datasets
Open Medical QA Datasets
Return Visit Admission Prediction
...and 15 more sections

Figures (2)

Figure 1: Pipeline for Differential Reasoning Learning (DRL), including Differential Knowledge Mining and Differential Knowledge-Augmented Inference.
Figure 2: Performance gain of DRL over baselines across datasets.
