Closing Reasoning Gaps in Clinical Agents with Differential Reasoning Learning
Jinsong Liu, Yuhang Jiang, Ramayya Krishnan, Rema Padman, Yiye Zhang, Jiang Bian
TL;DR
Clinical decision support systems must produce not only correct answers but also clinically valid reasoning. This work introduces Differential Reasoning Learning (DRL), which represents reasoning as directed acyclic graphs (DAGs) and learns from discrepancies between agent and reference graphs, measured by a clinically weighted graph edit distance (GED), with an LLM-as-a-judge aligning nodes and diagnosing the differences. Discrepancies are distilled into reusable natural-language instructions, stored in a Differential Reasoning Knowledge Base (DR-KB), and retrieved at inference time via retrieval-augmented generation to patch likely reasoning gaps without parameter updates. On open medical QA benchmarks (MedQA, MedMCQA) and a Return Visit Admissions question-answering (RVA-QA) task, DRL achieves gains over baselines, with especially large improvements under domain shift. Ablations confirm the value of physician rationales and of the top-$k$ retrieval strategy, and clinicians corroborate the improved reasoning fidelity. The framework offers a practical, auditable path toward trustworthy, reasoning-aligned AI in medicine: it supports deployment under limited token budgets and avoids costly retraining while still allowing knowledge to accumulate continuously.
Abstract
Clinical decision support requires not only correct answers but also clinically valid reasoning. We propose Differential Reasoning Learning (DRL), a framework that improves clinical agents by learning from reasoning discrepancies. From reference reasoning rationales (e.g., physician-authored clinical rationales, clinical guidelines, or outputs from more capable models) and the agent's free-form chain-of-thought (CoT), DRL extracts reasoning graphs as directed acyclic graphs (DAGs) and performs a clinically weighted graph edit distance (GED)-based discrepancy analysis. An LLM-as-a-judge aligns semantically equivalent nodes and diagnoses discrepancies between the graphs. These graph-level diagnostics are converted into natural-language instructions and stored in a Differential Reasoning Knowledge Base (DR-KB). At inference, we retrieve the top-$k$ instructions via Retrieval-Augmented Generation (RAG) to augment the agent prompt and patch likely logic gaps. Evaluation on open medical question answering (QA) benchmarks and a Return Visit Admissions (RVA) prediction task built from internal clinical data demonstrates gains over baselines in both final-answer accuracy and reasoning fidelity. Ablation studies confirm the gains from infusing reference reasoning rationales and from the top-$k$ retrieval strategy. Clinician review of the outputs provides further assurance of the approach. Together, these results suggest that DRL supports more reliable clinical decision-making in complex reasoning scenarios and offers a practical mechanism for deployment under limited token budgets.
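
To make the pipeline above concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes node alignment has already been done (so aligned nodes share a label), which reduces the weighted GED to a weighted count of missing and extra nodes and edges. The names `ReasoningGraph`, `weighted_discrepancy`, and `retrieve_top_k`, the weight values, and the token-overlap retrieval score are all hypothetical stand-ins; a real system would likely use an LLM judge for alignment and embedding-based retrieval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningGraph:
    """Hypothetical container: node labels (post-alignment) and directed edges."""
    nodes: frozenset
    edges: frozenset  # set of (source_label, target_label) pairs


def weighted_discrepancy(agent, reference, node_weight,
                         default_weight=1.0, edge_cost=0.5):
    """Simplified weighted edit cost under a fixed node alignment.

    Assumption: each missing or extra node costs its clinical weight,
    and each missing or extra edge costs a flat `edge_cost`.
    """
    missing_nodes = reference.nodes - agent.nodes
    extra_nodes = agent.nodes - reference.nodes
    missing_edges = reference.edges - agent.edges
    extra_edges = agent.edges - reference.edges

    cost = sum(node_weight.get(n, default_weight)
               for n in missing_nodes | extra_nodes)
    cost += edge_cost * (len(missing_edges) + len(extra_edges))
    diagnostics = {
        "missing_nodes": missing_nodes,
        "extra_nodes": extra_nodes,
        "missing_edges": missing_edges,
        "extra_edges": extra_edges,
    }
    return cost, diagnostics


def retrieve_top_k(query, instructions, k=3):
    """Rank stored instructions by Jaccard token overlap with the query.

    Assumption: token overlap stands in for whatever similarity the
    retrieval step actually uses.
    """
    query_tokens = set(query.lower().split())

    def score(text):
        tokens = set(text.lower().split())
        union = query_tokens | tokens
        return len(query_tokens & tokens) / len(union) if union else 0.0

    return sorted(instructions, key=score, reverse=True)[:k]


# Toy example: the agent omits a clinically important premise.
reference = ReasoningGraph(
    nodes=frozenset({"fever", "neutropenia", "start antibiotics"}),
    edges=frozenset({("fever", "start antibiotics"),
                     ("neutropenia", "start antibiotics")}),
)
agent = ReasoningGraph(
    nodes=frozenset({"fever", "start antibiotics"}),
    edges=frozenset({("fever", "start antibiotics")}),
)
cost, diagnostics = weighted_discrepancy(
    agent, reference, node_weight={"neutropenia": 3.0})

knowledge_base = [
    "Check neutropenia before ruling out antibiotics in fever",
    "Confirm renal function before dosing",
]
patches = retrieve_top_k("patient with fever and neutropenia",
                         knowledge_base, k=1)
```

In this toy case the omitted `neutropenia` node dominates the cost because of its higher weight, which is the intended effect of clinical weighting: errors on important findings count for more than errors on peripheral ones. The retrieved instruction would then be prepended to the agent prompt at inference time.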
