CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang
TL;DR
The paper tackles interpretability in LLM-based medical diagnosis by introducing Chain of Diagnosis (CoD), which outputs a transparent diagnostic chain and a disease confidence distribution, enabling entropy-based symptom inquiry and controllable decisions. It builds DiagnosisGPT by fine-tuning on 48,020 synthetic CoD cases generated from a knowledge base of 9,604 diseases, and reports strong performance and interpretability on multiple benchmarks, including the new real-world DxBench dataset. The approach combines a disease retriever, confidence-driven decision making, and entropy-guided inquiry to improve both transparency and diagnostic rigor. The work provides a scalable framework for evaluating medical LLMs with open-ended consultations and offers a practical benchmark (DxBench) to simulate real-world clinical diagnostics.
Abstract
The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within these models remain largely unaddressed. This study introduces Chain-of-Diagnosis (CoD) to enhance the interpretability of LLM-based medical diagnostics. CoD transforms the diagnostic process into a diagnostic chain that mirrors a physician's thought process, providing a transparent reasoning pathway. Additionally, CoD outputs the disease confidence distribution to ensure transparency in decision-making. This interpretability makes model diagnostics controllable and aids in identifying critical symptoms for inquiry through the entropy reduction of confidences. With CoD, we developed DiagnosisGPT, capable of diagnosing 9604 diseases. Experimental results demonstrate that DiagnosisGPT outperforms other LLMs on diagnostic benchmarks. Moreover, DiagnosisGPT provides interpretability while ensuring controllability in diagnostic rigor.
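The two mechanisms named in the abstract, confidence-controlled diagnosis and symptom inquiry by entropy reduction, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 0.5 threshold, the toy diseases, and the hand-written "predicted" distributions are all assumptions made for illustration; in CoD the confidence distribution would come from the model itself.

```python
import math

def entropy(confidences):
    """Shannon entropy (in nats) of a disease -> confidence mapping."""
    return -sum(p * math.log(p) for p in confidences.values() if p > 0)

def decide(confidences, threshold=0.5):
    """Return the top disease if its confidence reaches the threshold
    (hypothetical value), otherwise None to signal that more inquiry is needed."""
    disease, conf = max(confidences.items(), key=lambda kv: kv[1])
    return disease if conf >= threshold else None

def pick_symptom(current, candidates):
    """Pick the symptom whose answer is expected to reduce entropy the most.

    candidates maps symptom -> (p_present, dist_if_present, dist_if_absent),
    where both distributions are assumed predictions, not model outputs.
    """
    h0 = entropy(current)

    def expected_reduction(symptom):
        p, present, absent = candidates[symptom]
        return h0 - (p * entropy(present) + (1 - p) * entropy(absent))

    return max(candidates, key=expected_reduction)

# Toy example with made-up numbers.
current = {"flu": 0.4, "cold": 0.4, "covid": 0.2}
candidates = {
    # Asking about fever is assumed to separate flu from cold.
    "fever": (0.5,
              {"flu": 0.7, "cold": 0.1, "covid": 0.2},
              {"flu": 0.1, "cold": 0.8, "covid": 0.1}),
    # Asking about cough is assumed to leave the distribution unchanged.
    "cough": (0.5, dict(current), dict(current)),
}

first_decision = decide(current)              # no disease reaches 0.5 yet
next_symptom = pick_symptom(current, candidates)
second_decision = decide(candidates["fever"][1])  # after a "yes" to fever
```

Raising the threshold makes the sketch ask more questions before committing to a diagnosis, which is one way to read the abstract's "controllability in diagnostic rigor".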
