Conversational Disease Diagnosis via External Planner-Controlled Large Language Models
Zhoujian Sun, Cheng Luo, Ziyi Liu, Zhengxing Huang
TL;DR
This paper addresses proactive data collection for AI-based medical diagnosis by introducing two external planners that guide an LLM: a disease screening planner trained with reinforcement learning to elicit pertinent symptoms, and a differential diagnosis planner that converts medical guidelines into structured decision procedures. Evaluated on EMR-derived simulated dialogues from the MIMIC-IV dataset, the approach demonstrates that external planners improve screening accuracy and that guideline-driven procedures, when combined with external knowledge and human refinement, yield strong differential-diagnosis performance. A key innovation is the auto-generated, interpretable decision procedures for differential diagnosis, enabling reliability and auditability without mandatory expert-built rules. The framework leverages open-source LLMs and EMR data to enable cost-effective, scalable conversational diagnostics suitable for clinical integration, while highlighting current limitations and avenues for clinical validation.
Abstract
The development of large language models (LLMs) has opened unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the prospects for applying LLMs in real diagnostic scenarios remain unclear because they are not adept at collecting patient data proactively. This study presents an LLM-based diagnostic system that enhances planning capabilities by emulating doctors. Our system uses two external planners to handle planning tasks. The first planner employs a reinforcement learning approach to formulate disease screening questions and conduct initial diagnoses. The second planner uses LLMs to parse medical guidelines and conduct differential diagnoses. Using real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors and evaluated the diagnostic abilities of our system. We demonstrated that our system achieved impressive performance in both disease screening and differential diagnosis tasks. This research represents a step towards more seamlessly integrating AI into clinical settings, potentially enhancing the accuracy and accessibility of medical diagnostics.
