DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue
Yichun Feng, Jiawei Wang, Lu Zhou, Zhen Lei, Yixue Li
TL;DR
DoctorAgent-RL tackles the mismatch between real-world multi-turn clinical consultations and static, single-path LLM dialogue by modeling medical dialogues as a dynamic multi-agent reinforcement learning problem. The framework couples a doctor agent, a patient agent, and a multi-dimensional Consultation Evaluator, and is trained in two stages (SFT, then RL) with GRPO-based optimization on the MTMedDialog dataset. Key contributions include the first English multi-turn medical dialogue dataset with hidden patient profiles, state-of-the-art performance in multi-turn reasoning and diagnostic accuracy, and a real-world evaluation demonstrating robust clinical reasoning and efficient information gathering. The approach advances practical clinical decision support by enabling adaptive, instruction-following, and resource-conscious doctor-patient interactions, with the potential to reduce misdiagnosis risk and alleviate clinician workload in time-pressured settings.
Abstract
Large language models (LLMs) have demonstrated excellent capabilities in biomedical question answering, but their application in real-world clinical consultations still faces core challenges. Single-round consultation systems require patients to describe all symptoms upfront, leading to vague diagnoses when complaints are unclear. Traditional multi-turn dialogue models, constrained by static supervised learning, lack flexibility and fail to intelligently extract key clinical information. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we constructed MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments demonstrate that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance. This approach shows immense practical value by reducing misdiagnosis risks in time-pressured settings, freeing clinicians for complex cases, and pioneering a strategy to optimize medical resource allocation and alleviate workforce shortages. Code and data are available at https://github.com/JarvisUSTC/DoctorAgent-RL
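To make the interaction loop described above concrete, the following is a minimal toy sketch, not the released implementation. It assumes a hidden patient profile that reveals a fact only when the doctor asks about it, a hypothetical scalar reward equal to the fraction of the profile uncovered within a turn budget, and the group-relative advantage normalization commonly associated with GRPO (each rollout's reward standardized against the other rollouts sampled for the same case). All function names, the reward definition, and the example data are illustrative assumptions; the actual Consultation Evaluator is multi-dimensional and is not reproduced here.

```python
from statistics import mean, pstdev


def run_consultation(ask_order, hidden_profile, max_turns):
    """Toy rollout: the doctor asks about one topic per turn and the
    patient reveals only the facts that were explicitly asked about."""
    revealed = {}
    for topic in ask_order[:max_turns]:
        if topic in hidden_profile:
            revealed[topic] = hidden_profile[topic]
    return revealed


def toy_reward(revealed, hidden_profile):
    """Hypothetical evaluator: fraction of the hidden profile uncovered."""
    return len(revealed) / len(hidden_profile)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative data only: one simulated patient, four sampled questioning paths.
hidden_profile = {"fever": "yes", "cough": "dry", "duration": "3 days"}
sampled_paths = [
    ["fever", "cough", "duration"],
    ["diet", "fever", "cough"],
    ["diet", "sleep", "fever"],
    ["cough", "duration", "fever"],
]

rewards = [
    toy_reward(run_consultation(path, hidden_profile, max_turns=2), hidden_profile)
    for path in sampled_paths
]
advantages = group_relative_advantages(rewards)

for path, reward, adv in zip(sampled_paths, rewards, advantages):
    print(path, round(reward, 3), round(adv, 3))
```

In this sketch, questioning paths that uncover more of the hidden profile within the turn budget receive positive advantages and less informative paths receive negative ones, which is the kind of signal that would push a policy toward more efficient information gathering.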
