DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue

Yichun Feng, Jiawei Wang, Lu Zhou, Zhen Lei, Yixue Li

TL;DR

DoctorAgent-RL tackles the mismatch between real-world multi-turn clinical consultations and static, single-path LLM dialogue by modeling medical dialogues as a dynamic multi-agent reinforcement learning problem. The framework couples a doctor agent, a patient agent, and a multi-dimensional Consultation Evaluator, and is trained in two stages (SFT, then RL with GRPO-based optimization) on the MTMedDialog dataset. Key contributions include the first English multi-turn medical dialogue dataset with hidden patient profiles, state-of-the-art performance in multi-turn reasoning and diagnostic accuracy, and a real-world evaluation showing robust clinical reasoning and efficient information gathering. The approach advances practical clinical decision support by enabling adaptive, instruction-following, and resource-conscious doctor-patient interactions, with the potential to reduce misdiagnosis risk and alleviate clinician workload in time-pressured settings.

Abstract

Large language models (LLMs) have demonstrated excellent capabilities in biomedical question answering, but their application in real-world clinical consultations still faces core challenges. Single-round consultation systems require patients to describe all symptoms upfront, which leads to vague diagnoses when complaints are unclear. Traditional multi-turn dialogue models, constrained by static supervised learning, lack flexibility and fail to intelligently extract key clinical information. To address these limitations, we propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. The doctor agent continuously optimizes its questioning strategy within the RL framework through multi-turn interactions with the patient agent, dynamically adjusting its information-gathering path based on comprehensive rewards from the Consultation Evaluator. This RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, rather than superficially imitating patterns in existing dialogue data. Notably, we constructed MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions. Experiments demonstrate that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance. This approach shows immense practical value by reducing misdiagnosis risks in time-pressured settings, freeing clinicians for complex cases, and pioneering a strategy to optimize medical resource allocation and alleviate workforce shortages. Code and data are available at https://github.com/JarvisUSTC/DoctorAgent-RL
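
The interaction loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the patient, doctor policy, and evaluator below are hypothetical rule-based stand-ins for LLM agents, the reward (correctness minus a per-question cost) is an assumed simplification of the multi-dimensional Consultation Evaluator, and varying the question budget stands in for sampling several rollouts from one policy. The only GRPO-specific part is the last function, which normalizes each rollout's reward against the mean and standard deviation of its group.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PatientAgent:
    """Hypothetical patient: reveals one hidden symptom per question."""
    hidden_symptoms: list
    revealed: int = 0

    def answer(self, question: str) -> str:
        if self.revealed < len(self.hidden_symptoms):
            symptom = self.hidden_symptoms[self.revealed]
            self.revealed += 1
            return f"I have {symptom}."
        return "Nothing else to report."


def doctor_policy(history, max_questions):
    """Hypothetical doctor: ask until the budget is spent, then diagnose."""
    asked = sum(1 for role, _ in history if role == "doctor")
    if asked < max_questions:
        return "ask", "Can you describe another symptom?"
    said = " ".join(msg for role, msg in history if role == "patient")
    # Toy diagnostic rule, purely illustrative.
    guess = "flu" if "fever" in said and "cough" in said else "unknown"
    return "diagnose", guess


def rollout(patient, max_questions):
    """Run one multi-turn consultation; return the dialogue and diagnosis."""
    history = []
    while True:
        action, content = doctor_policy(history, max_questions)
        history.append(("doctor", content))
        if action == "diagnose":
            return history, content
        history.append(("patient", patient.answer(content)))


def evaluate(history, guess, truth, turn_penalty=0.05):
    """Assumed reward: 1 for a correct diagnosis, minus a cost per question."""
    doctor_turns = sum(1 for role, _ in history if role == "doctor")
    questions = doctor_turns - 1  # the final doctor turn is the diagnosis
    return (1.0 if guess == truth else 0.0) - turn_penalty * questions


def group_relative_advantages(rewards):
    """GRPO-style advantage: (reward - group mean) / group std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# One group of rollouts for the same hidden patient profile.
budgets = [0, 1, 2, 3]
rewards = []
for budget in budgets:
    patient = PatientAgent(hidden_symptoms=["fever", "cough"])
    history, guess = rollout(patient, budget)
    rewards.append(evaluate(history, guess, truth="flu"))

advantages = group_relative_advantages(rewards)
best_budget = budgets[advantages.index(max(advantages))]
```

In this toy setting the rollout that asks exactly two questions gets the largest advantage: asking fewer misses the diagnosis, and asking more pays the per-question cost without learning anything new. That mirrors, in miniature, the trade-off between diagnostic accuracy and efficient information gathering that the paper targets.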

Paper Structure

This paper contains 31 sections, 8 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Overview of this work. DoctorAgent-RL achieves state-of-the-art on the MTMedDialog task.
  • Figure 2: The multi-agent collaborative reinforcement learning framework for DoctorAgent-RL. During the rollout stage, multi-turn interactions are conducted between the doctor agent and the patient agent.
  • Figure 3: Performance comparison of different models in simulating patient agent on MTMedDialog.
  • Figure 4: (a) Main results by disease category on MTMedDialog dataset, showing the average of diagnostic accuracy and recommendation accuracy scores. Dark blue boxes represent Frontier Models, light blue boxes represent Open-Source Base Models, and dark gray boxes represent Domain-Specific Models. (b) Performance comparison of different fine-tuning methods for Qwen2.5-7B-Instruct on MTMedDialog. Avg. Turns indicates the average number of interaction turns across all disease categories.
  • Figure 5: Performance of DoctorAgent-RL in real-world interactive scenarios with actual patients.
  • ...and 9 more figures