MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation

Zhitao He, Haolin Yang, Zeyu Qin, Yi R Fung

TL;DR

The paper tackles the shortage of expert clinical teaching by introducing ClinEdu, a multi-agent pedagogical simulator, and ClinTeach, a large Socratic teaching dialogue dataset built with it. ClinTeach is used to train MedTutor-R1, the first multimodal Socratic tutor for one-to-many clinical instruction. MedTutor-R1 is first instruction-tuned on ClinTeach and then refined with reinforcement learning, using rewards derived from a three-axis rubric (structural fidelity, analytical quality, clinical safety) to improve its adaptive Socratic strategies. It is evaluated interactively by redeploying the tutor into ClinEdu, where it outperforms the base model by over 20% in average pedagogical score, performs comparably to o3, and adapts well to varying numbers of students. The work offers a scalable, data-driven framework for group-based medical education that could broaden access to high-quality clinical training while maintaining safety and pedagogical rigor.

Abstract

The significant gap between rising demands for clinical training and the scarcity of expert instruction poses a major challenge to medical education. With powerful capabilities in personalized guidance, Large Language Models (LLMs) offer a promising solution to bridge this gap. However, current research focuses mainly on one-on-one knowledge instruction, overlooking collaborative reasoning, a key skill for students developed in teamwork like ward rounds. To this end, we develop ClinEdu, a multi-agent pedagogical simulator with personality-driven patients and diverse student cohorts, enabling controlled testing of complex pedagogical processes and scalable generation of teaching data. Based on ClinEdu, we construct ClinTeach, a large Socratic teaching dialogue dataset that captures the complexities of group instruction. We then train MedTutor-R1, the first multimodal Socratic tutor designed for one-to-many instruction in clinical medical education. MedTutor-R1 is first instruction-tuned on our ClinTeach dataset and then optimized with reinforcement learning, using rewards derived from a three-axis rubric, covering structural fidelity, analytical quality, and clinical safety, to refine its adaptive Socratic strategies. For authentic in-situ assessment, we use simulation-based interactive evaluation that redeploys the tutor back into ClinEdu. Experimental results demonstrate that our MedTutor-R1 outperforms the base model by over 20% in average pedagogical score and is comparable to o3, while also exhibiting high adaptability in handling a varying number of students. This promising performance underscores the effectiveness of our pedagogical simulator, ClinEdu.
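To make the rubric-based reward concrete, here is a minimal Python sketch of how scores on the three axes named in the abstract might be folded into a single scalar reward for reinforcement learning. The abstract does not say how the axes are combined, so the weighted average, the [0, 1] score range, and the names `RubricScores` and `rubric_reward` are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScores:
    """Per-axis scores for one tutor response (hypothetical container).

    The three axis names come from the abstract; the [0, 1] range is an assumption.
    """
    structural_fidelity: float
    analytical_quality: float
    clinical_safety: float


def rubric_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Combine the three axes into one scalar via a normalized weighted average.

    The aggregation rule is an assumption; the paper may combine the axes differently.
    """
    axes = (
        scores.structural_fidelity,
        scores.analytical_quality,
        scores.clinical_safety,
    )
    for value in axes:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each rubric score must lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, axes)) / total_weight
```

For instance, `rubric_reward(RubricScores(0.5, 1.0, 0.0), weights=(1, 1, 2))` weights clinical safety twice as heavily as the other axes, so a response that is unsafe scores low even when its analysis is strong.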

Paper Structure

This paper contains 28 sections, 3 equations, 28 figures, 11 tables, 1 algorithm.

Figures (28)

  • Figure 1: Our ClinEdu framework for clinical ward rounds simulation. The system first samples a case from the original dataset, which is then decomposed and used to create a patient script. Based on this script, a suitable patient prototype is selected from the patient database. A team of students with diverse backgrounds is then randomly assembled. The simulation comprises student analysis, tutor guidance and review, and student query and exploration.
  • Figure 2: Distribution of Simulated Student and Patient Personas.
  • Figure 3: Analysis of model robustness and adaptability across various student agents.
  • Figure 4: Evaluating model performance with an increasing number of students.
  • Figure 5: Scoring consistency between human experts and LLM on the ETS Metric and Real User Study.
  • ...and 23 more figures
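
The ward-round flow described in the Figure 1 caption can be sketched roughly as follows. This is a toy Python outline under stated assumptions: `Session`, `build_session`, and `run_round` are hypothetical names, the agents are replaced by placeholder strings, the case decomposition step is omitted, and the patient prototype is drawn at random as a stand-in for the script-based selection the caption mentions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Session:
    """One simulated ward round (hypothetical structure)."""
    case: str
    patient_prototype: str
    students: list[str]
    transcript: list[tuple[str, str]] = field(default_factory=list)


def build_session(cases, patient_prototypes, student_pool, team_size, rng):
    """Sample a case, pick a patient prototype, and assemble a student team."""
    case = rng.choice(cases)
    # Assumption: random choice stands in for matching a prototype to the patient script.
    prototype = rng.choice(patient_prototypes)
    # Sampling without replacement gives a team of distinct students.
    students = rng.sample(student_pool, team_size)
    return Session(case, prototype, students)


def run_round(session):
    """Run the three phases named in the caption, with placeholder utterances."""
    for student in session.students:
        session.transcript.append((student, f"analysis of {session.case}"))
    session.transcript.append(("tutor", "guidance and review"))
    for student in session.students:
        session.transcript.append((student, "query to patient"))
    return session
```

A call such as `run_round(build_session(cases, prototypes, pool, 2, random.Random(0)))` yields a transcript with two student analyses, one tutor turn, and two student queries; in a real system each placeholder string would be an LLM agent response.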