Leveraging Large Language Model as Simulated Patients for Clinical Education
Yanzeng Li, Cheng Zeng, Jialun Zhong, Ruoyu Zhang, Minhao Zhang, Lei Zou
TL;DR
The paper addresses the bottleneck of traditional Simulated Patient (SP)-based clinical training by proposing CureFun, a model-agnostic framework that uses LLMs to simulate patient encounters for education. It combines three components: a graph-driven, context-adaptive SP chatbot (ERRG), retrieval-augmented generation over a case graph, and an automated, ensemble-based assessment module, with the aim of standardizing dialogue and feedback. Empirical results on eight Chinese SP cases show that CureFun produces more authentic SP dialogue flows than baseline LLM chatbots and yields automated scores that align strongly with human grading (mean correlations around 0.81–0.85, p<0.05). The study also evaluates LLMs as virtual doctors (VDs) and finds that, while top models approach human performance in conversational aspects, human clinicians still outperform them in diagnostic accuracy, underscoring the need for integrated Virtual Simulated Patient (VSP) and VD training for scalable clinical education.
Abstract
Simulated Patients (SPs) play a crucial role in clinical medical education by providing realistic scenarios for student practice. However, the high cost of training and hiring qualified SPs, along with the heavy workload and potential risks they face in consistently portraying actual patients, limits students' access to this type of clinical training. Consequently, computer program-based simulated patients have emerged as a valuable educational tool in recent years. Large Language Models (LLMs) have developed rapidly and demonstrated exceptional capabilities in conversational artificial intelligence and role-playing, making them a feasible option for implementing Virtual Simulated Patients (VSPs). In this paper, we present an integrated model-agnostic framework called CureFun that harnesses the potential of LLMs in clinical medical education. This framework facilitates natural conversations between students and simulated patients, evaluates the resulting dialogue, and provides suggestions to enhance students' clinical inquiry skills. Through comprehensive evaluations, our approach demonstrates more authentic and professional SP-scenario dialogue flows than other LLM-based chatbots, thus demonstrating its proficiency in simulating patients. Additionally, leveraging CureFun's evaluation ability, we assess several medical LLMs and discuss the possibilities and limitations of using LLMs as virtual doctors from the perspective of their diagnostic abilities.
