PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
Daeun Kyung, Hyunseung Chung, Seongsu Bae, Jiho Kim, Jae Ho Sohn, Taerim Kim, Soo Kyung Kim, Edward Choi
TL;DR
PatientSim is a persona-driven, open-source simulator for realistic doctor-patient dialogues in emergency care, built from 170 MIMIC-derived clinical profiles and 37 personas spanning four axes. It evaluates eight LLMs and identifies Llama 3.3 70B as the strongest open-source backbone. Validated by clinicians, the simulator demonstrates strong fidelity, factuality, and plausibility across diverse presentations. The framework enables reproducible research on privacy-preserving data and holds promise for medical education and for evaluating medical dialogue systems. Limitations include reliance on a single dataset, the absence of nonverbal cues, and a small pool of human evaluators; future work includes multimodal extensions and broader validation.
Abstract
Doctor-patient consultations require multi-turn, context-aware communication tailored to diverse patient personas. Training or evaluating doctor LLMs in such settings requires realistic patient interaction systems. However, existing simulators often fail to reflect the full range of personas seen in clinical practice. To address this, we introduce PatientSim, a patient simulator that generates realistic and diverse patient personas for clinical scenarios, grounded in medical expertise. PatientSim operates using: 1) clinical profiles, including symptoms and medical history, derived from real-world data in the MIMIC-ED and MIMIC-IV datasets, and 2) personas defined by four axes: personality, language proficiency, medical history recall level, and cognitive confusion level, resulting in 37 unique combinations. We evaluate eight LLMs for factual accuracy and persona consistency. The top-performing open-source model, Llama 3.3 70B, was then validated by four clinicians, confirming the robustness of our framework. As an open-source platform, PatientSim offers a reproducible and scalable solution that can be customized for specific training needs. Because it provides a privacy-compliant environment, it serves as a robust testbed for evaluating medical dialogue systems across diverse patient presentations and shows promise as an educational tool for healthcare. The code is available at https://github.com/dek924/PatientSim.
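The abstract describes PatientSim as conditioning an LLM on two inputs: a clinical profile and a persona drawn from four axes. The minimal Python sketch below shows one way such inputs could be represented and combined into a single system prompt. Everything in it is an assumption for illustration: the axis levels, the profile fields, the prompt wording, and the helpers `Persona`, `ClinicalProfile`, `all_personas`, and `build_system_prompt` are hypothetical and are not the API of the PatientSim repository.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical axis levels, for illustration only. PatientSim defines its own
# levels and pairing rules (yielding 37 personas); this sketch does not reproduce them.
PERSONALITIES = ("plain", "verbose", "impatient")
LANGUAGE_LEVELS = ("basic", "intermediate", "advanced")
RECALL_LEVELS = ("low", "high")
CONFUSION_LEVELS = ("normal", "confused")


@dataclass(frozen=True)
class Persona:
    """One point on the four persona axes: personality, language, recall, confusion."""
    personality: str
    language_proficiency: str
    recall_level: str
    confusion_level: str

    def __post_init__(self) -> None:
        # Reject values outside the (hypothetical) allowed levels for each axis.
        checks = (
            (self.personality, PERSONALITIES),
            (self.language_proficiency, LANGUAGE_LEVELS),
            (self.recall_level, RECALL_LEVELS),
            (self.confusion_level, CONFUSION_LEVELS),
        )
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} not in {allowed}")


@dataclass(frozen=True)
class ClinicalProfile:
    """Hypothetical subset of fields a MIMIC-derived profile might carry."""
    chief_complaint: str
    symptoms: tuple[str, ...]
    medical_history: tuple[str, ...]


def all_personas() -> list[Persona]:
    """Enumerate the full Cartesian product of the illustrative axis levels."""
    return [
        Persona(*combo)
        for combo in product(PERSONALITIES, LANGUAGE_LEVELS, RECALL_LEVELS, CONFUSION_LEVELS)
    ]


def build_system_prompt(profile: ClinicalProfile, persona: Persona) -> str:
    """Combine a clinical profile and a persona into one system prompt for an LLM backbone."""
    return "\n".join([
        "You are a patient in an emergency department speaking with a doctor.",
        f"Chief complaint: {profile.chief_complaint}",
        "Symptoms: " + ", ".join(profile.symptoms),
        "Medical history: " + ", ".join(profile.medical_history),
        f"Personality: {persona.personality}",
        f"Language proficiency: {persona.language_proficiency}",
        f"Medical history recall: {persona.recall_level}",
        f"Cognitive confusion: {persona.confusion_level}",
        "Stay in character and never reveal facts absent from this profile.",
    ])


# Example with made-up values (not taken from MIMIC).
profile = ClinicalProfile("chest pain", ("chest pain", "shortness of breath"), ("hypertension",))
persona = Persona("impatient", "basic", "low", "normal")
prompt = build_system_prompt(profile, persona)
print(prompt)
print(len(all_personas()))  # 3 * 3 * 2 * 2 = 36 under these illustrative levels
```

Under these illustrative levels the full Cartesian product yields 36 personas; the paper's 37 come from its own axis definitions, so this enumeration should not be read as reproducing them. Keeping each persona as a frozen, validated record makes it straightforward to sweep personas systematically while holding a clinical profile fixed.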
