Synthetic Patients: Simulating Difficult Conversations with Multimodal Generative AI for Medical Education

Simon N. Chu; Alex J. Goodell

TL;DR

This work tackles the challenge of training medical professionals in difficult conversations, particularly goals-of-care discussions, by introducing synthetic patients: multimodal AI avatars capable of real-time, video-based interaction. The authors combine GPT-4-powered patient profiles with image, voice, and video generation and a custom telehealth interface to deliver high-fidelity, diverse simulations at relatively low direct cost (~$150 upfront; $500–$2000 monthly hosting). They report generally realistic patient responses but acknowledge substantial challenges, including visual artifacts, bias, latency, and the need for rigorous evaluation of educational impact. The platform offers a scalable alternative to traditional standardized patients and could be integrated into palliative care curricula or used for just-in-time training. The authors also outline next steps to improve realism, learner feedback, and educational validation.
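To put the cost figures above in perspective, the following is a minimal sketch of the cumulative direct cost over a given number of months. It assumes costs are purely additive (one upfront payment plus a constant monthly hosting fee) and excludes development labor, which the abstract describes as substantial. The function names and the chosen time horizons are illustrative assumptions, not something the paper specifies.

```python
# Approximate figures quoted in the TL;DR (USD).
UPFRONT_USD = 150
MONTHLY_LOW_USD = 500
MONTHLY_HIGH_USD = 2000


def cumulative_cost(upfront: int, monthly: int, months: int) -> int:
    """Total direct cost after `months` of hosting (assumed additive model)."""
    return upfront + monthly * months


def cost_range(months: int) -> tuple[int, int]:
    """Low and high cumulative cost estimates for a given horizon."""
    low = cumulative_cost(UPFRONT_USD, MONTHLY_LOW_USD, months)
    high = cumulative_cost(UPFRONT_USD, MONTHLY_HIGH_USD, months)
    return low, high


if __name__ == "__main__":
    # Horizons of 1, 6, and 12 months are arbitrary examples.
    for months in (1, 6, 12):
        low, high = cost_range(months)
        print(f"{months:>2} months: ${low:,} to ${high:,}")
```

Under these assumptions, hosting dominates the total almost immediately, so the monthly fee, rather than the upfront outlay, is the figure a program would need to budget for.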

Abstract

Problem: Effective patient-centered communication is a core competency for physicians. However, both seasoned providers and medical trainees report decreased confidence in leading conversations on sensitive topics such as goals of care or end-of-life discussions. The significant administrative burden and the resources required to provide dedicated training in leading difficult conversations have been long-standing problems in medical education.

Approach: In this work, we present a novel educational tool designed to facilitate interactive, real-time simulations of difficult conversations in a video-based format through the use of multimodal generative artificial intelligence (AI). Leveraging recent advances in language modeling, computer vision, and generative audio, this tool creates realistic, interactive scenarios with avatars, or "synthetic patients." These synthetic patients interact with users throughout various stages of medical care using a custom-built video chat application, offering learners the chance to practice conversations with patients of diverse belief systems, personalities, and ethnic backgrounds.

Outcomes: While the development of this platform demanded substantial upfront investment in labor, it offers a highly realistic simulation experience with minimal financial investment. For medical trainees, this educational tool can be implemented within programs to simulate patient-provider conversations and can be incorporated into existing palliative care curricula to provide a scalable, high-fidelity simulation environment for mastering difficult conversations.

Next Steps: Future developments will explore enhancing the authenticity of these encounters by working with patients to incorporate their histories and personalities, as well as using AI-generated evaluations to offer immediate, constructive feedback to learners post-simulation.

Paper Structure (11 sections, 3 figures)

Problem
Approach
Construction of patient profiles
Generation of patient multimedia
Integration with a custom video chat application
Outcomes
Fidelity
Cost
Challenges
Next Steps
Acknowledgements, Funding, Data

Figures (3)

Figure 1: Overview of medical training modalities for teaching difficult conversations. Schematic illustrating current medical training modalities used to teach difficult conversation skills, positioned according to their fidelity and resource requirements. High-fidelity but resource-intensive modalities include in-person standardized patients and video-based standardized patients. Low-fidelity, low-resource methods include lectures and problem-based learning cases. AI-enabled synthetic patients deliver a high-fidelity experience with relatively lower resource requirements, potentially offering an optimal balance for effective training with minimal implementation resources.
Figure 2: Approach. Overview of the synthetic patient generation workflow and the user experience on the web-based platform. A. To construct patient profiles, a standardized set of role-playing instructions was given to the language model. Each chatbot received an individual patient profile detailing characteristics such as disease onset, healthcare experience, and belief system. Responses were then validated to ensure they aligned with the synthetic patient's defined attributes and tone. B. Schematic showing how the various tools, including image editors, voice cloning software, and text-to-video generators, were used to create image, audio, and video multimedia, crafting a realistic patient avatar that aligned with the synthetic patient profile. Lines represent the flow of data as inputs and/or outputs of the various tools. C. Schematic showing the flow of the user experience in the web-based platform (dotted lines) and of the data (solid lines) used to generate the media sent to the user. This system allows users to voice questions and receive synchronized audiovisual responses.
Figure 3: Outcome. Successes and challenges of synthetic patient development. A. Visual depiction of a conversation taken from the text-only chatbot. Though they occasionally veered off topic, synthetic patients generally offered realistic responses and thoughtful questions. B. Voices for synthetic patients were developed by collecting royalty-free speech clips, refining them with audio processing software, and cloning them using a voice cloning tool. C. Photos of the synthetic patients were generated using multiple imaging models. Problems encountered included overly stereotypical appearance (images I, II), duplicated elements (III, IV), and poor understanding of the desired perspective (V). D. To generate patient response videos, initial images were processed through video-generation tools to simulate realistic body and head movements. However, these tools introduced hallucinated movements (images I, IV) or significantly distorted the facial features (II, III). E. To enable real-time interaction with synthetic patients, a simple web application was developed that features a mock telehealth interface with a video feed of the patient awaiting a question. The video combines output from the above tools with lip-syncing software to provide a realistic, real-time telehealth simulation.
