MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient

Yanzeng Li, Cheng Zeng, Jinchao Zhang, Jie Zhou, Lei Zou

TL;DR

MedDiT tackles the cost and data-diversity challenges of simulated patient-based medical education by integrating knowledge graphs with LLMs and diffusion-based image synthesis. It defines a knowledge graph $G=(E,R,T)$ with knowledge triples $\langle s,p,o \rangle$, retrieves a subgraph $G'$, converts it to a text prompt $IP=f(G')$, and maps that prompt to an image $I=\mathrm{DiT}(IP)$ via a mapping $g: \text{Text} \rightarrow I$. The system comprises three agents (a KG agent, a chat agent, and an image-generation agent) that dynamically generate symptom-aligned medical images and coherent dialogue. Demonstrations show that MedDiT can create diverse, immersive training scenarios and provide evaluative feedback, underscoring its potential to scale and enhance medical education.
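The graph-to-prompt pipeline summarized above can be illustrated with a small sketch. It assumes that $E$ is the set of entities, $R$ the set of relations, and $T \subseteq E \times R \times E$ the set of triples $\langle s,p,o \rangle$; that the subgraph $G'$ is retrieved as the $k$-hop neighbourhood of a seed entity (the patient); and that $f$ simply verbalizes each triple as a clause. The names `Triple`, `retrieve_subgraph`, `verbalize`, and `generate_image`, as well as the example triples, are hypothetical and not taken from the paper; the paper's actual retrieval and verbalization steps may differ, and the DiT call is left as a stub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A knowledge triple <s, p, o>: subject, predicate (relation), object."""
    s: str
    p: str
    o: str


def retrieve_subgraph(triples: set[Triple], seed: str, hops: int = 1) -> set[Triple]:
    """Return G': all triples touching entities within `hops` hops of `seed`.

    Hypothetical retrieval strategy (undirected k-hop expansion).
    """
    frontier = {seed}
    visited: set[str] = set()
    subgraph: set[Triple] = set()
    for _ in range(hops):
        next_frontier: set[str] = set()
        for t in triples:
            if t.s in frontier or t.o in frontier:
                subgraph.add(t)
                next_frontier.update({t.s, t.o})
        visited |= frontier
        frontier = next_frontier - visited  # expand only to unseen entities
    return subgraph


def verbalize(subgraph: set[Triple]) -> str:
    """f(G'): turn the subgraph into a text prompt IP, one clause per triple."""
    # Sort so the prompt is deterministic regardless of set iteration order.
    ordered = sorted(subgraph, key=lambda t: (t.s, t.p, t.o))
    clauses = [f"{t.s} {t.p.replace('_', ' ')} {t.o}" for t in ordered]
    return "; ".join(clauses) + "."


def generate_image(prompt: str):
    """I = DiT(IP): placeholder for the paper's tuned Diffusion Transformer."""
    raise NotImplementedError("plug in a text-to-image DiT model here")


# Hypothetical patient KG containing two patients.
G = {
    Triple("patient_01", "has_symptom", "cough"),
    Triple("patient_01", "has_finding", "right lower lobe consolidation"),
    Triple("right lower lobe consolidation", "visible_on", "chest X-ray"),
    Triple("patient_02", "has_symptom", "headache"),
}

IP = verbalize(retrieve_subgraph(G, "patient_01", hops=2))
print(IP)
```

With two hops, the imaging-related triple attached to the finding is pulled into $G'$ while the unrelated `patient_02` triple is excluded, so the prompt stays aligned with the selected patient's attributes.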

Abstract

Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets present significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversational framework that can dynamically generate plausible medical images aligned with simulated patient symptoms, enabling diverse diagnostic skill training. Specifically, MedDiT integrates various patient Knowledge Graphs (KGs), which describe the attributes and symptoms of patients, to dynamically prompt the behavior of Large Language Models (LLMs) and control patient characteristics, mitigating hallucination during medical conversations. Additionally, a well-tuned Diffusion Transformer (DiT) model is incorporated to generate medical images according to the patient attributes specified in the KG. In this paper, we present the capabilities of MedDiT through a practical demonstration, showcasing its ability to role-play diverse simulated patient cases and generate the corresponding medical images. This can provide a rich and interactive learning experience for students, advancing medical education by offering an immersive simulation platform for future healthcare professionals. The work sheds light on the feasibility of incorporating advanced technologies such as LLMs, KGs, and DiT in educational applications, highlighting their potential to address the challenges faced in simulated patient-based medical education.

Paper Structure (9 sections, 4 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: The overview diagram of MedDiT. Three distinct LLM-based agents control the information flow across modalities, namely graph, text, and image.
  • Figure 2: Screenshot of MedDiT. Left: Visualization of the patient KG, indicating the activated subgraph used to prompt the conversation. Center: The dialogue interface. Right: The DiT model interface.
  • Figure 3: An assessment example of a student's performance in medical conversation.
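Figure 1 describes three LLM-based agents that route information between graph, text, and image, and Figure 2 shows an activated subgraph driving the conversation. The sketch below is one hypothetical way to wire such agents for a single student turn: a KG agent activates the triples relevant to the question, a chat agent receives only those facts as grounding context, and an image agent renders an image only when imaging findings are activated. The keyword table `TOPIC_TRIGGERS`, the relation names, and all function names are assumptions for illustration, not the paper's design; the LLM and DiT are passed in as plain callables rather than tied to any particular library, and stubs stand in for both in the usage example.

```python
import re
from typing import Callable, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical mapping from KG relations to question words that activate them.
TOPIC_TRIGGERS: dict[str, set[str]] = {
    "has_symptom": {"symptom", "symptoms", "feel", "pain"},
    "has_history": {"history", "before", "previously"},
    "has_imaging_finding": {"x-ray", "scan", "image", "imaging"},
}


def kg_agent(kg: list[Triple], question: str) -> list[Triple]:
    """KG agent: activate the triples whose relation is triggered by the question."""
    words = set(re.findall(r"[a-z0-9-]+", question.lower()))
    return [t for t in kg if TOPIC_TRIGGERS.get(t[1], set()) & words]


def chat_agent(llm: Callable[[str], str], question: str, facts: list[Triple]) -> str:
    """Chat agent: answer in character, grounded only in the activated facts."""
    context = "\n".join(f"- {p.replace('_', ' ')}: {o}" for _, p, o in facts)
    prompt = (
        "You are a simulated patient. Answer using only these facts; "
        "if something is not listed, say you are not sure.\n"
        f"{context or '- (no relevant facts)'}\nStudent: {question}\nPatient:"
    )
    return llm(prompt)


def image_agent(dit: Callable[[str], object], facts: list[Triple]) -> Optional[object]:
    """Image agent: call the image model only when imaging findings are activated."""
    findings = [o for _, p, o in facts if p == "has_imaging_finding"]
    return dit("; ".join(findings)) if findings else None


def handle_turn(kg: list[Triple], question: str,
                llm: Callable[[str], str], dit: Callable[[str], object]):
    """Run one student turn through the three agents."""
    facts = kg_agent(kg, question)
    return chat_agent(llm, question, facts), image_agent(dit, facts)


# --- Usage with stub models (hypothetical patient data) ---
def echo_llm(prompt: str) -> str:
    """Stub LLM: returns the first grounding-fact line of the prompt."""
    return prompt.splitlines()[1]


def fake_dit(prompt: str) -> str:
    """Stub DiT: returns a placeholder instead of an image."""
    return f"<image: {prompt}>"


kg = [
    ("patient_01", "has_symptom", "productive cough for 5 days"),
    ("patient_01", "has_history", "smoker"),
    ("patient_01", "has_imaging_finding", "right lower lobe consolidation on chest X-ray"),
]
reply, image = handle_turn(kg, "Can I see your chest X-ray?", echo_llm, fake_dit)
print(reply)
print(image)
```

Restricting the chat agent's context to the activated facts mirrors the idea of using the KG to control patient characteristics; a real system would replace the keyword table with an LLM-driven or embedding-based selection step.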