Medical Video Generation for Disease Progression Simulation

Xu Cao, Kaizhao Liang, Kuei-Da Liao, Tianren Gao, Wenqian Ye, Jintai Chen, Zhiguang Ding, Jianguo Cao, James M. Rehg, Jimeng Sun

TL;DR

We propose the first Medical Video Generation (MVG) framework, which enables controlled manipulation of disease-related image and video features and allows precise, realistic, and personalized simulation of disease progression.

Abstract

Modeling disease progression is crucial for improving the quality and efficacy of clinical diagnosis and prognosis, but it is often hindered by a lack of longitudinal medical image monitoring for individual patients. To address this challenge, we propose the first Medical Video Generation (MVG) framework, which enables controlled manipulation of disease-related image and video features and allows precise, realistic, and personalized simulation of disease progression. Our approach first uses large language models (LLMs) to recaption the prompts that describe the disease trajectory. Next, a controllable multi-round diffusion model simulates the progression of each patient's disease, producing a realistic sequence of intermediate disease states. Finally, a diffusion-based video transition generation model interpolates the disease progression between these states. We validate our framework across three medical imaging domains: chest X-rays, fundus photography, and skin images. Our results demonstrate that MVG significantly outperforms baseline models in generating coherent and clinically plausible disease trajectories. Two user studies with veteran physicians provide further validation and insights into the clinical utility of the generated sequences. MVG has the potential to assist healthcare providers in modeling disease trajectories, interpolating missing medical image data, and enhancing medical education through realistic, dynamic visualizations of disease progression.
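The three stages described in the abstract can be outlined as a control-flow sketch. It assumes the LLM recaptioning stage has already produced `prompt`; `edit_step` and `interpolate` are hypothetical placeholders for one round of the controllable diffusion editor and for the video transition model. The toy stand-ins only exercise the data flow (chaining rounds, then filling transitions) and do not model any diffusion process.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical toy image: rows of pixel intensities in [0, 1]


def simulate_progression(
    x0: Image,
    prompt: str,
    n_steps: int,
    edit_step: Callable[[Image, str], Image],                  # hypothetical: one editing round
    interpolate: Callable[[Image, Image, int], List[Image]],   # hypothetical: transition model
    frames_between: int,
) -> List[Image]:
    """Chain n_steps editing rounds, then fill frames between consecutive states."""
    # Multi-round editing: the output of round n is the input of round n + 1.
    states = [x0]
    for _ in range(n_steps):
        states.append(edit_step(states[-1], prompt))
    # Transition generation: insert intermediate frames between each pair of states.
    video = [states[0]]
    for start, end in zip(states, states[1:]):
        video.extend(interpolate(start, end, frames_between))
        video.append(end)
    return video


# Toy stand-ins (hypothetical, for illustration only).
def toy_edit(x: Image, prompt: str) -> Image:
    # Pretend "progression" brightens every pixel by 0.1, capped at 1.0.
    return [[min(1.0, p + 0.1) for p in row] for row in x]


def linear_interp(a: Image, b: Image, k: int) -> List[Image]:
    # k evenly spaced linear blends strictly between a and b.
    frames = []
    for i in range(1, k + 1):
        t = i / (k + 1)
        frames.append([[(1 - t) * pa + t * pb for pa, pb in zip(ra, rb)]
                       for ra, rb in zip(a, b)])
    return frames


video = simulate_progression([[0.0, 0.5]], "recaptioned report", n_steps=3,
                             edit_step=toy_edit, interpolate=linear_interp,
                             frames_between=2)
print(len(video))  # 1 + 3 * (2 + 1) = 10
```

Injecting the editor and the interpolator as callables keeps the orchestration separate from the models, so either stage could be swapped without touching the loop.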

Paper Structure

This paper contains 21 sections, 2 theorems, 10 equations, 7 figures, 7 tables.

Key Result

Proposition 1

The proofs of Propositions 1 and 2 are given in the supplementary material. Let $x^{0}_{n} \sim \chi$, where $\chi$ is the distribution of photo-realistic medical images, and let $y$ be the text conditioning. Running $\operatorname{PIE}_{n}(\cdot, \cdot)$ recursively is denoted as follows. Then, the resulting final output $x^0_{N}$ maximizes the posterior probability $p(x^0_{N} |\ x_{0}^

Figures (7)

  • Figure 1: Illustrative examples of video-based disease progression simulation (6-8 s) generated by our proposed method from predefined medical reports. The top sequence depicts a patient's diabetic retinopathy. The middle sequence shows edema in a patient's lungs. The bottom sequence shows a benign skin lesion on a patient's skin.
  • Figure 2: Absolute-difference heatmaps between successive cardiomegaly disease states. The highlighted red regions show how the pathology progresses at each step.
  • Figure 3: Overview of the MVG inference pipeline. The upper (blue) part shows a single step of PIE. At any step $n$ of PIE, we first apply diffusion inversion to obtain an inverted noise map. We then denoise it using GPT-4 re-captioned clinical reports of the future state, and use the ROI mask to refine the edit after the last denoising step. The output of one PIE step is the input to the next step $n+1$, which keeps the disease progression simulation gradual and controllable. After $N$ steps, the image converges to the final state. The lower (green) part shows the transition generation between disease states: the ROI mask controls the mask recovery of SEINE, and the final output is a long video sequence of disease progression.
  • Figure 4: Editing paths of PIE, SVD, and extrapolation on the manifold.
  • Figure 5: Disease progression simulation by MVG. The top progression is for cardiomegaly, the middle for diabetic retinopathy, and the bottom for melanocytic nevus.
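The Figure 2 heatmaps highlight where the image changes from one simulated state to the next. A minimal sketch, assuming each state is a same-sized grayscale image stored as nested lists of floats, is a per-pixel absolute difference between consecutive states. Normalizing every map by the single largest difference across all steps is an assumption added here so that colors are comparable between steps; the caption does not state it. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical: grayscale image as rows of floats


def abs_diff_heatmap(prev: Image, curr: Image) -> Image:
    """Per-pixel |curr - prev| between two consecutive disease states."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("states must have the same shape")
    return [[abs(c - p) for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def heatmaps_for_sequence(states: List[Image]) -> List[Image]:
    """One heatmap per transition: state n -> state n + 1."""
    return [abs_diff_heatmap(a, b) for a, b in zip(states, states[1:])]


def normalize_jointly(maps: List[Image]) -> List[Image]:
    """Scale all heatmaps by the single largest difference (assumed choice)."""
    peak = max((v for m in maps for row in m for v in row), default=0.0)
    if peak == 0.0:
        # No change anywhere: return all-zero maps instead of dividing by zero.
        return [[[0.0 for _ in row] for row in m] for m in maps]
    return [[[v / peak for v in row] for row in m] for m in maps]


# Three toy states of a 1x2 image; two transitions yield two heatmaps.
states = [[[0.0, 0.5]], [[0.2, 0.5]], [[0.6, 0.4]]]
maps = normalize_jointly(heatmaps_for_sequence(states))
```

Per-step normalization would instead make every step's strongest change look equally red, which hides whether progression speeds up or slows down.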
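The Figure 3 caption says the ROI mask refines the edit after the last denoising step but does not spell out the operation. One common choice, assumed here rather than taken from the paper, is per-pixel blending: keep the edited pixels inside the region of interest and restore the input pixels outside it, with soft mask values in between giving a linear mix. `roi_refine` and the nested-list image and mask types are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical: grayscale image as rows of floats
Mask = List[List[float]]   # hypothetical: 1.0 inside the ROI, 0.0 outside, soft values allowed


def roi_refine(original: Image, edited: Image, mask: Mask) -> Image:
    """Blend pixel by pixel: mask * edited + (1 - mask) * original."""
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("original, edited and mask must have the same height")
    out: Image = []
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("original, edited and mask must have the same width")
        row = []
        for o, e, m in zip(row_o, row_e, row_m):
            if not 0.0 <= m <= 1.0:
                raise ValueError("mask values must lie in [0, 1]")
            row.append(m * e + (1.0 - m) * o)
        out.append(row)
    return out


# Left pixel is inside the ROI, middle is on a soft boundary, right is outside.
refined = roi_refine([[0.2, 0.2, 0.2]], [[0.8, 0.8, 0.8]], [[1.0, 0.5, 0.0]])
```

With a binary mask reused at every round, pixels outside the ROI stay identical to the starting image across all rounds, since each round restores its own input there.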

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2