Medical World Model: Generative Simulation of Tumor Evolution for Treatment Planning

Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, Jieneng Chen

TL;DR

MeWM addresses the need for interpretable, forward-looking clinical decision support by simulating tumor evolution under different treatments. It couples a vision-language policy model with a 3D diffusion-based tumor dynamics model and an inverse survival-analysis module to evaluate and select treatment plans, and is demonstrated on transarterial chemoembolization (TACE) for hepatocellular carcinoma. The approach yields realistic post-treatment tumor syntheses, better survival-risk estimation than Cox models, and better protocol-exploration metrics than multimodal GPT baselines. The work highlights the potential of medical world models to augment interventional decision-making and lays the groundwork for integrating predictive simulation into clinical workflows.

Abstract

Providing effective treatment and making informed clinical decisions are essential goals of modern medicine and clinical care. We are interested in simulating disease dynamics for clinical decision-making, leveraging recent advances in large generative models. To this end, we introduce the Medical World Model (MeWM), the first world model in medicine that visually predicts future disease states based on clinical decisions. MeWM comprises (i) vision-language models that serve as policy models, and (ii) tumor generative models that serve as dynamics models. The policy model generates action plans, such as clinical treatments, while the dynamics model simulates tumor progression or regression under given treatment conditions. Building on this, we propose an inverse dynamics model that applies survival analysis to the simulated post-treatment tumor, enabling the evaluation of treatment efficacy and the selection of the optimal clinical action plan. As a result, MeWM simulates disease dynamics by synthesizing post-treatment tumors, with state-of-the-art specificity in Turing tests evaluated by radiologists. Its inverse dynamics model also outperforms medical-specialized GPTs in optimizing individualized treatment protocols across all metrics. Notably, MeWM improves clinical decision-making for interventional physicians, boosting the F1-score in selecting the optimal TACE protocol by 13%, paving the way for future integration of medical world models as second readers.
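
The propose, simulate, score, and select loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `(drug, embolic)` tuple encoding, the placeholder agent names, and every number are hypothetical. A scalar tumor volume stands in for the 3D CT state, and the sketch assumes that a lower risk score indicates a better protocol.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding of a TACE protocol: (drug, embolic agent).
Protocol = Tuple[str, str]


def propose_protocols(drugs: Sequence[str], embolics: Sequence[str]) -> List[Protocol]:
    """Stand-in for the policy model: enumerate candidate action combos."""
    return list(product(drugs, embolics))


def select_protocol(
    pre_state: float,
    candidates: Sequence[Protocol],
    simulate: Callable[[float, Protocol], float],
    risk: Callable[[float, float], float],
) -> Tuple[Protocol, float]:
    """Simulate each candidate, score it, and return the lowest-risk protocol."""
    if not candidates:
        raise ValueError("need at least one candidate protocol")
    scored = []
    for protocol in candidates:
        post_state = simulate(pre_state, protocol)  # role of the dynamics model
        scored.append((risk(pre_state, post_state), protocol))  # role of survival analysis
    best_risk, best_protocol = min(scored, key=lambda item: item[0])
    return best_protocol, best_risk


# Toy stand-ins. All values below are invented for illustration only.
TOY_SHRINK = {
    ("drug_A", "embolic_X"): 0.90,
    ("drug_A", "embolic_Y"): 0.60,
    ("drug_B", "embolic_X"): 0.75,
    ("drug_B", "embolic_Y"): 0.80,
}


def toy_simulate(volume: float, protocol: Protocol) -> float:
    """Pretend dynamics model: scale the tumor volume by a fixed factor."""
    return volume * TOY_SHRINK[protocol]


def toy_risk(pre_volume: float, post_volume: float) -> float:
    """Pretend risk score: residual tumor fraction (assumed lower is better)."""
    return post_volume / pre_volume


candidates = propose_protocols(["drug_A", "drug_B"], ["embolic_X", "embolic_Y"])
best_protocol, best_risk = select_protocol(40.0, candidates, toy_simulate, toy_risk)
print(best_protocol, round(best_risk, 2))
```

Passing `simulate` and `risk` as callables keeps the selection loop independent of how the generative model and the survival model are actually built, which mirrors the modular split the abstract describes.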

Paper Structure

This paper contains 25 sections, 4 equations, 13 figures, 5 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Formulation of Medical World Model. It integrates imaging observations with perception modules to form an initial state, which is then processed by a progression generative model to predict future states of disease under different treatment conditions. Recovery-conditioned policies guide treatment decisions, creating a feedback loop for optimizing clinical interventions.
  • Figure 2: Overview of TACE Protocol Exploration by Medical World Model. (1) GPTs (Policy Model): construct TACE action combos from the observed pre-treatment CT, integrating clinical guidelines and policies. (2) Tumor Generative Model (Dynamics Model): simulates the post-treatment tumor under different TACE intervention protocols, predicting treatment outcomes. (3) Survival Analysis Model (Heuristic Function): assesses risk scores from both the simulated post-treatment CT and the pre-treatment CT to determine the most effective TACE protocol. Note that the 3D tumor masks (colored in red) can be extracted using a well-trained segmentation network (the Assistant Model). The framework enables visually trackable protocol optimization by iterating between clinical policy guidance, generative modeling, and survival analysis.
  • Figure 3: Dynamics Model based on Tumor Generative Model. The training framework consists of three parts: (a) Radiotherapy Report Extraction and Generation: GPT-4o and DeepSeek-R1 extract key treatment details from radiotherapy reports and generate the corresponding TACE surgical actions. (b) Post-Treatment Tumor Generation: an action-driven 3D diffusion model is conditioned on fused action embeddings and attenuated CT features to generate post-treatment tumors that simulate treatment outcomes. (c) Combo Contrastive Learning (CCL): the model learns from treatment variations by pushing apart dissimilar combos and pulling together similar ones, improving its ability to generate realistic and action-aware post-treatment tumor appearances.
  • Figure 4: Examples of Visual Turing Test. We present one real tumor alongside examples of synthetic tumors that were correctly and incorrectly identified. A red dot indicates the radiologist classified the post-treatment tumor as synthetic, while a green dot signifies it was identified as real.
  • Figure 5: Performance of heuristic function on survival analysis. The first three heatmaps show the true risk distribution, Cox model predictions, and our heuristic function predictions. The last two depict prediction errors, with lower MSE (0.2142) for our model compared to the Cox model (0.3550), demonstrating improved accuracy in capturing localized risk patterns.
  • ...and 8 more figures
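
The pull-together, push-apart idea behind Combo Contrastive Learning in Figure 3(c) can be illustrated with a generic InfoNCE-style loss. The caption does not give the paper's actual loss, so this sketch is only a common formulation substituted for illustration. The function names, the cosine-similarity choice, the temperature value, and the toy 2D embeddings are all assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combo_contrastive_loss(
    anchor: Vector,
    positives: List[Vector],
    negatives: List[Vector],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Generic InfoNCE-style loss (an assumption, not the paper's exact CCL).

    The loss is small when the anchor is close to embeddings of similar
    combos (positives) and far from embeddings of dissimilar ones (negatives).
    """
    if not positives:
        raise ValueError("need at least one positive embedding")
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)


# Toy 2D embeddings, invented for illustration only.
anchor = [1.0, 0.0]
well_separated = combo_contrastive_loss(anchor, [[1.0, 0.0]], [[0.0, 1.0]])
badly_separated = combo_contrastive_loss(anchor, [[0.0, 1.0]], [[1.0, 0.0]])
print(well_separated < badly_separated)
```

In the toy case the loss is near zero when the similar combo's embedding matches the anchor and the dissimilar one is orthogonal to it, and it grows large when the two are swapped.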