CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space

Tianxingjian Ding, Yuanhao Zou, Chen Chen, Mubarak Shah, Yu Tian

TL;DR

CLARITY tackles the problem of predicting dynamic, treatment-conditioned disease trajectories in oncology by embedding disease evolution in a structured latent space that is conditioned on rich clinical and temporal context. It introduces a Therapy Policies Agent and a post-treatment latent predictor (Actor) paired with a survival predictor, and couples them through an Inverse Survival Evaluation that iteratively refines therapy proposals into actionable decisions. Unlike diffusion-based image reconstruction approaches, CLARITY focuses on latent dynamics for physiologically faithful trajectories and demonstrates state-of-the-art treatment-planning performance on glioma datasets, with strong survival discrimination and computational efficiency. The framework also shows generalization to breast cancer and includes interpretable latent predictions via a latent-to-MRI decoder, while enforcing safety and validity through constraint-aware feedback loops.

Abstract

Clinical decision-making in oncology requires predicting dynamic disease evolution, a task current static AI predictors cannot perform. While world models (WMs) offer a paradigm for generative prediction, existing medical applications remain limited. Existing methods often rely on stochastic diffusion models, focusing on visual reconstruction rather than causal, physiological transitions. Furthermore, in the medical domain, models like MeWM typically ignore patient-specific temporal and clinical contexts and lack a feedback mechanism to link predictions to treatment decisions. To address these gaps, we introduce CLARITY, a medical world model that forecasts disease evolution directly within a structured latent space. It explicitly integrates time intervals (temporal context) and patient-specific data (clinical context) to model treatment-conditioned progression as a smooth, interpretable trajectory, and thus generates physiologically faithful, individualized treatment plans. Finally, CLARITY introduces a novel prediction-to-decision framework, translating latent rollouts into transparent, actionable recommendations. CLARITY demonstrates state-of-the-art performance in treatment planning. On the MU-Glioma-Post dataset, our approach outperforms the recent MeWM by 12%, and significantly surpasses all other medical-specific large language models.

Paper Structure

This paper contains 37 sections, 15 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Conceptual Overview of CLARITY. Moving beyond static prediction, CLARITY's latent-based Actor simulates multiple "what-if" disease trajectories (Future Latent Prediction) conditioned on rich Clinical and Temporal Contexts. This is not a one-way process: Inverse feedback (orange arrow) enables the iterative refinement of Action Proposals, translating predictions into a concrete, optimized treatment plan for the clinician.
  • Figure 2: CLARITY's Inference Pipeline. (a) Direct Survival Evaluation: A frozen MRI Encoder processes the pre-imaging observation to extract a pre-treatment latent representation. In parallel, the Therapy Policies Agent (e.g., GPT-4o) takes the patient's clinical context and proposes multiple candidate drug combos. The Actor module (Diseases Evolution Model) then evaluates the combos one by one, integrating the pre-treatment latent, clinical context, temporal context, and the specific drug combo to predict a final risk score for survival analysis. (b) Inverse Survival Evaluation: This diagram illustrates the iterative prediction-to-decision feedback loop. Initial risk scores from Direct Survival Evaluation (panel (a)) are fed into the Therapy Policies Agent. The Agent then proposes updated drug combos, which the Actor scores to generate new risk estimates; these are added to the accumulated survival feedback. This process repeats, refining the therapy proposals, and after $K$ iterations the policy with the lowest risk score is selected as the Final Action.
  • Figure 3: Training Pipeline of the Actor (Diseases Evolution Model). The Post-Treatment Latent Predictor consists of an $N$-layer self-attention Transformer that forecasts post-treatment latents. It is trained with an L1 loss for reconstructing the post-treatment latent and a contrastive loss for aligning semantic structure among the various latents. The Survival Predictor is equipped with an $M$-layer two-way cross-attention Transformer that estimates the risk score and survival probability from pre- and post-treatment latents, supervised by Cox and Brier losses.
  • Figure 4: Kaplan--Meier survival curves predicted by MeWM (left) and our method (right) on MU-Glioma-Post. Our approach produces a much clearer separation across risk strata, reflected by a lower log-rank $p$-value of 0.0017 and a substantially higher C-index of 0.7856. Shaded regions denote 95% confidence intervals.
  • Figure 5: Multi-stage decision trajectories generated by our model. Each stage (e.g., $S_0$) corresponds to an MRI observation. Dashed lines denote candidate rollouts under different therapy actions, while the solid line with an arrow indicates the selected treatment sequence achieving the lowest predicted risk.
  • ...and 2 more figures
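
The prediction-to-decision loop described in the Figure 2(b) caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` and `score` are hypothetical stand-ins for the Therapy Policies Agent and the Actor, the drug names and the toy risk function are invented so the example runs, and treating the initial direct evaluation as a separate round before the $K$ refinements is one reading of the caption.

```python
from typing import Callable, Dict, List, Tuple

Combo = Tuple[str, ...]        # a candidate drug combination
Feedback = Dict[Combo, float]  # accumulated feedback: combo -> predicted risk


def inverse_survival_evaluation(
    propose: Callable[[Feedback], List[Combo]],
    score: Callable[[Combo], float],
    num_iterations: int,
) -> Tuple[Combo, float]:
    """Refine proposals for `num_iterations` rounds; return the lowest-risk combo."""
    feedback: Feedback = {}
    # Initial round (direct evaluation) followed by K refinement rounds.
    for _ in range(num_iterations + 1):
        for combo in propose(feedback):
            if combo not in feedback:  # score each candidate only once
                feedback[combo] = score(combo)
    best = min(feedback, key=lambda c: feedback[c])
    return best, feedback[best]


# --- Hypothetical toy stand-ins, for illustration only ---
DRUGS = ("A", "B", "C")  # placeholder names, not real therapies
BENEFIT = {"A": 0.30, "B": 0.10, "C": 0.25}


def toy_score(combo: Combo) -> float:
    """Invented risk: baseline minus benefit, plus a penalty per extra drug."""
    return 1.0 - sum(BENEFIT[d] for d in combo) + 0.15 * (len(combo) - 1)


def toy_propose(feedback: Feedback) -> List[Combo]:
    """Start with single drugs, then extend the current best combo by one drug."""
    if not feedback:
        return [(d,) for d in DRUGS]
    best = min(feedback, key=lambda c: feedback[c])
    return [tuple(sorted(best + (d,))) for d in DRUGS if d not in best]


if __name__ == "__main__":
    combo, risk = inverse_survival_evaluation(toy_propose, toy_score, num_iterations=2)
    print(combo, round(risk, 3))
```

In this sketch the accumulated feedback is a plain dictionary, so the proposer sees every previously scored combo on each round. In the system described by the caption, the proposer is a language-model agent and the scorer is the trained Actor, but the control flow (propose, score, accumulate, select the minimum) would presumably follow the same shape.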