medDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support

Qianyi Xu, Gousia Habib, Feng Wu, Dilruk Perera, Mengling Feng

TL;DR

medDreamer tackles the challenges of irregular, informative missingness in EHRs by learning a latent world model with an Adaptive Feature Integration module and training policies in a two-phase offline regime that grounds learning in real data before exploring imagined trajectories. The approach enables counterfactual reasoning and long-horizon planning without real-world experimentation, and is evaluated on sepsis treatment and mechanical ventilation tasks using large real-world ICU datasets. Results show improved estimated mortality, better alignment with clinician patterns, and reliable latent dynamics, highlighting the method’s potential as a clinically grounded decision-support tool. The work advances model-based RL for healthcare by combining latent imagination with structured grounding to balance safety, realism, and exploratory learning in offline settings.

Abstract

Timely and personalized treatment decisions are essential across a wide range of healthcare settings where patient responses can vary significantly and evolve over time. Clinical data used to support these treatment decisions are often irregularly sampled, and the frequency of missing measurements may itself convey information about the patient's condition. Existing Reinforcement Learning (RL)-based clinical decision support systems often ignore these missingness patterns, distorting them through coarse discretization and simple imputation. They are also predominantly model-free and rely largely on retrospective data, which can lead to insufficient exploration and bias toward historical behaviors. To address these limitations, we propose medDreamer, a novel model-based reinforcement learning framework for personalized treatment recommendation. medDreamer contains a world model with an Adaptive Feature Integration module that simulates latent patient states from irregular data, and a two-phase policy trained on a hybrid of real and imagined trajectories. This enables learning optimal policies that go beyond the sub-optimality of historical clinical decisions while remaining close to real clinical data. We evaluate medDreamer on both sepsis and mechanical ventilation treatment tasks using two large-scale Electronic Health Record (EHR) datasets. Comprehensive evaluations show that medDreamer significantly outperforms model-free and model-based baselines in both clinical outcomes and off-policy metrics.
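The point about informative missingness can be illustrated with a common preprocessing pattern for irregular clinical time series: keep an explicit observation mask and a time-since-last-measurement feature next to forward-filled values, so the sampling pattern is not erased by imputation. This is a minimal sketch in the style of GRU-D inputs, assuming timestamped rows with `None` for unmeasured features; `build_inputs` is a hypothetical helper, and this is not the paper's Adaptive Feature Integration module. The per-row `gaps` output is only one plausible reading of the time interval $\Delta_t$.

```python
from typing import List, Optional, Tuple


def build_inputs(
    times: List[float], rows: List[List[Optional[float]]]
) -> Tuple[List[List[float]], List[List[int]], List[List[float]], List[float]]:
    """Convert irregularly sampled rows into values, mask, staleness and gaps.

    times: timestamp of each row (e.g. hours since admission), increasing.
    rows:  one list of feature values per timestamp; None = not measured.

    Returns:
      filled - missing entries replaced by the last measured value
               (0.0 if the feature has not been measured yet)
      mask   - 1 where the feature was actually measured, else 0
      delta  - time since the feature was last measured before this row
               (0.0 if it had never been measured)
      gaps   - time elapsed since the previous row (0.0 for the first row)
    """
    n_feat = len(rows[0])
    last_val = [0.0] * n_feat
    last_time: List[Optional[float]] = [None] * n_feat
    filled, mask, delta, gaps = [], [], [], []
    prev_t: Optional[float] = None
    for t, row in zip(times, rows):
        gaps.append(0.0 if prev_t is None else t - prev_t)
        prev_t = t
        f_row, m_row, d_row = [], [], []
        for j, v in enumerate(row):
            # Compute staleness before updating, so delta looks backwards.
            d_row.append(0.0 if last_time[j] is None else t - last_time[j])
            if v is None:
                # Record the missingness in the mask instead of hiding it.
                m_row.append(0)
                f_row.append(last_val[j])
            else:
                m_row.append(1)
                f_row.append(v)
                last_val[j] = v
                last_time[j] = t
        filled.append(f_row)
        mask.append(m_row)
        delta.append(d_row)
    return filled, mask, delta, gaps


# Two features, three irregularly spaced rows (hours 0, 1, 3).
filled, mask, delta, gaps = build_inputs(
    [0.0, 1.0, 3.0], [[1.0, None], [None, 5.0], [2.0, None]]
)
print(mask)   # [[1, 0], [0, 1], [1, 0]]
print(delta)  # [[0.0, 0.0], [1.0, 0.0], [3.0, 2.0]]
```

A downstream model can then weigh a forward-filled value by how stale it is (`delta`) and whether it was measured at all (`mask`), which is the kind of signal simple imputation discards.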

Paper Structure

This paper contains 40 sections, 17 equations, 10 figures, and 8 tables.

Figures (10)

  • Figure 1: Our medDreamer architecture is divided into two stages: world model training and policy training. The world model uses the time interval $\Delta_t$, observation $o_t$, and clinician action $a_t$ to learn latent patient dynamics through reconstruction and to predict the reward $\hat{r}_t$. Policy training consists of two phases. In Phase 1, we use $T$ steps of states generated from real clinical decisions, imagine $\tau$ steps into the future, and train the policy on the concatenated states. In Phase 2, we use fully imagined trajectories of length $H$.
  • Figure 2: Differences between clinicians' actions and policy actions for sepsis task.
  • Figure 3: Differences between clinicians' actions and policy actions for MV task.
  • Figure 4: Treatment action distributions for sepsis task.
  • Figure 5: Difference between policy-recommended and clinician doses versus mortality, stratified by SOFA score. For patients with high SOFA scores, there are no cases in which the policy recommends a higher vasopressor dose than the clinicians.
  • ...and 5 more figures
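The Figure 1 caption describes how Phase 1 and Phase 2 trajectories are assembled. The sketch below illustrates only that bookkeeping, under heavy simplifying assumptions: a scalar latent state and hypothetical `encode`, `dynamics`, and `policy` functions standing in for a learned posterior, a latent transition model, and the actor. Reward prediction, value estimation, and the policy update itself are omitted, and this is not the authors' implementation.

```python
from typing import Callable, List, Sequence

State = float  # toy scalar latent; a real world model would use vectors


def phase1_states(
    real_obs: Sequence[float],
    real_actions: Sequence[float],
    encode: Callable[[State, float], State],
    dynamics: Callable[[State, float], State],
    policy: Callable[[State], float],
    T: int,
    tau: int,
    z0: State = 0.0,
) -> List[State]:
    """Phase 1: T states grounded in real data, then tau imagined steps."""
    states: List[State] = []
    z = z0
    for t in range(T):
        # Prior from the previous state and the clinician's recorded action,
        # then corrected by the real observation at step t.
        prior = z if t == 0 else dynamics(z, real_actions[t - 1])
        z = encode(prior, real_obs[t])
        states.append(z)
    for _ in range(tau):
        # Continue from the last grounded state using the policy's actions.
        z = dynamics(z, policy(z))
        states.append(z)
    return states  # concatenated: T real-grounded + tau imagined


def phase2_states(
    z_start: State,
    dynamics: Callable[[State, float], State],
    policy: Callable[[State], float],
    H: int,
) -> List[State]:
    """Phase 2: a fully imagined trajectory of H states from z_start."""
    states: List[State] = []
    z = z_start
    for _ in range(H):
        z = dynamics(z, policy(z))
        states.append(z)
    return states


# Hypothetical toy stand-ins, chosen only to make the example runnable.
def encode(prior: State, obs: float) -> State:
    return 0.5 * prior + 0.5 * obs


def dynamics(z: State, a: float) -> State:
    return z + a


def policy(z: State) -> float:
    return -0.5 * z


print(phase1_states([2.0, 4.0], [1.0, 0.0], encode, dynamics, policy, T=2, tau=2))
# [1.0, 3.0, 1.5, 0.75]
print(phase2_states(3.0, dynamics, policy, H=3))
# [1.5, 0.75, 0.375]
```

The design point the sketch tries to capture is that Phase 1 anchors every trajectory in states inferred from real clinical decisions before any imagination begins, whereas Phase 2 lets the policy act from the first imagined step onward.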