medDreamer: Model-Based Reinforcement Learning with Latent Imagination on Complex EHRs for Clinical Decision Support
Qianyi Xu, Gousia Habib, Feng Wu, Dilruk Perera, Mengling Feng
TL;DR
medDreamer addresses irregular sampling and informative missingness in EHRs by learning a latent world model with an Adaptive Feature Integration module and training policies in a two-phase offline regime that grounds learning in real data before exploring imagined trajectories. The approach enables counterfactual reasoning and long-horizon planning without real-world experimentation, and is evaluated on sepsis treatment and mechanical ventilation tasks using large real-world ICU datasets. Results show lower estimated mortality, closer alignment with clinician patterns, and reliable latent dynamics, highlighting the method’s potential as a clinically grounded decision-support tool. The work advances model-based RL for healthcare by combining latent imagination with structured grounding to balance safety, realism, and exploratory learning in offline settings.
Abstract
Timely and personalized treatment decisions are essential across a wide range of healthcare settings, where patient responses can vary significantly and evolve over time. The clinical data used to support these decisions are often irregularly sampled, and the frequency of missing measurements may itself convey information about the patient's condition. Existing Reinforcement Learning (RL) based clinical decision support systems often ignore these missingness patterns or distort them through coarse discretization and simple imputation. They are also predominantly model-free and depend largely on retrospective data, which can lead to insufficient exploration and bias toward historical behaviors. To address these limitations, we propose medDreamer, a novel model-based reinforcement learning framework for personalized treatment recommendation. medDreamer contains a world model with an Adaptive Feature Integration module that simulates latent patient states from irregular data, and a two-phase policy trained on a hybrid of real and imagined trajectories. This enables learning policies that go beyond the sub-optimality of historical clinical decisions while remaining close to real clinical data. We evaluate medDreamer on both sepsis and mechanical ventilation treatment tasks using two large-scale Electronic Health Record (EHR) datasets. Comprehensive evaluations show that medDreamer significantly outperforms model-free and model-based baselines in both clinical outcomes and off-policy metrics.
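
To make the idea of informative missingness concrete, the following minimal sketch shows one common way to keep missingness visible to a model instead of hiding it behind imputation: each measurement is paired with an observation mask and the time elapsed since the variable was last observed. This is an illustrative assumption, not medDreamer's Adaptive Feature Integration module; the function name `missingness_features`, the forward-fill choice, and the default fill value of 0.0 are all hypothetical.

```python
from typing import List, Optional, Tuple


def missingness_features(
    times: List[float], values: List[Optional[float]]
) -> Tuple[List[float], List[int], List[float]]:
    """Hypothetical helper (not from the paper) for one clinical variable.

    Returns three aligned lists:
      filled - last observed value carried forward (0.0 before any observation)
      mask   - 1 where the value was actually measured, 0 where it was missing
      delta  - time since the last observation (0.0 at observed steps; time
               since the first timestamp if nothing has been observed yet)
    """
    filled: List[float] = []
    mask: List[int] = []
    delta: List[float] = []
    last_value: Optional[float] = None
    last_time: Optional[float] = None

    for t, v in zip(times, values):
        observed = v is not None
        if observed:
            last_value, last_time = v, t
        # Forward-fill is an assumed placeholder; the mask and delta keep
        # the fact that the value was not measured available to the model.
        filled.append(last_value if last_value is not None else 0.0)
        mask.append(1 if observed else 0)
        if observed:
            delta.append(0.0)
        elif last_time is not None:
            delta.append(t - last_time)
        else:
            delta.append(t - times[0])

    return filled, mask, delta


# Example: a variable charted at hours 1 and 5 only.
filled, mask, delta = missingness_features(
    [0.0, 1.0, 4.0, 5.0], [None, 80.0, None, 95.0]
)
```

Feeding `mask` and `delta` alongside `filled` lets a downstream model distinguish a freshly measured value from a stale carried-forward one, which is the kind of signal that simple imputation discards.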
