EHR2Path: Scalable Modeling of Longitudinal Patient Pathways from Multimodal Electronic Health Records

Chantal Pellegrini, Ege Özsoy, David Bani-Harouni, Matthias Keicher, Nassir Navab

Abstract

Forecasting how a patient's condition is likely to evolve, including possible deterioration, recovery, treatment needs, and care transitions, could support more proactive and personalized care, but requires modeling heterogeneous and longitudinal electronic health record (EHR) data. Yet, existing approaches typically focus on isolated prediction tasks, narrow feature spaces, or short context windows, limiting their ability to model full patient pathways. To address this gap, we introduce EHR2Path, a multimodal framework for forecasting and simulating full in-hospital patient pathways from routine EHRs. EHR2Path converts diverse clinical inputs into a unified temporal representation, enabling modeling of a substantially broader set of patient information, including radiology reports, physician notes, vital signs, medication and laboratory patterns, and dense bedside charting. To support long clinical histories and broad feature spaces, we introduce a Masked Summarization Bottleneck that compresses long-term history into compact, task-optimized summary tokens while preserving recent context, improving both performance and token efficiency. In retrospective experiments on MIMIC-IV, EHR2Path enables next-step pathway forecasting and iterative simulation of complete in-hospital trajectories, while outperforming strong baselines on directly comparable tasks. These results establish a foundation for scalable pathway-level modeling from routine EHRs supporting anticipatory clinical decision-making. Our code is available at https://github.com/ChantalMP/EHR2Path.

Paper Structure

This paper contains 27 sections, 1 equation, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Short-term Patient Pathway visualization within Emergency Department, Hospital and ICU.
  • Figure 2: Overview of our proposed method. A patient record is structured into text from which a fixed time window is kept as text representation, while the full temporal context is summarized into an embedding-based summary by the summary model. The Pathway Model combines both representations to predict the next time-step. For iterative simulation, predictions update the patient record to simulate future trajectories until a termination condition is met.
  • Figure 3: Masked Summarization Bottleneck. Input tokens encode past observations, while summary tokens (<SUM>) compress key information. A custom attention mask ensures outputs attend only to summaries, forcing the model to encode relevant patient data into a compact representation.
  • Figure 4: Overview of the evaluation across simulation and outcome tasks, summarizing task coverage and performance of EHR2Path relative to prior baselines on the main task groups. We report the primary metric per task, with numerical tasks reporting $1-\mathrm{MAE}$. Unsupported tasks are plotted as zero to visualize task coverage.
  • Figure 5: Development of event detection (F1) and value prediction (MAE) across simulation horizons for ED, hospital, and ICU development tasks. The shaded areas indicate 95% confidence intervals.
  • ...and 3 more figures
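
The attention mask described in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation. It assumes a sequence laid out as input tokens, then summary tokens, then output tokens, and it assumes a causal base mask in which output tokens may still attend to summary tokens and to earlier output tokens. The function name `summary_bottleneck_mask` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def summary_bottleneck_mask(n_in: int, n_sum: int, n_out: int) -> np.ndarray:
    """Build a boolean attention mask of shape (L, L), L = n_in + n_sum + n_out.

    mask[q, k] is True when query position q may attend to key position k.
    Assumed layout (hypothetical): [inputs | summary tokens | outputs].
    """
    total = n_in + n_sum + n_out
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((total, total), dtype=bool))
    out_start = n_in + n_sum
    # Bottleneck: output positions cannot see the raw input tokens,
    # so any information about the inputs must pass through the summaries.
    mask[out_start:, :n_in] = False
    return mask


# Example: 4 input tokens, 2 summary tokens, 3 output tokens.
mask = summary_bottleneck_mask(n_in=4, n_sum=2, n_out=3)
print(mask.astype(int))
```

In this sketch the summary tokens still see every input token, so they are the only path by which information about the history can reach the outputs. Whether the original model also lets outputs attend to earlier outputs is an assumption made here, not something the caption states.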