Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment

Chen Liu, Wenfang Yao, Kejing Yin, William K. Cheung, Jing Qin

TL;DR

DiPro introduces Spatiotemporal Disentanglement (STD) to separate dynamic pathology from static anatomy in region-level chest X-rays, Progression-Aware Enhancement (PAE) to enforce progression direction, and Multiscale Multimodal Fusion (MMF) to align CXR dynamics with asynchronous EHR data at interval and sequence scales. The method jointly optimizes orthogonal disentanglement and temporal-consistency losses, plus a progression-aware loss, while fusing modalities via local interval- and global sequence-level attention. On the MIMIC dataset, DiPro achieves state-of-the-art performance in disease progression identification and general ICU prediction, with attention patterns that align with established clinical radiology and ICU knowledge. The work provides a scalable framework for leveraging longitudinal imaging and time-series data to improve personalized ICU care and decision-making.
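The two disentanglement objectives named above can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the squared-cosine form of the orthogonality penalty, and the choice to apply the consistency penalty to static features across consecutive time steps are all assumptions made for illustration.

```python
import numpy as np

def orthogonality_loss(static, dynamic):
    """Mean squared cosine similarity between paired static/dynamic vectors.

    static, dynamic: arrays of shape (n, d).
    Returns 0.0 when every static/dynamic pair is orthogonal.
    (Assumed form; the source only says the losses are "orthogonal".)
    """
    s = static / (np.linalg.norm(static, axis=1, keepdims=True) + 1e-8)
    d = dynamic / (np.linalg.norm(dynamic, axis=1, keepdims=True) + 1e-8)
    cos = np.sum(s * d, axis=1)
    return float(np.mean(cos ** 2))

def static_consistency_loss(static_seq):
    """Mean squared change of static features between consecutive time steps.

    static_seq: array of shape (T, d).
    Returns 0.0 when the static (anatomy) features do not drift over time.
    (Assumed form of the temporal-consistency term.)
    """
    diffs = static_seq[1:] - static_seq[:-1]
    return float(np.mean(diffs ** 2))
```

Under these assumptions, a training objective would add both terms, with hypothetical weights, to the task loss and the progression-aware loss.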

Abstract

Longitudinal multimodal data, including electronic health records (EHR) and sequential chest X-rays (CXRs), is critical for modeling disease progression, yet remains underutilized due to two key challenges: (1) redundancy in consecutive CXR sequences, where static anatomical regions dominate over clinically meaningful dynamics, and (2) temporal misalignment between sparse, irregular imaging and continuous EHR data. We introduce $\texttt{DiPro}$, a novel framework that addresses these challenges through region-aware disentanglement and multi-timescale alignment. First, we disentangle static (anatomy) and dynamic (pathology progression) features in sequential CXRs, prioritizing disease-relevant changes. Second, we hierarchically align these static and dynamic CXR features with asynchronous EHR data via local (pairwise interval-level) and global (full-sequence) synchronization to model coherent progression pathways. Extensive experiments on the MIMIC dataset demonstrate that $\texttt{DiPro}$ effectively extracts temporal clinical dynamics and achieves state-of-the-art performance on both disease progression identification and general ICU prediction tasks.
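To make the local (interval-level) alignment concrete, the sketch below groups EHR measurements into the intervals between consecutive CXR timestamps and summarizes each interval. It is a simplified illustration only: the function names are hypothetical, the half-open interval convention is an assumption, and mean pooling stands in for the attention-based fusion the paper describes.

```python
import numpy as np

def split_ehr_by_cxr_intervals(ehr_times, ehr_values, cxr_times):
    """Group EHR rows into intervals between consecutive CXR timestamps.

    Returns len(cxr_times) - 1 arrays; entry i holds the EHR rows with
    cxr_times[i] <= t < cxr_times[i + 1] (assumed half-open convention).
    """
    ehr_times = np.asarray(ehr_times, dtype=float)
    ehr_values = np.asarray(ehr_values, dtype=float)
    cxr_times = np.asarray(cxr_times, dtype=float)
    segments = []
    for start, end in zip(cxr_times[:-1], cxr_times[1:]):
        mask = (ehr_times >= start) & (ehr_times < end)
        segments.append(ehr_values[mask])
    return segments

def summarize_intervals(segments, n_features):
    """Mean-pool each interval; intervals with no EHR rows stay at zero.

    Mean pooling is a placeholder for a learned attention module.
    """
    pooled = np.zeros((len(segments), n_features))
    for i, seg in enumerate(segments):
        if len(seg) > 0:
            pooled[i] = seg.mean(axis=0)
    return pooled
```

Each pooled row could then be paired with the dynamic CXR feature of the same interval, while a global view would operate on the full EHR sequence alongside all CXR features.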

Paper Structure

This paper contains 48 sections, 12 equations, 2 figures, 19 tables.

Figures (2)

  • Figure 1: Overview of the DiPro framework. The model comprises three main modules: (1) Spatiotemporal Disentanglement (STD) separates dynamic pathological features ($\mathbf{D}_{i}^{r}$) from static anatomical structures ($\mathbf{S}_{i}^{r}$) in region-level chest X-rays across time; (2) Progression-Aware Enhancement (PAE) strengthens the model’s understanding of progression direction by reversing CXR pair order and enforcing the reversed dynamic features $\widetilde{\mathbf{D}}_{i}^{r}$ to predict the reversed progression, while maintaining consistency in static features; (3) Multiscale Multimodal Fusion (MMF) integrates CXR features with temporally misaligned EHR data via local (interval-level) and global (sequence-level) fusion, enabling accurate predictions across multiple clinical tasks, including disease progression identification, length-of-stay classification, and in-hospital mortality prediction.
  • Figure 2: Averaged attention weights of CXR regions in different downstream tasks. The radial axis in (a) is log-scaled to enhance distribution visibility. The weights reveal DiPro's clinical alignment: (a) overlapping distributions for pneumonia, lung opacity, and pleural effusion reflect shared pathologies, while (b) the ICU tasks diverge, with mortality prediction placing higher weights on right-sided regions (linked to higher risk) and length-of-stay prediction attending diffusely (reflecting multifactorial ICU conditions).
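The pair-reversal idea behind PAE in Figure 1 can be sketched as a data transformation: swapping the order of a CXR pair should flip the progression label while leaving static features unchanged. The label names below are hypothetical (the source does not list the progression classes), and the two toy encoders are stand-ins chosen only to show which property PAE would enforce on learned features.

```python
import numpy as np

# Hypothetical label set; the source does not list the progression classes.
REVERSED_LABEL = {
    "worsened": "improved",
    "improved": "worsened",
    "no change": "no change",
}

def reverse_pair(earlier, later, label):
    """Swap a CXR pair and flip its progression label accordingly."""
    return later, earlier, REVERSED_LABEL[label]

def toy_dynamic(earlier, later):
    """Toy stand-in for a dynamic encoder: an order-sensitive difference."""
    return later - earlier

def toy_static(earlier, later):
    """Toy stand-in for a static encoder: an order-invariant average."""
    return (earlier + later) / 2.0
```

With these toy encoders, reversing a pair negates the dynamic feature and leaves the static feature unchanged, which mirrors the behavior the caption attributes to the reversed features and the static-consistency constraint.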