Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment
Chen Liu, Wenfang Yao, Kejing Yin, William K. Cheung, Jing Qin
TL;DR
DiPro introduces three components. Spatiotemporal Disentanglement (STD) separates dynamic pathology from static anatomy in region-level chest X-ray (CXR) features. Progression-Aware Enhancement (PAE) enforces the direction of disease progression. Multiscale Multimodal Fusion (MMF) aligns CXR dynamics with asynchronous electronic health record (EHR) data at the interval and sequence scales. The method jointly optimizes an orthogonal disentanglement loss, a temporal-consistency loss, and a progression-aware loss, while fusing the two modalities through local interval-level and global sequence-level attention. On the MIMIC dataset, DiPro achieves state-of-the-art performance on disease progression identification and general ICU prediction tasks, and its attention patterns agree with established clinical knowledge in radiology and intensive care. The work provides a scalable framework for leveraging longitudinal imaging and time-series data to improve personalized ICU care and decision-making.
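The TL;DR names the disentanglement losses but does not give their formulas. As an illustration only, the sketch below assumes each region-level CXR feature has already been split into a static vector and a dynamic vector per study. It implements two common choices. The first is an orthogonality penalty: the mean squared cosine similarity between each study's static and dynamic vectors. The second is a temporal-consistency penalty: the mean squared change in the static vector between consecutive studies. The function names, both formulations, and the loss weights are hypothetical stand-ins, not DiPro's published losses.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonal_loss(static: List[Vector], dynamic: List[Vector], eps: float = 1e-8) -> float:
    """Hypothetical penalty: mean squared cosine similarity between the static and
    dynamic feature of each study; equals 0 when every pair is orthogonal."""
    total = 0.0
    for s, d in zip(static, dynamic):
        # eps keeps the division defined when a feature vector is all zeros
        cos = _dot(s, d) / (math.sqrt(_dot(s, s)) * math.sqrt(_dot(d, d)) + eps)
        total += cos ** 2
    return total / len(static)


def temporal_consistency_loss(static: List[Vector]) -> float:
    """Hypothetical penalty: mean squared change of the static (anatomy) feature
    between consecutive studies; equals 0 when anatomy features stay constant."""
    diffs = [
        sum((a - b) ** 2 for a, b in zip(prev, curr))
        for prev, curr in zip(static[:-1], static[1:])
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def disentanglement_loss(
    static: List[Vector], dynamic: List[Vector], w_orth: float = 1.0, w_tc: float = 1.0
) -> float:
    """Hypothetical weighted sum of the two penalties; the weights are placeholders."""
    return w_orth * orthogonal_loss(static, dynamic) + w_tc * temporal_consistency_loss(static)
```

Consider static features [[1.0, 0.0], [1.0, 0.0]] with dynamic features [[0.0, 2.0], [0.0, -1.0]]. Both penalties are 0.0, because the dynamic vectors are orthogonal to an anatomy vector that never changes. In training, such terms would typically be added to the task loss and the progression-aware loss, with tuned weights.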
Abstract
Longitudinal multimodal data, including electronic health records (EHR) and sequential chest X-rays (CXRs), is critical for modeling disease progression, yet remains underutilized due to two key challenges: (1) redundancy in consecutive CXR sequences, where static anatomical regions dominate over clinically meaningful dynamics, and (2) temporal misalignment between sparse, irregular imaging and continuous EHR data. We introduce DiPro, a novel framework that addresses these challenges through region-aware disentanglement and multi-timescale alignment. First, we disentangle static (anatomy) and dynamic (pathology progression) features in sequential CXRs, prioritizing disease-relevant changes. Second, we hierarchically align these static and dynamic CXR features with asynchronous EHR data via local (pairwise interval-level) and global (full-sequence) synchronization to model coherent progression pathways. Extensive experiments on the MIMIC dataset demonstrate that DiPro can effectively extract temporal clinical dynamics and achieves state-of-the-art performance on both disease progression identification and general ICU prediction tasks.
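To make the local and global alignment concrete, here is a minimal sketch under stated assumptions. It is not the paper's architecture. It assumes:

- EHR records are (timestamp, feature) pairs already projected to the same dimension as the dynamic CXR features.
- Each EHR record belongs to the interval between the two consecutive CXRs that contains it.

At the local scale, each interval's dynamic CXR feature attends over that interval's EHR records with scaled dot-product attention. At the global scale, the most recent dynamic feature attends over all interval summaries. All names (`bucket_by_interval`, `attend`, `multiscale_fuse`) are hypothetical.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def bucket_by_interval(
    cxr_times: List[float], ehr: List[Tuple[float, Vector]]
) -> Dict[int, List[Vector]]:
    """Assign each EHR record to the CXR interval (cxr_times[k-1], cxr_times[k]].

    cxr_times must be sorted ascending. Records at or before the first CXR,
    or after the last CXR, belong to no interval and are dropped.
    """
    buckets: Dict[int, List[Vector]] = {k: [] for k in range(1, len(cxr_times))}
    for t, feat in ehr:
        k = bisect.bisect_left(cxr_times, t)  # first index with cxr_times[k] >= t
        if 1 <= k < len(cxr_times):
            buckets[k].append(feat)
    return buckets


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Vector, keys: List[Vector]) -> List[float]:
    """Scaled dot-product attention using the keys as values; no keys yields zeros."""
    if not keys:
        return [0.0] * len(query)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(len(query))]


def multiscale_fuse(
    cxr_times: List[float], dynamic: List[Vector], ehr: List[Tuple[float, Vector]]
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical two-scale fusion.

    dynamic[k-1] is the dynamic CXR feature for interval k (between CXR k-1 and CXR k).
    Local: each interval's dynamic feature attends over the EHR records in that interval.
    Global: the latest dynamic feature attends over all local interval summaries.
    """
    buckets = bucket_by_interval(cxr_times, ehr)
    local = [attend(dynamic[k - 1], buckets[k]) for k in range(1, len(cxr_times))]
    global_summary = attend(dynamic[-1], local)
    return local, global_summary
```

Take CXRs at hours 0, 10, and 20. EHR records at hours 5 and 10 fall into the first interval, and a record at hour 15 falls into the second. A record at hour 25 is dropped because no later CXR closes its interval. Bucketing by CXR interval is only one simple way to handle irregular imaging times.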
