A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness
Boqi Chen, Junier Oliva, Marc Niethammer
TL;DR
The paper tackles predicting future clinical outcomes from longitudinal, multi-modal medical records despite missing data across timepoints and views. It introduces a unified model with separate view encoders, a masked attention-based summarizer, and a transformer decoder that can process arbitrary input histories without imputation, using learnable [SUM] and [PAD] embeddings and a mask $\mathcal{M}$ to handle absent views. Evaluated on the Osteoarthritis Initiative (OAI) dataset for WOMAC pain and Kellgren-Lawrence grade prediction, the approach performs competitively against view-specific baselines, improves with longer temporal histories, and accommodates varying view combinations. The work also provides post-hoc analyses of view importance, highlighting knee radiographs and cartilage thickness maps as key contributors for different tasks, and illustrates the practical value of flexible, missingness-tolerant, multi-view modeling on real-world clinical data. Because the model handles missing views and timepoints directly and uses a transformer decoder to integrate longitudinal information, the method should carry over to other longitudinal multi-modal medical prediction tasks beyond OAI.
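To make the masked summarizer concrete, the sketch below pools one timepoint's view embeddings into a single vector using a [SUM] query, filling missing view slots with [PAD] and excluding them through the mask $\mathcal{M}$. It is a minimal illustration under stated assumptions, not the authors' implementation: it uses single-head scaled dot-product attention with no learned query/key/value projections, fixed example vectors stand in for the learnable [SUM] and [PAD] embeddings, and representing a fully missing timepoint by [PAD] is an assumption. The function names `dot` and `masked_summary` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_summary(view_embeddings, available, sum_token, pad_token):
    """Pool one timepoint's view embeddings into a single summary vector.

    view_embeddings: one vector per view, or None where the view is missing.
    available: the mask M for this timepoint (True = view observed).
    sum_token / pad_token: stand-ins for the learnable [SUM] and [PAD] embeddings.
    Returns (summary, attention_weights).
    """
    dim = len(sum_token)
    if not any(available):
        # Assumption: a timepoint with no observed views is represented by [PAD].
        return list(pad_token), [0.0] * len(available)
    # Missing slots are filled with [PAD] so every timepoint has the same shape.
    keys = [e if ok else pad_token for e, ok in zip(view_embeddings, available)]
    scale = 1.0 / math.sqrt(dim)
    # Single-query scaled dot-product attention; masked views get a score of -inf.
    scores = [dot(sum_token, k) * scale if ok else -math.inf
              for k, ok in zip(keys, available)]
    top = max(scores)  # finite, since at least one view is observed
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return summary, weights


# Usage: three views, the second one missing at this timepoint.
sum_tok = [0.5, -0.2, 0.1]   # hypothetical values; learned in the real model
pad_tok = [0.0, 0.0, 0.0]
views = [[1.0, 0.0, 0.0], None, [0.0, 1.0, 0.0]]
summary, weights = masked_summary(views, [True, False, True], sum_tok, pad_tok)
```

The missing view receives exactly zero attention weight, so its [PAD] placeholder cannot leak into the summary; this is what lets the model avoid imputation while keeping a fixed number of view slots per timepoint.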
Abstract
Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient's condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific timepoint, and 2) certain modalities or views might be absent for all patients during a particular period. In this work, we introduce a unified model for longitudinal multi-modal multi-view prediction with missingness. Our method accepts as many input timepoints as desired and aims to leverage all available data, regardless of which modalities or views are missing. We conduct extensive experiments on the knee osteoarthritis dataset from the Osteoarthritis Initiative for pain and Kellgren-Lawrence grade prediction at a future timepoint. We demonstrate the effectiveness of our method by comparing results from our unified model to specific models that use the same modality and view combinations during training and evaluation. We also show the benefit of having extended temporal data and provide post-hoc analysis for a deeper understanding of each modality/view's importance for different tasks.
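Accepting "as many timepoints as desired" in a batched transformer typically means padding shorter patient histories to a common length and masking the padding inside the decoder. The sketch below shows one common way to do this, assuming per-timepoint summary vectors have already been computed, that histories are right-padded with a [PAD] vector, and that the decoder uses a causal mask; the paper's actual batching and masking scheme may differ. The helpers `pad_histories`, `decoder_attention_mask`, and `last_valid_index` are hypothetical.

```python
def pad_histories(histories, pad_token):
    """Right-pad per-patient lists of timepoint summaries to a common length.

    Returns the padded sequences and a validity mask (True = real timepoint).
    """
    max_len = max(len(h) for h in histories)
    padded, valid = [], []
    for h in histories:
        n_pad = max_len - len(h)
        padded.append([list(v) for v in h] + [list(pad_token) for _ in range(n_pad)])
        valid.append([True] * len(h) + [False] * n_pad)
    return padded, valid


def decoder_attention_mask(valid_row):
    """Position i may attend to position j only if j is real and j <= i (causal)."""
    n = len(valid_row)
    return [[valid_row[j] and j <= i for j in range(n)] for i in range(n)]


def last_valid_index(valid_row):
    """Index of the most recent real timepoint, where a prediction would be read."""
    return max(i for i, ok in enumerate(valid_row) if ok)


# Usage: patient A has three visits, patient B only one.
histories = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padded, valid = pad_histories(histories, [0.0, 0.0])
attn_b = decoder_attention_mask(valid[1])
```

Because padded positions are never attended to, patients with one visit and patients with many can share a batch without the shorter histories being influenced by filler vectors.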
