Multi-Modal Mamba Modeling for Survival Prediction (M4Survive): Adapting Joint Foundation Model Representations
Ho Hin Lee, Alberto Santamaria-Pang, Jameson Merkov, Matthew Lungren, Ivan Tarapov
TL;DR
This work tackles survival prediction in oncology by integrating multi-modal imaging (radiology and pathology) through foundation-model embeddings. M4Survive learns a joint semantic space from modality-specific embeddings and uses a token-based Mamba adapter for efficient cross-modal fusion with linear complexity. Survival is modeled in a Cox proportional hazards (PH) framework, using a Cox ranking loss and time-dependent hazards to enable robust risk estimation. On glioma datasets, M4Survive outperforms unimodal and static multi-modal baselines (c-index ≈ 81.27), with ablations confirming the value of domain-specific foundation models and the Mamba adapter. The approach is highly scalable, reportedly training in about 15 seconds on modest hardware, which highlights its potential for clinical deployment and precision oncology analytics.
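To make the Cox objective and the c-index metric mentioned above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard negative Cox partial log-likelihood without tie correction and Harrell's concordance index; the function names and the toy cohort are hypothetical. Note that `concordance_index` returns a value in [0, 1], so a reported score of about 81.27 would presumably correspond to roughly 0.81 on this scale.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk score per patient (higher = worse prognosis)
    time  : follow-up time per patient
    event : 1 if the event was observed, 0 if censored
    Assumption: no special handling of tied event times.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still under observation at time[i].
    at_risk = time[None, :] >= time[:, None]
    # Numerically stable log-sum-exp of risk scores over each risk set.
    shift = risk.max()
    log_denom = shift + np.log((np.exp(risk - shift)[None, :] * at_risk).sum(axis=1))
    n_events = max(int(event.sum()), 1)
    return float(-(risk - log_denom)[event].sum() / n_events)

def concordance_index(risk, time, event):
    """Harrell's c-index: share of comparable pairs ranked in the right order."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Toy cohort (made-up numbers): the third patient is censored.
time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
good_risk = [3.0, 2.0, 1.0, 0.0]  # earlier events get higher risk
bad_risk = [0.0, 1.0, 2.0, 3.0]   # ordering reversed

good_loss = cox_neg_partial_log_likelihood(good_risk, time, event)
bad_loss = cox_neg_partial_log_likelihood(bad_risk, time, event)
good_c = concordance_index(good_risk, time, event)
bad_c = concordance_index(bad_risk, time, event)
```

In this toy cohort the correctly ordered scores give a lower loss and a higher c-index than the reversed scores, which is the ranking behavior the Cox objective rewards.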
Abstract
Accurate survival prediction in oncology requires integrating diverse imaging modalities to capture the complex interplay of tumor biology. Traditional single-modality approaches often fail to leverage the complementary insights provided by radiological and pathological assessments. In this work, we introduce M4Survive (Multi-Modal Mamba Modeling for Survival Prediction), a novel framework that learns joint foundation model representations using efficient adapter networks. Our approach dynamically fuses heterogeneous embeddings from a foundation model repository (e.g., MedImageInsight, BiomedCLIP, Prov-GigaPath, UNI2-h), creating a correlated latent space optimized for survival risk estimation. By leveraging Mamba-based adapters, M4Survive supports multi-modal learning while keeping computational cost low. Experimental evaluations on benchmark datasets demonstrate that our approach outperforms both unimodal and traditional static multi-modal baselines in survival prediction accuracy. This work underscores the potential of foundation model-driven multi-modal fusion in advancing precision oncology and predictive analytics.
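As an illustration of how heterogeneous embeddings might be turned into a token sequence and fused in a single linear-time pass, here is a minimal NumPy sketch. It is not the M4Survive architecture: the embedding widths, the shared width `D_MODEL`, the random projections, and the fixed-decay recurrence are all assumptions, and the recurrence is a simplified stand-in for a Mamba block (it has no input-dependent selection mechanism).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding widths; real sizes depend on each foundation model.
EMBED_DIMS = {"radiology": 1024, "pathology": 1536}
D_MODEL = 64  # shared token width (assumed)

# One linear projection per modality maps its embedding into the shared space.
# Random weights here; in practice these would be learned.
projections = {
    name: rng.normal(0.0, dim ** -0.5, size=(dim, D_MODEL))
    for name, dim in EMBED_DIMS.items()
}

def to_tokens(embeddings):
    """Project each modality embedding to D_MODEL and stack into a sequence."""
    return np.stack([embeddings[name] @ projections[name] for name in EMBED_DIMS])

def linear_scan(tokens, decay=0.9):
    """Simplified state-space recurrence: h_t = decay * h_{t-1} + x_t.

    The sequence is visited once, so the cost grows linearly with the
    number of tokens. This is a stand-in, not an actual Mamba layer.
    """
    state = np.zeros(tokens.shape[1])
    outputs = np.empty_like(tokens)
    for t, token in enumerate(tokens):
        state = decay * state + token
        outputs[t] = state
    return outputs

def risk_score(embeddings, head):
    """Fuse the tokens, mean-pool them, and map to a scalar log-risk."""
    fused = linear_scan(to_tokens(embeddings))
    return float(fused.mean(axis=0) @ head)

# Usage with made-up embeddings for a single patient.
patient = {name: rng.normal(size=dim) for name, dim in EMBED_DIMS.items()}
head = rng.normal(size=D_MODEL)  # hypothetical risk head weights
tokens = to_tokens(patient)
fused = linear_scan(tokens)
score = risk_score(patient, head)
```

The scalar `score` plays the role of the log-risk that a Cox-style loss would consume during training; adding a further modality only appends one more token to the sequence.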
