
Multi-Modal Mamba Modeling for Survival Prediction (M4Survive): Adapting Joint Foundation Model Representations

Ho Hin Lee, Alberto Santamaria-Pang, Jameson Merkov, Matthew Lungren, Ivan Tarapov

TL;DR

This work tackles survival prediction in oncology by integrating multi-modal imaging (radiology and pathology) through foundation-model embeddings. M4Survive learns a joint semantic space from modality-specific embeddings and uses a token-based Mamba adapter for cross-modal fusion with linear complexity. Survival is modeled in a Cox proportional hazards (PH) framework with time-dependent hazards and trained with a Cox ranking loss. On glioma datasets, M4Survive outperforms unimodal and static multi-modal baselines (c-index ≈ 81.27), and ablations confirm the value of domain-specific foundation models and of the Mamba adapter. The adapter is reportedly trainable in about 15 seconds on modest hardware, which points to potential for clinical deployment and precision oncology analytics.
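To make the Cox objective and the c-index concrete, the following is a minimal NumPy sketch of the standard negative Cox partial log-likelihood and Harrell's concordance index. It is not taken from the paper: the function names, the Breslow-style risk sets, and the normalization by the number of events are assumptions chosen for illustration. Note that this sketch returns the c-index on a 0 to 1 scale; reading the reported 81.27 as a percentage is likewise an assumption.

```python
import numpy as np


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk scores, shape (n,)
    time  : observed follow-up times, shape (n,)
    event : 1 if the event was observed, 0 if censored, shape (n,)
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # at_risk[i, j] is True when patient j is still at risk at time[i]
    at_risk = time[None, :] >= time[:, None]
    # log-sum-exp over each risk set, shifted for numerical stability
    shift = risk.max()
    weighted = at_risk * np.exp(risk - shift)[None, :]
    log_risk_set = shift + np.log(weighted.sum(axis=1))
    # only observed events contribute a term to the partial likelihood
    return -np.sum(event * (risk - log_risk_set)) / max(event.sum(), 1.0)


def concordance_index(risk, time, event):
    """Harrell's c-index: share of comparable pairs ranked correctly."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy cohort (made-up values): higher risk should mean shorter survival.
time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
good_risk = [3.0, 2.0, 1.0, 0.0]
bad_risk = [0.0, 1.0, 2.0, 3.0]
loss_good = cox_neg_partial_log_likelihood(good_risk, time, event)
loss_bad = cox_neg_partial_log_likelihood(bad_risk, time, event)
```

On this toy cohort the correctly ordered scores give a lower loss than the reversed ones, which is the ranking behavior a Cox-style objective rewards.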

Abstract

Accurate survival prediction in oncology requires integrating diverse imaging modalities to capture the complex interplay of tumor biology. Traditional single-modality approaches often fail to leverage the complementary insights provided by radiological and pathological assessments. In this work, we introduce M4Survive (Multi-Modal Mamba Modeling for Survival Prediction), a novel framework that learns joint foundation model representations using efficient adapter networks. Our approach dynamically fuses heterogeneous embeddings from a foundation model repository (e.g., MedImageInsight, BiomedCLIP, Prov-GigaPath, UNI2-h), creating a correlated latent space optimized for survival risk estimation. By leveraging Mamba-based adapters, M4Survive enables multi-modal learning while preserving computational efficiency. Experimental evaluations on benchmark datasets demonstrate that our approach outperforms both unimodal and traditional static multi-modal baselines in survival prediction accuracy. This work underscores the potential of foundation model-driven multi-modal fusion in advancing precision oncology and predictive analytics.
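The idea of projecting heterogeneous embeddings into one space and fusing them as a token sequence can be sketched as follows. This is a deliberately simplified stand-in, not the paper's adapter: the embedding widths, the joint dimension, the random untrained weights, and the fixed-decay recurrence are all assumptions. In an actual Mamba block the state-space parameters depend on the input (the selective scan); here a plain linear recurrence only illustrates why the cost grows linearly with the number of tokens.

```python
import numpy as np

rng = np.random.default_rng(0)


def project_to_joint_space(embeddings, d_joint, rng):
    """Map each modality embedding (any width) to one d_joint-wide token.

    The projection weights are random here; in a trained model they
    would be learned modality-specific encoders.
    """
    tokens = []
    for emb in embeddings:
        width = emb.shape[0]
        w = rng.normal(scale=width ** -0.5, size=(width, d_joint))
        tokens.append(emb @ w)
    return np.stack(tokens)  # shape (n_tokens, d_joint)


def linear_scan_fusion(tokens, decay=0.9):
    """Single pass over the tokens: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    One state update per token, so the cost is linear in sequence length.
    """
    h = np.zeros(tokens.shape[1])
    for x in tokens:
        h = decay * h + (1.0 - decay) * x
    return h


# Hypothetical embedding widths for a radiology and a pathology model.
mri_embedding = rng.normal(size=1024)
pathology_embedding = rng.normal(size=1536)

tokens = project_to_joint_space([mri_embedding, pathology_embedding], d_joint=64, rng=rng)
fused = linear_scan_fusion(tokens)
# Untrained linear head producing a scalar log-risk score.
risk_score = float(fused @ rng.normal(size=64))
```

The scalar `risk_score` is the kind of output a Cox-style survival loss would consume; with random weights it carries no clinical meaning.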

Paper Structure

This paper contains 10 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of M4Survive. MRI diagnostic images with variable contrast and cancer-site histopathology images are independently processed through dedicated healthcare foundation models to generate modality-specific embeddings. These embeddings are then mapped into a joint-modality semantic space by modality-specific encoders, and a Mamba adapter performs multi-modal fusion for survival prediction.
  • Figure 2: Kaplan–Meier survival curves for high-risk and low-risk groups predicted by two ablation configurations using the Mamba and transformer adapters.
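Kaplan–Meier curves such as those in Figure 2 can be computed with the product-limit estimator. The sketch below is a minimal NumPy version under stated assumptions: the function names are hypothetical, and splitting patients into high-risk and low-risk groups at the median predicted risk is a common convention assumed here, since this summary does not say how the groups were defined.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit estimate of the survival function.

    Returns the distinct event times and the estimated survival
    probability immediately after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    survival, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)                  # still under observation at t
        n_events = np.sum((time == t) & (event == 1))  # events occurring exactly at t
        s *= 1.0 - n_events / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)


def split_by_median_risk(risk):
    """Boolean mask marking patients whose risk exceeds the cohort median."""
    risk = np.asarray(risk, dtype=float)
    return risk > np.median(risk)


# Toy data (made-up values): 1 = event observed, 0 = censored.
time = [1.0, 2.0, 3.0, 4.0, 5.0]
event = [1, 0, 1, 1, 0]
risk = [0.1, 0.9, 0.5, 0.7, 0.3]

event_times, survival = kaplan_meier(time, event)
high_risk = split_by_median_risk(risk)
```

Calling `kaplan_meier` separately on the `high_risk` and `~high_risk` subsets yields the two curves to plot against each other.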