ModalSurv: Investigating opportunities and limitations of multimodal deep survival learning in prostate and bladder cancer

Noorul Wahab, Ethar Alzaid, Jiaqi Lv, Fayyaz Minhas, Adam Shephard, Shan E Ahmed Raza

TL;DR

ModalSurv proposes a cross-attention-based multimodal survival framework that integrates clinical, imaging, histopathology, and transcriptomic data to predict cancer recurrence. Evaluated on CHIMERA prostate and bladder cohorts, the approach achieves competitive C-indices but reveals that clinical features alone often provide the most robust generalisation in small, partially aligned datasets. The findings highlight the potential of foundation-model-derived pathology embeddings and multimodal fusion while underscoring the need for larger, harmonised datasets and uncertainty-aware fusion strategies for reliable clinical deployment. Overall, the work clarifies both the opportunities and current barriers in scalable multimodal survival prediction for cancer prognosis.

Abstract

Accurate survival prediction is essential for personalised cancer treatment. We propose ModalSurv, a multimodal deep survival framework integrating clinical, MRI, histopathology, and RNA-sequencing data via modality-specific projections and cross-attention fusion. On the CHIMERA Grand Challenge datasets, ModalSurv achieved a C-index of 0.7402 (1st) for prostate and 0.5740 (5th) for bladder cancer. Notably, clinical features alone outperformed multimodal models on external tests, highlighting challenges of limited multimodal alignment and potential overfitting. Local validation showed multimodal gains but limited generalisation. ModalSurv provides a systematic evaluation of multimodal survival modelling, underscoring both its promise and current limitations for scalable, generalisable cancer prognosis.
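The C-index values quoted above measure how often a model ranks patients correctly: among pairs of patients whose outcomes can be compared, it is the fraction in which the patient with the earlier observed event received the higher predicted risk. The following is a minimal sketch of Harrell's concordance index in plain Python. It is illustrative only; the function name, the toy data, and the choice to skip pairs with tied times are assumptions, and the exact variant used by the CHIMERA evaluation is not stated here.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  -- predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped for simplicity.
        # A pair is also not comparable if the earlier patient was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # correctly ordered pair
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.5740 for bladder cancer only modestly above chance.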

Paper Structure

This paper contains 13 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Overview of the ModalSurv pipeline, which integrates clinical, MRI, RNA-seq, and whole-slide image (WSI) features through modality-specific encoders, cross-attention fusion, and a DeepHit-based survival prediction head.
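
The pipeline in Figure 1 can be sketched in a few lines of NumPy: each modality is projected into a shared embedding space, one token attends to all modality tokens via scaled dot-product cross-attention, and a softmax head outputs a probability distribution over discrete time bins in the style of DeepHit. This is a rough sketch under stated assumptions, not the authors' implementation: the feature sizes, the random (untrained) weights, the choice of the clinical token as the query, and the number of time bins are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: queries attend to context tokens."""
    q = query_tokens @ w_q        # (n_query, d)
    k = context_tokens @ w_k      # (n_context, d)
    v = context_tokens @ w_v      # (n_context, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 8          # shared embedding size (assumed)
n_bins = 10    # number of discrete time bins for the survival head (assumed)

# Hypothetical per-modality feature sizes for a single patient.
dims = {"clinical": 5, "mri": 16, "rna": 32, "wsi": 24}
features = {m: rng.normal(size=(1, n)) for m, n in dims.items()}

# Modality-specific linear projections into the shared space (random here).
projections = {m: rng.normal(size=(n, d)) / np.sqrt(n) for m, n in dims.items()}
tokens = np.concatenate([features[m] @ projections[m] for m in dims], axis=0)

# Assumption: the clinical token (row 0) queries all four modality tokens.
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = cross_attention(tokens[:1], tokens, w_q, w_k, w_v)

# DeepHit-style head: a probability mass function over discrete time bins.
w_out = rng.normal(size=(d, n_bins)) / np.sqrt(d)
pmf = softmax(fused @ w_out, axis=-1)
```

The attention weights give one number per modality, which is a convenient place to inspect how much each data source contributes to the fused representation for a given patient.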