SurvMamba: State Space Model with Multi-grained Multi-modal Interaction for Survival Prediction

Ying Chen, Jiajing Xie, Yuxiang Lin, Yuhang Song, Wenxian Yang, Rongshan Yu

TL;DR

SurvMamba addresses survival prediction by modeling hierarchical, multi-grained information in WSIs and transcriptomic data with two Mamba-based modules. The Hierarchical Interaction Mamba (HIM) learns intra-modal representations at fine and coarse granularities, while the Interaction Fusion Mamba (IFM) performs cross-modal fusion across these granularities. By adaptively combining the fine- and coarse-grained inter-modal features, SurvMamba achieves a state-of-the-art c-index on five TCGA datasets at reduced computational cost, which the authors present as evidence of robustness and of potential clinical value for cancer prognosis. The work highlights the value of organizing high-dimensional omics data into hierarchies and of using efficient state space modeling for scalable multi-modal survival analysis.
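
The c-index cited above measures how well predicted risk scores rank patients by survival time. The following is a minimal sketch of Harrell's concordance index, not the paper's evaluation code. The function name and arguments are illustrative assumptions, and pairs with tied observed times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose risk ordering is correct.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is comparable only if
        # the earlier time is an observed event rather than a censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event: correct
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]          # the third patient is censored
    risks = [0.9, 0.1, 0.4, 0.6]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so gains reported on this metric are gains in ranking quality, not in absolute survival-time error.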

Abstract

Multi-modal learning that combines pathological images with genomic data has significantly enhanced the accuracy of survival prediction. Nevertheless, existing methods have not fully utilized the inherent hierarchical structure within both whole slide images (WSIs) and transcriptomic data, from which better intra-modal representations and inter-modal integration could be derived. Moreover, many existing studies attempt to improve multi-modal representations through attention mechanisms, which inevitably lead to high complexity when processing high-dimensional WSIs and transcriptomic data. Recently, a structured state space model named Mamba has emerged as a promising approach owing to its superior performance in modeling long sequences with low complexity. In this study, we propose Mamba with multi-grained multi-modal interaction (SurvMamba) for survival prediction. SurvMamba is implemented with a Hierarchical Interaction Mamba (HIM) module that facilitates efficient intra-modal interactions at different granularities, thereby capturing more detailed local features as well as rich global representations. In addition, an Interaction Fusion Mamba (IFM) module is used for cascaded inter-modal interactive fusion, yielding more comprehensive features for survival prediction. Comprehensive evaluations on five TCGA datasets demonstrate that SurvMamba outperforms existing methods in terms of both performance and computational cost.
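
To illustrate why a state space model scales better than attention on long sequences, here is a minimal sketch of a diagonal linear state space recurrence. It processes the sequence in a single pass, so its cost grows linearly with sequence length, whereas pairwise attention grows quadratically. This is not Mamba itself: Mamba makes its parameters input-dependent (selective) and uses a hardware-aware scan, both of which are omitted here. The function names, the scalar inputs, and the choice to sum the forward and backward passes in the bidirectional variant are all assumptions made for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state space model over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (element-wise over the state dimensions)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:  # one pass over the sequence: O(len(xs)) steps
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def bidirectional_scan(xs, a, b, c):
    """Scan forward and backward, then sum the outputs (assumed combination)."""
    forward = ssm_scan(xs, a, b, c)
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    xs = [1.0, 0.0, 0.0]
    print(ssm_scan(xs, a=[0.5], b=[1.0], c=[1.0]))
    print(bidirectional_scan(xs, a=[0.5], b=[1.0], c=[1.0]))
```

The backward pass matters for data such as WSI patches or gene sets, where no natural left-to-right order exists and each element may need context from both sides.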

Paper Structure

This paper contains 17 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Hierarchical structure of WSI and transcriptomic data.
  • Figure 2: Overview of SurvMamba architecture. WSIs and transcriptomics are illustrated in a three-layer structure, comprising WSI/Region/Patch for WSIs and Gene/Process/Function for transcriptomics, respectively.
  • Figure 3: Computational complexity analysis.
  • Figure 4: Ablation study on the bidirectional SSM designed in SurvMamba.
  • Figure 5: Kaplan-Meier survival curves of SurvMamba on five TCGA cancer datasets.
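
The Kaplan-Meier curves named in the Figure 5 caption are built with a standard estimator, which can be sketched as follows. This is a generic illustration, not the paper's plotting code, and the function name and toy data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censorings
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy cohort: the second patient is censored at t = 2.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients lower the at-risk count without producing a step in the curve, which is why the estimator can use incomplete follow-up.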