Adaptive Prototype Learning for Multimodal Cancer Survival Analysis

Hong Liu, Haosen Yang, Federica Eduati, Josien P. W. Pluim, Mitko Veta

TL;DR

Adaptive Prototype Learning (APL) tackles redundancy in multimodal cancer survival analysis by learning two sets of task-relevant prototypes: learnable queries attend, via cross-attention, to the high-dimensional histology and genomics representations and compress each into a small set of tokens. A multimodal mixed self-attention module then fuses the prototypes through cross-modal interaction. Across five TCGA cancer cohorts, APL achieves the leading average C-index (around 0.72) and outperforms unimodal, multimodal, and prototype-based baselines; ablations show the contributions of the histology prototypes, the genomics prototypes, and the fusion mechanism. Code is available at https://github.com/HongLiuuuuu/APL.
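
For readers unfamiliar with the metric, the C-index quoted above measures how often a model ranks two patients in the right order: among comparable pairs, the patient who died earlier should receive the higher predicted risk. The sketch below is a simplified Harrell-style C-index in plain Python. It is an illustration only, not the evaluation code from the APL repository; the function name, the toy data, and the choice to skip pairs with tied survival times are assumptions.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs whose earlier
        # patient was censored, since their true order is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier death got the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (made-up numbers): the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 4 of 5 comparable pairs are ordered correctly -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported average of about 0.72 in context.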

Abstract

Leveraging multimodal data, particularly the integration of whole-slide histology images (WSIs) and transcriptomic profiles, holds great promise for improving cancer survival prediction. However, excessive redundancy in multimodal data can degrade model performance. In this paper, we propose Adaptive Prototype Learning (APL), a novel and effective approach for multimodal cancer survival analysis. APL adaptively learns representative prototypes in a data-driven manner, reducing redundancy while preserving critical information. Our method employs two sets of learnable query vectors that serve as a bridge between high-dimensional representations and survival prediction, capturing task-relevant features. Additionally, we introduce a multimodal mixed self-attention mechanism to enable cross-modal interactions, further enhancing information fusion. Extensive experiments on five benchmark cancer datasets demonstrate the superiority of our approach over existing methods. The code is available at https://github.com/HongLiuuuuu/APL.
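
To make the idea of learnable queries concrete, the following minimal sketch shows how a small set of query vectors can compress many input tokens into a few prototypes with scaled dot-product cross-attention. It is written under several assumptions: the learned query, key, and value projections and multi-head structure of a real implementation are omitted, random vectors stand in for trained queries and for patch or pathway embeddings, and the names (`extract_prototypes`) and sizes (500 patches, 50 pathways, 4 prototypes) are hypothetical rather than taken from the paper.

```python
import math
import random


def extract_prototypes(queries, tokens):
    """Cross-attention: each query forms one prototype as a weighted
    average of the input tokens (no learned projections in this sketch)."""
    scale = math.sqrt(len(queries[0]))
    dim = len(tokens[0])
    prototypes, attention = [], []
    for q in queries:
        # Similarity of this query to every token, scaled as in standard attention.
        scores = [sum(a * b for a, b in zip(q, t)) / scale for t in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attention.append(weights)
        prototypes.append(
            [sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(dim)]
        )
    return prototypes, attention


rng = random.Random(0)
dim = 8


def random_tokens(n):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


# Stand-ins for WSI patch embeddings and pathway embeddings (hypothetical sizes).
patch_tokens = random_tokens(500)
pathway_tokens = random_tokens(50)

# Two separate query sets, one per modality, as described in the abstract.
hist_queries = random_tokens(4)
gene_queries = random_tokens(4)

hist_protos, hist_attn = extract_prototypes(hist_queries, patch_tokens)
gene_protos, gene_attn = extract_prototypes(gene_queries, pathway_tokens)

print(len(patch_tokens), "patches ->", len(hist_protos), "histology prototypes")
print(len(pathway_tokens), "pathways ->", len(gene_protos), "genomic prototypes")
```

Each row of `hist_attn` is a distribution over patches, so it can be read as a relevance map for the corresponding prototype. In a trained model the queries would be parameters updated by the survival loss, which is what makes the resulting prototypes task-relevant.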

Paper Structure

This paper contains 14 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of typical multimodal cancer survival analysis architectures: (a) Directly fusing multimodal data through a fusion module, such as an attention mechanism (e.g., SurvPath). (b) Reducing redundant tokens from cross-modal data using additional knowledge, such as predefined risk levels (e.g., PIBD). (c) Our proposed approach adaptively learns task-relevant prototypes with learnable queries.
  • Figure 2: Overview of APL. Gene expression is first tokenized into biological pathways, and pathway embeddings are extracted using a feature extractor (SNN). Similarly, WSIs are processed into patch embeddings using a pre-trained feature extractor. Next, an adaptive prototyping module employs two sets of learnable queries to extract compact information from high-dimensional representations via cross-attention. These learned prototypes are then fused using a multimodal mixed self-attention mechanism, facilitating cross-modal interactions and enhancing information integration. Finally, the model predicts survival risk based on the refined prototypes.
  • Figure 3: Visualization of APL's behavior, including cross-attention maps and learned prototypes for histology and genomics. (A) WSI of a BLCA patient. (B) Top: Cross-attention maps of two randomly selected histology prototypes, where brighter regions indicate higher relevance. Bottom: The top three most representative patches corresponding to each learned prototype. (C) The top six pathways associated with two randomly selected genomic prototypes. Each histology and genomic prototype is highlighted in a red box.
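
The fusion and prediction steps described in the Figure 2 caption can be sketched roughly as follows. This is one plausible reading, not the authors' implementation: it assumes that "mixed" self-attention means plain self-attention over the concatenated histology and genomic prototypes, it omits learned projections, and it uses mean pooling plus a random linear head as a hypothetical stand-in for the survival predictor. All names and sizes are illustrative.

```python
import math
import random


def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections."""
    scale = math.sqrt(len(tokens[0]))
    dim = len(tokens[0])
    outputs, attention = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        attention.append(weights)
        outputs.append(
            [sum(w * k[c] for w, k in zip(weights, tokens)) for c in range(dim)]
        )
    return outputs, attention


rng = random.Random(1)
dim, n_hist, n_gene = 8, 4, 4

# Random stand-ins for the prototypes produced by the adaptive prototyping module.
hist_protos = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_hist)]
gene_protos = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_gene)]

# Mixing: both modalities share one sequence, so every prototype can attend
# to prototypes of its own modality and of the other modality.
joint = hist_protos + gene_protos
fused, attn = self_attention(joint)

# Share of each histology prototype's attention that lands on genomic prototypes.
cross_modal_share = [sum(row[n_hist:]) for row in attn[:n_hist]]

# Hypothetical risk head: mean-pool the fused prototypes, then a linear map.
pooled = [sum(tok[c] for tok in fused) / len(fused) for c in range(dim)]
head = [rng.gauss(0.0, 1.0) for _ in range(dim)]
risk = sum(w * x for w, x in zip(head, pooled))

print(len(fused), "fused prototypes of width", len(fused[0]))
print("cross-modal attention share per histology prototype:", cross_modal_share)
```

Because only a handful of prototypes enter the fusion step, the attention matrix here is 8 by 8 rather than scaling with the number of WSI patches, which illustrates why compressing to prototypes first keeps cross-modal fusion cheap.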