Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

TL;DR

The paper tackles the challenge of predicting cancer survival from high-dimensional multimodal data by focusing on sparse, high-level prognostic events. It introduces SlotSPE, a slot-based framework that compresses histology and genomics into modality-specific slots via slot attention, with selective activation and cross-modal reconstruction guided by biological priors. Across ten TCGA cohorts, SlotSPE achieves state-of-the-art performance on most cohorts and demonstrates robustness to missing genomic data. Interpretability analyses reveal event-level alignment between modalities and biologically plausible pathway–morphology correspondences.
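The compression of a bag of features into a small set of slots can be illustrated with a minimal NumPy sketch of generic slot attention (Locatello et al., 2020). This is not SlotSPE's implementation: the function name `slot_attention`, the random projection matrices (learned in a real model), the slot count, the iteration count, and the plain overwrite that stands in for the original GRU + MLP slot update are all illustrative assumptions. The property the sketch shows is competition: the softmax runs over the slot axis, so slots compete to explain each input feature, and each slot is then updated as a weighted mean of the inputs it claims.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs: np.ndarray, num_slots: int, dim: int,
                   iters: int = 3, seed: int = 0, eps: float = 1e-8):
    """Compress a bag of N features (N, D_in) into num_slots slots (num_slots, dim).

    Simplified sketch: projections are random (learned in a real model), and the
    GRU + MLP slot update of standard slot attention is replaced by a direct overwrite.
    """
    rng = np.random.default_rng(seed)
    d_in = inputs.shape[1]
    w_q = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    w_k = rng.normal(size=(d_in, dim)) / np.sqrt(d_in)
    w_v = rng.normal(size=(d_in, dim)) / np.sqrt(d_in)
    keys, values = inputs @ w_k, inputs @ w_v            # (N, dim) each
    slots = rng.normal(size=(num_slots, dim))            # random initial slots
    for _ in range(iters):
        logits = keys @ (slots @ w_q).T / np.sqrt(dim)   # (N, num_slots)
        attn = softmax(logits, axis=1)                   # slots compete for each input
        weights = (attn + eps) / (attn + eps).sum(axis=0, keepdims=True)
        slots = weights.T @ values                       # weighted mean of claimed inputs
    return slots, attn

# Toy usage: 500 hypothetical patch embeddings of width 64 compressed into 6 slots.
patches = np.random.default_rng(1).normal(size=(500, 64))
slots, attn = slot_attention(patches, num_slots=6, dim=32)
assignment = attn.argmax(axis=1)    # hard slot index per patch
print(slots.shape, attn.shape)      # (6, 32) (500, 6)
```

Taking the argmax of the final attention over slots gives each patch a single slot index, which is one plausible way to produce slot assignment maps of the kind shown in Figure 5(A).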

Abstract

The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations, are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover. To address this, we propose SlotSPE, a slot-based framework for structural prognostic event modeling. Specifically, inspired by the principle of factorial coding, we compress each patient's multimodal inputs into compact, modality-specific sets of mutually distinctive slots using slot attention. By leveraging these slot representations as encodings for prognostic events, our framework enables both efficient and effective modeling of complex intra- and inter-modal interactions, while also facilitating seamless incorporation of biological priors that enhance prognostic relevance. Extensive experiments on ten cancer benchmarks show that SlotSPE outperforms existing methods in 8 out of 10 cohorts, achieving an overall improvement of 2.9%. It remains robust under missing genomic data and delivers markedly improved interpretability through structured event decomposition.
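One route by which biological priors can enter is the cross-modal reconstruction described for Figure 2(b), where pathway-level gene expression is predicted from histology. The following is a hypothetical NumPy version of such an objective, not the paper's formulation. The pathway membership matrix, averaging member genes into a pathway score, mean-pooling the histology slots, the linear head `head_w`, and the mean-squared-error loss are all assumptions made for illustration.

```python
import numpy as np

def pathway_targets(expr: np.ndarray, membership: np.ndarray) -> np.ndarray:
    """Pathway score = mean expression of the pathway's member genes.

    expr: (G,) expression for one patient; membership: (P, G) binary mask
    with membership[p, g] = 1 if gene g belongs to pathway p.
    """
    counts = np.maximum(membership.sum(axis=1), 1)   # guard against empty pathways
    return (membership @ expr) / counts

def reconstruction_loss(hist_slots: np.ndarray, head_w: np.ndarray,
                        expr: np.ndarray, membership: np.ndarray) -> float:
    """MSE between pathway scores predicted from histology slots and omics targets.

    hist_slots: (K, D) histology-derived slots; head_w: (D, P) hypothetical linear head.
    """
    pred = hist_slots.mean(axis=0) @ head_w           # (P,) predicted pathway scores
    target = pathway_targets(expr, membership)
    return float(np.mean((pred - target) ** 2))

# Toy prior: 3 pathways over 6 genes.
membership = np.array([[1, 1, 0, 0, 0, 0],
                       [0, 0, 1, 1, 1, 0],
                       [0, 0, 0, 0, 1, 1]], dtype=float)
expr = np.array([2.0, 4.0, 1.0, 1.0, 4.0, 0.0])
print(pathway_targets(expr, membership))              # [3. 2. 2.]

rng = np.random.default_rng(0)
loss = reconstruction_loss(rng.normal(size=(6, 8)), rng.normal(size=(8, 3)),
                           expr, membership)
```

In this form the biological prior enters only through `membership`, so swapping in a different pathway database changes the reconstruction targets without changing the model.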

Paper Structure

This paper contains 44 sections, 19 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Framework of SlotSPE. Histology and gene features are extracted into bag structures, then compressed into slots via slot attention. Selective slot activation enforces sparsity and mutual competition, while a biologically guided cross-modal reconstruction aligns modalities. Finally, slot interactions are modeled using self- and cross-attention to predict survival.
  • Figure 2: Detailed component structure of SlotSPE. (a) Selective slot activation: a Mixture-of-Experts (MoE)-style decoder sparsely activates only the most predictive slots. (b) Cross-modal reconstruction: guided by biological priors, omics-derived slots are aligned with histology by predicting pathway-level gene expression from WSIs.
  • Figure 3: Kaplan–Meier curves of predicted high-risk and low-risk groups. A $p$-value $< 0.05$ at the top indicates statistically significant separation between groups. The restricted mean survival time (RMST) up to 60 months is also reported, both as the difference $\Delta$ (High − Low) and as the ratio (High/Low). (Zoom in to view details.)
  • Figure 4: Performance vs. inference memory/runtime trade-off.
  • Figure 5: Interpretability of slots. (A) Original WSI, assignment map of histology-derived slots, and assignment map of omics-derived slots. (B) Attention maps for each slot with the top-5 most relevant patches highlighted. (C) Top-3 relevant and irrelevant pathways identified per slot.
  • ...and 9 more figures
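Figure 2(a) describes an MoE-style decoder that sparsely activates only the most predictive slots. A minimal sketch of one such mechanism, a top-k gate in the style of sparse Mixture-of-Experts routers, is given here. The function `select_slots`, the scoring vector `gate_w`, the value of `k`, and renormalizing the softmax over only the kept slots are assumptions; SlotSPE's decoder may differ in its details.

```python
import numpy as np

def select_slots(slots: np.ndarray, gate_w: np.ndarray, k: int):
    """Hypothetical top-k slot gate (sparse-MoE-style routing).

    slots: (K, D) slots for one patient and modality; gate_w: (D,) scoring vector.
    Returns (weights, pooled): weights (K,) has exactly k non-zeros summing to 1;
    pooled (D,) is the weighted sum of the active slots only.
    """
    logits = slots @ gate_w                       # one relevance score per slot
    active = np.argsort(logits)[-k:]              # indices of the k highest scores
    z = logits[active] - logits[active].max()     # stabilized softmax over kept slots
    weights = np.zeros_like(logits)
    weights[active] = np.exp(z) / np.exp(z).sum()
    pooled = weights @ slots                      # inactive slots contribute nothing
    return weights, pooled

rng = np.random.default_rng(0)
slots = rng.normal(size=(8, 16))                  # 8 candidate slots of width 16
gate_w = rng.normal(size=16)                      # learned in practice; random here
weights, pooled = select_slots(slots, gate_w, k=3)
print(np.flatnonzero(weights))                    # indices of the 3 active slots
```

The hard top-k choice itself is not differentiable, but gradients still reach the gate through the softmax weights of the kept slots, which is the usual reason this kind of sparse routing remains trainable.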
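Figure 3 summarizes risk groups with Kaplan–Meier curves and the restricted mean survival time (RMST) up to 60 months, where RMST is the area under the Kaplan–Meier curve on [0, τ]. The NumPy sketch below computes both from follow-up times and event indicators (1 = death observed, 0 = censored), then reports the High − Low difference and the High/Low ratio. The function names and the toy cohorts are hypothetical and are not the paper's data.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan–Meier estimator: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                 # still under observation at t
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def rmst(time, event, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    t, s = kaplan_meier(time, event)
    keep = t < tau
    knots = np.concatenate(([0.0], t[keep], [tau]))
    levels = np.concatenate(([1.0], s[keep]))       # S(t) is flat between knots
    return float(np.sum(np.diff(knots) * levels))

# Hypothetical toy cohorts (months); event = 1 means death observed.
high_t, high_e = [6, 12, 18, 24, 30], [1, 1, 1, 1, 1]
low_t, low_e = [20, 40, 70, 80], [1, 0, 0, 1]
r_high, r_low = rmst(high_t, high_e, 60), rmst(low_t, low_e, 60)
print(f"RMST high={r_high:.1f}, low={r_low:.1f}, "
      f"delta={r_high - r_low:.1f}, ratio={r_high / r_low:.2f}")
# RMST high=18.0, low=50.0, delta=-32.0, ratio=0.36
```

Unlike median survival, RMST at τ = 60 months is defined even when a group's curve never drops below 0.5 during follow-up, which makes it a convenient companion to Kaplan–Meier plots.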