Together, Then Apart: Revisiting Multimodal Survival Analysis via a Min-Max Perspective

Wenjing Liu, Qin Ren, Wen Zhang, Yuewei Lin, Chenyu You

TL;DR

This work revisits multi-modal survival analysis via the dual lens of alignment and distinctiveness, positing that preserving modality-specific structure is as vital as achieving semantic coherence for robust, interpretable, and biologically meaningful multi-modal survival analysis.

Abstract

Integrating heterogeneous modalities such as histopathology and genomics is central to advancing survival analysis, yet most existing methods prioritize cross-modal alignment through attention-based fusion mechanisms, often at the expense of modality-specific characteristics. This overemphasis on alignment leads to representation collapse and reduced diversity. In this work, we revisit multi-modal survival analysis via the dual lens of alignment and distinctiveness, positing that preserving modality-specific structure is as vital as achieving semantic coherence. We introduce Together-Then-Apart (TTA), a unified min-max optimization framework that simultaneously models shared and modality-specific representations. The Together stage minimizes semantic discrepancies by aligning embeddings via shared prototypes, guided by an unbalanced optimal transport objective that adaptively highlights informative tokens. The Apart stage maximizes representational diversity through modality anchors and a contrastive regularizer that preserve unique modality information and prevent feature collapse. Extensive experiments on five TCGA benchmarks show that TTA consistently outperforms state-of-the-art methods. Beyond empirical gains, our formulation provides a new theoretical perspective on how alignment and distinctiveness can be jointly achieved for robust, interpretable, and biologically meaningful multi-modal survival analysis.

Paper Structure

This paper contains 27 sections, 57 equations, 11 figures, 9 tables, 2 algorithms.

Figures (11)

  • Figure 1: Comparison between (a) traditional methods and (b) our proposed TTA. Our method formulates the task as a min–max optimization with two complementary stages: Together, for semantic alignment, and Apart, for representational diversification.
  • Figure 2: Overview of TTA. (1) Pre-processing: Whole-slide images and gene-expression profiles are partitioned into modality-specific tokens. (2) Together stage: Modality tokens are aligned to a shared prototype bank using a semi-relaxed unbalanced optimal transport module, guided by a curriculum on the mass parameter $\rho$ and solved with a scaling algorithm. (3) Apart stage: Modality-weighted tokens are refined using modality-specific anchors, regularized by a contrastive objective to preserve modality distinctiveness. (4) Fusion: A transformer-based co-attention module reconciles the two modality representations for survival prediction.
  • Figure 3: Kaplan-Meier curves of the predicted high-risk (red) and low-risk (blue) groups. A p-value $<0.05$ indicates statistical significance, and the shaded regions represent the confidence intervals.
  • Figure 4: Ablations and hyperparameter experiments in the Together stage: multi-head, instance-to-prototype assignment, number of shared prototypes.
  • Figure 5: Hyperparameter experiments in the Together stage: KL-constraint weight and instance loss weight.
  • ...and 6 more figures
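
The Together stage in Figure 2 aligns modality tokens to shared prototypes with a semi-relaxed unbalanced optimal transport module solved by a scaling algorithm. The sketch below is a generic illustration of that family of solvers, not the authors' implementation: it uses the standard entropic scaling iteration with one marginal enforced exactly and the other relaxed by a KL penalty. The function name `semi_relaxed_uot`, the parameters `eps` and `kl_weight`, the cosine cost, the uniform marginals, and the choice of which side is relaxed are all assumptions made for illustration; in particular, `kl_weight` is not claimed to correspond to the paper's mass parameter $\rho$.

```python
import numpy as np


def semi_relaxed_uot(cost, a, b, eps=0.1, kl_weight=1.0, n_iters=200):
    """Entropic OT with an exact row marginal and a KL-relaxed column marginal.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    kernel = np.exp(-cost / eps)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)                     # scaling vector for rows (tokens)
    v = np.ones_like(b)                     # scaling vector for columns (prototypes)
    power = kl_weight / (kl_weight + eps)   # exponent < 1 softens the column constraint
    for _ in range(n_iters):
        v = (b / (kernel.T @ u)) ** power   # relaxed update on the prototype side
        u = a / (kernel @ v)                # exact update on the token side
    return u[:, None] * kernel * v[None, :]  # transport plan, shape (tokens, prototypes)


# Toy usage with random unit-norm features (assumed data, not from the paper).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
prototypes = rng.normal(size=(3, 4))
tokens /= np.linalg.norm(tokens, axis=1, keepdims=True)
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

cost = 1.0 - tokens @ prototypes.T          # cosine distance, values in [0, 2]
a = np.full(6, 1.0 / 6.0)                   # token mass, enforced exactly
b = np.full(3, 1.0 / 3.0)                   # prototype mass, only softly matched
plan = semi_relaxed_uot(cost, a, b)
```

Because the row scaling is updated last, the rows of `plan` sum to `a` exactly, while the column sums are free to deviate from `b`. That deviation is what lets a relaxed formulation place more mass on some prototypes than others; a larger `kl_weight` pulls the column sums back toward `b`.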