Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations

Linhao Qu, Dan Huang, Shaoting Zhang, Xiaosong Wang

TL;DR

The paper tackles multimodal survival analysis in cancer under incomplete data and censored labels by leveraging modality-specific foundation models to encode diverse modalities, aligning them into a shared embedding space, and using pseudo labels with uncertainty to improve prediction. It introduces a two-tier architecture: (i) attention-based multi-instance aggregation for robust intra- and inter-modality fusion, and (ii) progressive survival disambiguation to estimate hazards for censored patients via soft-label weighting and a time-dependent warming scheme. A contrastive alignment component with memory queues enables cross-modal similarity learning anchored on pathology as a hub, while a joint loss combines contrastive and survival objectives. Empirical results on two real-world datasets demonstrate superior accuracy and clinically meaningful stratification, with strong ablations validating each component’s contribution and the interpretability of modality importance. The work advances feasible, scalable multi-modal survival analysis with incomplete data and provides a foundation for broader clinical deployment and validation.
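To make the attention-based aggregation idea concrete, here is a minimal sketch of two-level attention pooling that skips missing modalities. It is an illustration only, not the authors' implementation: the function names (`attention_pool`, `fuse_patient`), the toy parameters `V` and `w`, the sharing of those parameters across both levels, and the example embeddings are all assumptions. The scoring form (softmax over `w . tanh(V h)`) is the common attention-based multi-instance learning formulation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(instances, V, w):
    """Attention-weighted mean of instance embeddings.

    instances: list of d-dimensional vectors
    V: hidden x d projection matrix (hypothetical learned parameter)
    w: hidden-dimensional scoring vector (hypothetical learned parameter)
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * zk for wk, zk in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(instances[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return pooled, weights

def fuse_patient(modalities, V, w):
    """Pool within each available modality, then across modalities.

    modalities: dict mapping modality name to a list of embeddings,
    or None when that modality is missing for the patient.
    Assumes at least one modality is present.
    """
    available = {name: inst for name, inst in modalities.items() if inst}
    names = list(available)
    per_modality = [attention_pool(available[n], V, w)[0] for n in names]
    fused, weights = attention_pool(per_modality, V, w)
    return fused, dict(zip(names, weights))

# Toy patient: two pathology patches, one report embedding, no radiology.
patient = {
    "pathology": [[1.0, 0.0], [0.0, 1.0]],
    "radiology": None,
    "report": [[0.5, 0.5]],
}
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
fused, importance = fuse_patient(patient, V, w)
```

The returned `importance` dictionary holds one attention weight per available modality, which is the kind of quantity a modality-importance visualization would plot. A real model would learn `V` and `w` and would likely use separate parameters for the intra- and inter-modality levels.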

Abstract

Survival analysis stands as a pivotal process in cancer treatment research, crucial for predicting patient survival rates accurately. Recent advancements in data collection techniques have paved the way for enhancing survival predictions by integrating information from multiple modalities. However, real-world scenarios often present challenges with incomplete data, particularly when dealing with censored survival labels. Prior works have addressed missing modalities but have overlooked incomplete labels, which can introduce bias and limit model efficacy. To bridge this gap, we introduce a novel framework that simultaneously handles incomplete data across modalities and censored survival labels. Our approach employs advanced foundation models to encode individual modalities and align them into a universal representation space for seamless fusion. By generating pseudo labels and incorporating uncertainty, we significantly enhance predictive accuracy. The proposed method demonstrates outstanding prediction accuracy in two survival analysis tasks on both employed datasets. This innovative approach overcomes limitations associated with disparate modalities and improves the feasibility of comprehensive survival analysis using multiple large foundation models.

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of the problem definition. (A) Patient-wise survival analysis using multi-modal vision and text data. (B) Missing modalities during training and testing. (C) Missing exact survival labels for censored patients.
  • Figure 2: (A) Overview of the proposed framework. Solid lines: constant modalities; dashed lines: potentially missing modalities. (B) Diagram of Patient-wise Contrastive Alignment Learning. (C) Diagram of Progressive Survival Disambiguation Learning.
  • Figure 3: Kaplan-Meier (KM) curves for (A) the OS prediction task and (B) the DFS prediction task. (C) Visualization of modality attention scores for three patients.
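
For readers unfamiliar with the KM curves mentioned in Figure 3, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator. It is a generic textbook estimator, not code from the paper. The function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times: follow-up durations, one per patient
    events: 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing S(t).
        at_risk -= removed
    return curve

# Toy cohort: five patients, two of them censored (event flag 0).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

In a stratification analysis, one such curve would be computed per predicted risk group (for example, high versus low risk) and the groups compared, typically with a log-rank test.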