
Robust Multimodal Survival Prediction with the Latent Differentiation Conditional Variational AutoEncoder

Junjie Zhou, Jiao Tang, Yingli Zuo, Peng Wan, Daoqiang Zhang, Wei Shao

TL;DR

This work tackles survival prediction by integrating histopathology images and genomic data while addressing missing genomic modalities. It introduces LD-CVAE, a conditional latent differentiation variational autoencoder that learns function-specific genomic embeddings from gigapixel WSIs, augmented by a Variational Information Bottleneck Transformer to encode pathology efficiently. A product-of-experts framework fuses the pathology and reconstructed genomics into a joint latent distribution, guided by an alignment loss to improve cross-modality consistency, and a co-attention-based fusion yields survival predictions. Across five TCGA cancer cohorts, LD-CVAE outperforms unimodal and most multimodal baselines, and remains robust when genomic data are unavailable, highlighting its practical potential for real-world prognostic modeling.
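The Variational Information Bottleneck idea named above can be illustrated with its generic ingredients: a diagonal Gaussian posterior sampled via the reparameterization trick, and a KL term that penalizes information kept in the latent code. This is a minimal sketch, not the authors' VIB-Trans. It assumes a diagonal Gaussian posterior, a standard-normal prior, and a hypothetical trade-off weight `beta`. It also omits the transformer encoder that would produce `mu` and `logvar` (per Figure 3, via learnable tokens). All function names are illustrative.

```python
import math
import random
from typing import List, Optional


def reparameterize(mu: List[float], logvar: List[float],
                   rng: Optional[random.Random] = None) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this form keeps z differentiable w.r.t. mu and logvar.
    """
    rng = rng or random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), the compression term."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def vib_loss(task_loss: float, mu: List[float], logvar: List[float],
             beta: float = 1e-3) -> float:
    """Generic VIB objective: task loss + beta * KL (beta is an assumed weight)."""
    return task_loss + beta * kl_to_standard_normal(mu, logvar)
```

For example, `vib_loss(0.8, [0.3, -0.2], [-1.0, 0.5], beta=0.01)` adds a small KL penalty to a task loss of 0.8. Larger `beta` values push the posterior toward the prior and therefore compress the pathology representation more aggressively.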

Abstract

The integrative analysis of histopathological images and genomic data has received increasing attention for survival prediction in human cancers. However, existing studies typically assume that all modalities are available. In practice, genomic data are costly to collect and are therefore sometimes unavailable for test samples. A common way to handle such incompleteness is to generate genomic representations from the pathology images. Nevertheless, this strategy still faces two challenges: (1) gigapixel whole slide images (WSIs) are so large that they are hard to represent, and (2) it is difficult to generate genomic embeddings spanning diverse functional categories within a unified generative framework. To address these challenges, we propose a Conditional Latent Differentiation Variational AutoEncoder (LD-CVAE) for robust multimodal survival prediction, even when genomic data are missing. Specifically, a Variational Information Bottleneck Transformer (VIB-Trans) module is proposed to learn compressed pathological representations from gigapixel WSIs. To generate genomic features with different functions, we develop a novel Latent Differentiation Variational AutoEncoder (LD-VAE) that learns common and function-specific posteriors for the genomic embeddings. Finally, we use the product-of-experts technique to integrate the genomic common posterior and the image posterior to estimate the joint latent distribution in LD-CVAE. We evaluate our method on five cancer datasets, and the experimental results demonstrate its superiority in both complete-modality and missing-modality scenarios.
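The product-of-experts step can be illustrated for diagonal Gaussian posteriors. The normalized product of Gaussian densities is again Gaussian: its precision is the sum of the experts' precisions, and its mean is the precision-weighted average of their means. This is a minimal sketch under stated assumptions. Each posterior is a `(mean, variance)` pair of lists, and the optional standard-normal prior expert follows the common MVAE-style convention rather than anything stated in the abstract. `product_of_experts` and the variable names are hypothetical.

```python
from typing import List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance) of a diagonal Gaussian


def product_of_experts(experts: Sequence[Gaussian], include_prior: bool = True) -> Gaussian:
    """Fuse diagonal Gaussian experts by precision weighting, dimension by dimension.

    joint precision = sum_i 1 / var_i
    joint mean      = sum_i (mu_i / var_i) / joint precision
    """
    if not experts:
        raise ValueError("need at least one expert")
    dim = len(experts[0][0])
    all_experts = list(experts)
    if include_prior:
        # Assumption: add an N(0, I) prior expert, as in MVAE-style PoE.
        all_experts.append(([0.0] * dim, [1.0] * dim))
    mean: List[float] = []
    var: List[float] = []
    for d in range(dim):
        precisions = [1.0 / v[d] for _, v in all_experts]
        total = sum(precisions)
        mean.append(sum(m[d] * p for (m, _), p in zip(all_experts, precisions)) / total)
        var.append(1.0 / total)
    return mean, var
```

For instance, with a hypothetical `image_post = ([0.5, -1.0], [0.2, 1.0])` and `gene_common_post = ([0.3, 0.0], [0.5, 0.5])`, `product_of_experts([image_post, gene_common_post])` returns a joint posterior that leans toward whichever expert is more confident (lower variance) in each dimension. A convenient property of this formulation is that an unavailable expert can simply be left out of the list and the remaining experts still yield a valid Gaussian. Whether LD-CVAE handles missing genomics exactly this way is not stated here.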

Paper Structure

This paper contains 54 sections, 35 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Comparison between (a) traditional and (b) our proposed multimodal survival prediction methods. Traditional methods require complete multimodal data, whereas our method remains effective when genomic data are missing.
  • Figure 2: Flowchart of the proposed robust multimodal survival prediction model, which consists of three steps. In the multimodal bag representation step, we extract multimodal features from both WSI and genomic data. Then, a Conditional Latent Differentiation Variational AutoEncoder (LD-CVAE) is proposed to reconstruct the genomic representation from the image data, so that our method remains effective when genomic data are missing at the testing stage. Finally, the co-attention module is applied to guide the selection of survival-associated instances, which are then combined with the reconstructed genomic features for survival prediction.
  • Figure 3: The architectures of (a) VIB-Trans and (b) LD-VAE. Both VIB-Trans and LD-VAE obtain posterior parameters by equipping their respective transformer encoders with two learnable tokens, i.e., $\mu^{token}$ and $\Sigma^{token}$. Additionally, LD-VAE learns the function-specific posteriors from the genomic posterior parameters $\mu$ and $\Sigma$.
  • Figure 4: Kaplan-Meier analysis of the predicted high-risk (red) and low-risk (green) groups on five cancer datasets under the complete-modality (top) and missing-modality (bottom) scenarios. Shaded areas indicate confidence intervals.
  • Figure 5: Comparison of the co-attention weights calculated from the genuine (top) and generated (bottom) genomic features.
  • ...and 8 more figures
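The Kaplan-Meier analysis in Figure 4 can be sketched as follows. First, patients are stratified by predicted risk. Then the product-limit survival curve is estimated for each group. This is a minimal sketch, assuming a median split of predicted risks. The threshold used for Figure 4 is not stated here. Confidence intervals and log-rank tests are omitted. Both function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (event time, estimated survival probability)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Product-limit estimate S(t) = prod over death times t_j <= t of (1 - d_j / n_j).

    events[i] is 1 if death was observed at times[i] and 0 if censored.
    Returns one (t, S(t)) step per distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # n_j: patients still under observation just before time t
    surv = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient sharing this time (deaths and censorings together).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # dead and censored patients leave the risk set after t
    return curve


def split_by_median_risk(
    risks: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> Tuple[List[Tuple[float, int]], List[Tuple[float, int]]]:
    """Assumed rule: risk above the median -> high-risk group, otherwise low-risk."""
    cut = median(risks)
    high = [(t, e) for r, t, e in zip(risks, times, events) if r > cut]
    low = [(t, e) for r, t, e in zip(risks, times, events) if r <= cut]
    return high, low
```

As a usage example, `high, low = split_by_median_risk(risks, times, events)` followed by `kaplan_meier(*zip(*high))` yields the step curve for the high-risk group. A well-separated pair of curves, with the high-risk curve dropping faster, is the visual evidence of stratification that plots like Figure 4 are meant to convey.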