A Contrastive Variational AutoEncoder for NSCLC Survival Prediction with Missing Modalities

Michele Zanitti, Vanja Miskovic, Francesco Trovò, Alessandra Laura Giulia Pedrocchi, Ming Shen, Yan Kyaw Tun, Arsela Prelaj, Sokol Kosta

TL;DR

A Multimodal Contrastive Variational AutoEncoder (MCVAE) for survival prediction with missing modalities: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms normalizes the contributions of the modalities that are present.
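
As a rough illustration of this fusion idea, the NumPy sketch below samples each modality's latent with the reparameterization trick and then fuses the samples using a softmax taken only over the modalities that are present, so their weights always sum to one. This is a hypothetical reading of the gating mechanism: the helper names (reparameterize, gated_fusion), the masked-softmax form of the gate, and the array shapes are assumptions, and in the actual model the gate scores would be produced by a learned network rather than passed in.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Hypothetical helper: sample z = mu + sigma * eps from a diagonal Gaussian."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gated_fusion(z, available, gate_logits):
    """Hypothetical helper: fuse per-modality latents with weights normalized
    over present modalities only (masked softmax is an assumption).

    z:           (batch, K, d) per-modality latent samples
    available:   (batch, K) boolean mask, True where the modality is observed
    gate_logits: (batch, K) gate scores; a learned gating network would
                 produce these in a real model
    """
    available = np.asarray(available, dtype=bool)
    if not available.any(axis=1).all():
        raise ValueError("each patient needs at least one available modality")
    # Missing modalities get -inf, so they receive exactly zero weight
    masked = np.where(available, gate_logits, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(masked)
    w = w / w.sum(axis=1, keepdims=True)  # weights of present modalities sum to 1
    fused = np.einsum("bk,bkd->bd", w, z)  # weighted sum over modalities
    return fused, w

# Usage: 2 patients, K=3 modalities (e.g. WSI, RNA, methylation), latent size d=4
rng = np.random.default_rng(0)
mu = rng.standard_normal((2, 3, 4))
logvar = np.zeros((2, 3, 4))
z = reparameterize(mu, logvar, rng)
avail = np.array([[True, False, True], [False, False, True]])
fused, w = gated_fusion(z, avail, np.zeros((2, 3)))
print(w)  # equal weights over each patient's present modalities
```

With equal gate scores, the first patient's two present modalities each get weight 0.5, and the second patient's fused latent is simply its single available modality.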

Abstract

Predicting survival outcomes for non-small cell lung cancer (NSCLC) patients is challenging because prognostic features vary widely across individuals. This task can benefit from integrating whole-slide images, bulk transcriptomics, and DNA methylation, which offer complementary views of the patient's condition at diagnosis. However, real-world clinical datasets are often incomplete, with entire modalities missing for a significant fraction of patients. State-of-the-art models either build patient-level representations from the available data alone or use generative models to infer the missing modalities, but they lack robustness when missingness is severe. We propose a Multimodal Contrastive Variational AutoEncoder (MCVAE) to address this issue: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms normalizes the contributions of the modalities that are present. We propose a multi-task objective that combines a survival loss and a reconstruction loss to regularize patient representations, along with a cross-modal contrastive loss that enforces alignment across modalities in the latent space. During training, we apply stochastic modality masking to improve robustness to arbitrary missingness patterns. Extensive evaluations on the TCGA-LUAD (n=475) and TCGA-LUSC (n=446) datasets demonstrate the efficacy of our approach in predicting disease-specific survival (DSS) and its robustness in severe missingness scenarios compared with two state-of-the-art models. Finally, we clarify the role of multimodal integration by testing our model on all subsets of modalities, finding that integration is not always beneficial to the task.
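
Stochastic modality masking can be sketched as modality-level dropout applied to the availability mask during training. In this minimal NumPy version, each observed modality is hidden independently with probability p_drop, and a patient who loses every observed modality gets one restored at random. Both the independent per-modality dropout and the keep-at-least-one rule are assumptions rather than details confirmed by the paper, and stochastic_modality_mask is a hypothetical name.

```python
import numpy as np

def stochastic_modality_mask(available, p_drop, rng):
    """Hypothetical helper: hide each observed modality with probability p_drop,
    always keeping at least one per patient (assumed rule).

    available: (batch, K) boolean mask of modalities actually observed
    p_drop:    per-modality dropout probability (assumed parameterization)
    """
    available = np.asarray(available, dtype=bool)
    # Assumption: each modality is dropped independently of the others
    drop = rng.random(available.shape) < p_drop
    masked = available & ~drop
    # Patients who lost every observed modality get one back at random,
    # so the fusion step always has at least one input
    emptied = ~masked.any(axis=1) & available.any(axis=1)
    for i in np.flatnonzero(emptied):
        keep = rng.choice(np.flatnonzero(available[i]))
        masked[i, keep] = True
    return masked

# Usage: 3 patients, K=3 modalities, some already missing
rng = np.random.default_rng(42)
avail = np.array([[True, True, True],
                  [True, False, True],
                  [False, False, True]])
train_mask = stochastic_modality_mask(avail, p_drop=0.5, rng=rng)
```

The resulting mask would stand in for the true availability mask in the fusion step for that training batch, exposing the model to missingness patterns it may meet at test time. Masking never reveals a modality that was not observed.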

Paper Structure (35 sections, 14 equations, 4 figures, 2 tables, 1 algorithm)

Figures (4)

  • Figure 1: Overview of the proposed MCVAE architecture. From the left: each modality is encoded through a modality-specific variational encoder, producing latent variables $z_k$. The fusion module combines available modality embeddings using availability-aware gating and a fusion network to obtain a shared latent representation $z_{\text{fused}}$. This representation is used for modality reconstruction via decoders and for survival prediction via the survival head. Training optimizes a composite objective that includes task loss (Cox partial likelihood), reconstruction loss, KL divergence, and InfoNCE contrastive loss. The latter pulls together embeddings from different modalities of the same patient (green arrows) while pushing apart embeddings from different patients (red arrows). Loss components are highlighted in purple.
  • Figure 2: Results on survival analysis. C-index values are reported together with the individual results for each fold. Asterisks (*) indicate a significant performance increase between the two models.
  • Figure 3: Results on survival analysis for LUAD and LUSC cohorts for different modality dropout rates $p_{drop}$. Average C-index (bold markers) and standard deviations (shaded area) are reported.
  • Figure 4: Results on survival analysis for LUAD and LUSC cohorts for different missingness rates $p_{miss}$. Average C-index (bold markers) and standard deviations (shaded area) are reported.
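
Figure 1's caption lists four training terms: a task loss (Cox partial likelihood), a reconstruction loss, a KL divergence, and an InfoNCE contrastive loss. The NumPy sketch below shows one standard way to write each term. The lack of tie correction in the Cox term, the cosine-similarity InfoNCE with a temperature, and the weights in total_loss are assumptions, not values from the paper; all function names are hypothetical, and the reconstruction loss is passed in as a precomputed scalar.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along an axis, keeping dims."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def cox_neg_partial_log_likelihood(risk, time, event):
    """Hypothetical helper: negative Cox partial log-likelihood.
    Assumption: tied event times are not handled specially."""
    order = np.argsort(-time)  # sort patients by descending survival time
    risk, event = risk[order], event[order]
    # Entry i: log of sum_j exp(risk_j) over the risk set {j : t_j >= t_i}
    log_risk_set = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dims, averaged over batch."""
    return 0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1))

def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE between two modalities' embeddings (batch, d):
    row i of za and row i of zb (same patient) form the positive pair."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature  # (batch, batch) scaled cosine similarities
    log_p_ab = logits - logsumexp(logits, axis=1)
    log_p_ba = logits.T - logsumexp(logits.T, axis=1)
    return -0.5 * (np.mean(np.diag(log_p_ab)) + np.mean(np.diag(log_p_ba)))

def total_loss(l_cox, l_rec, l_kl, l_nce, lam_rec=1.0, beta=1e-3, lam_nce=0.1):
    """Weighted sum of the four terms; the default weights are hypothetical."""
    return l_cox + lam_rec * l_rec + beta * l_kl + lam_nce * l_nce

# Usage: three uncensored patients with identical predicted risk
time = np.array([3.0, 2.0, 1.0])
event = np.array([1.0, 1.0, 1.0])
l_cox = cox_neg_partial_log_likelihood(np.zeros(3), time, event)  # equals log(6) / 3
```

In the InfoNCE term, embeddings of the same patient from two modalities act as positives and all other patients in the batch act as negatives, matching the green and red arrows described in Figure 1.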