Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang

TL;DR

The paper tackles cross-cohort single-cell multi-omics integration in the presence of missing modalities. It introduces SC^5, a variational topic-modeling framework with a product-of-experts encoder that learns shared latent topics across domains while allowing domain-specific variation. SC^5 supports imputation of entirely unseen modalities in a target domain and improves clustering and classification through neighborhood-aware regularization. Applied to real-world NeurIPS multi-omics data, SC^5 yields more representative embeddings and robust cross-domain imputation, enabling better biological interpretation and downstream tasks.

Abstract

Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose SC^5 (Single-Cell Cross-Cohort Cross-Category) integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of these missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that SC^5 offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation.

Paper Structure

This paper contains 30 sections, 18 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Problem overview. Left: Different cases of modality availability. Right: Different cases of modality imputation.
  • Figure 2: Graphical illustration of generative model.
  • Figure 3: Model overview. Top: The encoder integrates the available modalities for each domain via the product-of-experts (PoE). The decoder reconstructs the modality-specific features by topic and feature embeddings which capture global, domain-dependent, and modality-dependent variation. Bottom: Integrated feature representations are obtained by using global and domain-dependent embeddings.
  • Figure 4: Domains and modality availability under each experiment scenario.
  • Figure 5: Distribution of topic assignment scores in the $Combine$ setting. The top 20% of assigned topics are selected across 500 sampled cells from 2 domains. Color intensity values correspond to the cell-topic feature value before normalization into a topic mixture probability. Boxed topic features correspond to cell topic features with strong association to unique cell types.
  • ...and 2 more figures
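
The Figure 3 caption says the encoder integrates whichever modalities are available through a product-of-experts (PoE). A minimal sketch of how such a fusion is commonly done is given below. It assumes each modality encoder outputs a diagonal Gaussian and that a standard normal prior acts as an extra expert; both are assumptions borrowed from common multimodal VAE practice, not details confirmed by this summary. The function name `product_of_experts` and the array layout are hypothetical.

```python
import numpy as np


def product_of_experts(mus, logvars, available):
    """Fuse per-modality Gaussian posteriors into a single Gaussian.

    mus, logvars: arrays of shape (n_modalities, n_cells, n_latent)
    available:    boolean array of shape (n_modalities, n_cells);
                  False marks a modality that is missing for that cell.
    Returns the fused mean and variance, each of shape (n_cells, n_latent).
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    mask = np.asarray(available, dtype=float)[:, :, None]

    # Precision (inverse variance) of each expert; a missing modality
    # contributes zero precision and therefore drops out of the product.
    precisions = np.exp(-logvars) * mask

    # Assumption: a standard normal prior N(0, I) is included as one more
    # expert, adding precision 1 and mean 0 to every cell.
    total_precision = 1.0 + precisions.sum(axis=0)

    # A product of Gaussians is Gaussian with a precision-weighted mean.
    fused_mu = (precisions * mus).sum(axis=0) / total_precision
    fused_var = 1.0 / total_precision
    return fused_mu, fused_var


if __name__ == "__main__":
    # Two modalities, one cell, one latent dimension.
    mus = np.array([[[2.0]], [[4.0]]])
    logvars = np.zeros((2, 1, 1))
    both = np.array([[True], [True]])
    first_only = np.array([[True], [False]])
    print(product_of_experts(mus, logvars, both))
    print(product_of_experts(mus, logvars, first_only))
```

Because a missing modality simply contributes no precision, the same encoder can be applied to cohorts with different modality availability, which is the property the caption relies on.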