Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities
Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang
TL;DR
The paper tackles cross-cohort single-cell multi-omics integration in the presence of missing modalities. It introduces SC^5, a variational topic-modeling framework with a product-of-experts encoder that learns shared latent topics across domains while allowing domain-specific variation. SC^5 supports imputation of entirely unseen modalities in a target domain and improves clustering and classification through neighborhood-aware regularization. Applied to real-world NeurIPS multi-omics data, SC^5 yields more representative embeddings and robust cross-domain imputation, enabling better biological interpretation and downstream tasks.
Abstract
Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most existing approaches require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose SC^5 (Single-Cell Cross-Cohort Cross-Category) integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of the missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that SC^5 offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation.
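To illustrate how a product-of-experts encoder, as named in the TL;DR, can tolerate missing modalities, the following is a minimal sketch and not the SC^5 implementation. It assumes each modality encoder emits a diagonal Gaussian (mean and log-variance) over the latent space and that a standard-normal prior acts as an extra expert; neither assumption is stated in this section. The function name `product_of_experts` and the `Expert` type are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical type: (mean, log-variance), one entry per latent dimension.
Expert = Tuple[List[float], List[float]]


def product_of_experts(
    experts: List[Optional[Expert]], n_latent: int
) -> Tuple[List[float], List[float]]:
    """Fuse per-modality Gaussian posteriors for one cell.

    A modality that was not measured is passed as None and is skipped,
    so the fused posterior relies only on the observed modalities.
    Returns (fused mean, fused variance) per latent dimension.
    """
    # Assumption: a standard-normal prior expert (precision 1, mean 0).
    precision = [1.0] * n_latent
    weighted_mean = [0.0] * n_latent

    for expert in experts:
        if expert is None:  # missing modality contributes nothing
            continue
        mu, logvar = expert
        for k in range(n_latent):
            p = math.exp(-logvar[k])  # precision = 1 / variance
            precision[k] += p
            weighted_mean[k] += p * mu[k]

    # The product of Gaussians is Gaussian: precisions add, and the
    # mean is the precision-weighted average of the expert means.
    fused_var = [1.0 / p for p in precision]
    fused_mu = [m * v for m, v in zip(weighted_mean, fused_var)]
    return fused_mu, fused_var


# Usage: a cell with two measured modalities vs. one where the second is missing.
rna = ([2.0], [0.0])
atac = ([4.0], [0.0])
mu_full, var_full = product_of_experts([rna, atac], n_latent=1)
mu_partial, var_partial = product_of_experts([rna, None], n_latent=1)
```

In this sketch the fused variance shrinks as more modalities are observed, which is one reason the product-of-experts form suits cohorts where the set of available modalities differs from cell to cell.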
