scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye
TL;DR
scMRDR addresses unpaired single-cell multi-omics integration by learning a unified latent space via a single encoder-decoder $\beta$-VAE that disentangles modality-shared $z_u$ and modality-specific $z_s^{(m)}$ components. It imposes isometric regularization to preserve intra-modality structure, adversarial alignment across omics, and a masked reconstruction loss to handle missing features, enabling scalability to more than two omics. Empirical results demonstrate strong batch correction, modality alignment, and biological signal preservation on two-omics and triple-omics benchmarks, and show practical utility in spatial-omics contexts through spatial-location imputation. Overall, scMRDR offers a flexible, scalable framework for large-scale multi-omics integration and downstream biological discovery.
Abstract
Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, but integrating unpaired multi-omics single-cell data remains challenging. Existing approaches either rely on paired information or prior correspondences, or require computing a global pairwise coupling matrix, which limits their scalability and flexibility. In this paper, we introduce single-cell Multi-omics Regularized Disentangled Representations (scMRDR), a scalable and flexible generative framework for unpaired multi-omics integration. Specifically, we disentangle each cell's latent representation into modality-shared and modality-specific components using a $\beta$-VAE architecture, augmented with isometric regularization to preserve intra-omics biological heterogeneity, an adversarial objective to encourage cross-modal alignment, and a masked reconstruction loss to handle features missing across modalities. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation. Crucially, it scales effectively to large datasets and supports integration of more than two omics, offering a powerful and flexible solution for large-scale multi-omics data integration and downstream biological discovery.
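To make two of the ingredients named above concrete, the following is a minimal NumPy sketch of splitting a latent vector into shared and specific parts and of a masked reconstruction loss. It is an illustration under stated assumptions, not the scMRDR implementation: the function names `split_latent` and `masked_reconstruction_loss`, the squared-error form of the loss, and the toy data are all hypothetical, and the actual framework may use a different likelihood and masking scheme.

```python
import numpy as np

def split_latent(z, n_shared):
    """Split latent vectors into a modality-shared block and a modality-specific block.

    Assumption: the first `n_shared` dimensions play the role of z_u and the
    remaining dimensions the role of z_s^{(m)}.
    """
    return z[:, :n_shared], z[:, n_shared:]

def masked_reconstruction_loss(x, x_hat, feature_mask):
    """Squared error averaged only over features measured in each cell's modality.

    Assumption: squared error stands in for whatever reconstruction
    likelihood the actual model uses.
    """
    mask = feature_mask.astype(float)
    sq_err = (x - x_hat) ** 2 * mask                 # zero out unmeasured features
    n_observed = np.maximum(mask.sum(axis=1), 1.0)   # avoid division by zero
    return float(np.mean(sq_err.sum(axis=1) / n_observed))

# Toy data: 2 cells from 2 modalities in a unified 4-feature space.
# Cell 0 measures features 0-1 only; cell 1 measures features 2-3 only.
x = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0]])
x_hat = np.array([[1.0, 4.0, 9.0, 9.0],
                  [5.0, 5.0, 3.0, 2.0]])
mask = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]])

loss = masked_reconstruction_loss(x, x_hat, mask)

z = np.arange(12, dtype=float).reshape(2, 6)
z_u, z_s = split_latent(z, n_shared=4)
```

In this toy example the large errors on unmeasured features (the 9.0 and 5.0 entries of `x_hat`) do not contribute to the loss, which is the point of masking: a cell is never penalized for features its modality did not measure.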
