Scalable Amortized GPLVMs for Single Cell Transcriptomics Data
Sarah Zhao, Aditya Ravuri, Vidhi Lalchand, Neil D. Lawrence
TL;DR
This work addresses scalable, interpretable dimensionality reduction for single-cell RNA-seq (scRNA-seq) by extending Gaussian Process Latent Variable Models (GPLVMs) with amortized stochastic variational inference. It introduces an amortized Bayesian GPLVM (BGPLVM) tailored to scRNA-seq through two domain-informed kernels (SE-ARD+Linear for batch correction and PerSE-ARD+Linear for cell-cycle structure) and an ApproxPoisson likelihood built on library-size normalization, which together support robust clustering and uncertainty quantification. The method performs comparably to the leading scVI approach on synthetic and COVID-19 datasets, and it allows prior biological knowledge to be incorporated explicitly, yielding more interpretable latent structures. Overall, the framework combines probabilistic modeling with domain knowledge to deliver scalable, interpretable embeddings for large-scale single-cell data, and its kernel-based design leaves room for further customization.
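To make the kernel names above concrete, the following is a minimal NumPy sketch of how an SE-ARD+Linear and a PerSE-ARD+Linear kernel could be composed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of one-hot batch indicators as the input to the linear term, and the decision to place the periodic component on latent dimension 0 are all hypothetical.

```python
import numpy as np

def se_ard(x1, x2, lengthscales, variance=1.0):
    """Squared-exponential kernel with one lengthscale per latent dimension (ARD)."""
    a, b = x1 / lengthscales, x2 / lengthscales
    sq_dist = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def periodic(t1, t2, period, lengthscale=1.0, variance=1.0):
    """Standard periodic kernel on a single (hypothetical) cell-cycle coordinate."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2 / lengthscale**2)

def linear(b1, b2, variance=1.0):
    """Linear kernel on batch covariates, here assumed to be one-hot indicators."""
    return variance * b1 @ b2.T

def batch_kernel(x1, b1, x2, b2, lengthscales):
    """SE-ARD on latent coordinates plus a linear term on batch indicators."""
    return se_ard(x1, x2, lengthscales) + linear(b1, b2)

def cell_cycle_kernel(x1, b1, x2, b2, lengthscales, period=2 * np.pi):
    """Assumed form: periodic on latent dim 0, SE-ARD on the rest, plus linear batch term."""
    k_cycle = periodic(x1[:, 0], x2[:, 0], period, lengthscales[0])
    k_rest = se_ard(x1[:, 1:], x2[:, 1:], lengthscales[1:])
    return k_cycle * k_rest + linear(b1, b2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # 5 cells, 3 latent dimensions
B = np.eye(2)[rng.integers(0, 2, 5)]     # one-hot batch labels for 2 batches
ls = np.array([1.0, 0.5, 2.0])           # one lengthscale per latent dimension
K_batch = batch_kernel(X, B, X, B, ls)
K_cycle = cell_cycle_kernel(X, B, X, B, ls)
```

In this sketch the additive linear term lets batch membership explain shared variation separately from the latent coordinates, while the periodic factor makes the covariance invariant to shifting the cycle coordinate by one full period.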
Abstract
Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models struggle to cluster cell types effectively. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets, and it effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures, as we demonstrate on an innate immunity dataset.
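As a rough illustration of what "amortized" means here, the sketch below shows an encoder that maps each cell's count vector to the mean and log-variance of a Gaussian variational distribution over its latent coordinates, so that no per-cell variational parameters need to be stored. This is a generic sketch, not the paper's encoder: the single hidden layer, the `log1p` input transform, the standard normal prior in the KL term, and all names are assumptions made for illustration.

```python
import numpy as np

def init_encoder(n_genes, n_hidden, n_latent, rng):
    """Randomly initialise a one-hidden-layer encoder (hypothetical architecture)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_genes, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W_mu": rng.normal(scale=0.1, size=(n_hidden, n_latent)),
        "b_mu": np.zeros(n_latent),
        "W_lv": rng.normal(scale=0.1, size=(n_hidden, n_latent)),
        "b_lv": np.zeros(n_latent),
    }

def encode(params, counts):
    """Map raw counts to the mean and log-variance of q(x | counts)."""
    h = np.tanh(np.log1p(counts) @ params["W1"] + params["b1"])  # log1p is an assumed transform
    mu = h @ params["W_mu"] + params["b_mu"]
    log_var = h @ params["W_lv"] + params["b_lv"]
    return mu, log_var

def sample_latent(mu, log_var, rng):
    """Reparameterised sample: x = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, log_var):
    """Per-cell KL(q(x) || N(0, I)) for a diagonal Gaussian q."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(8, 50)).astype(float)  # 8 cells, 50 genes (synthetic)
params = init_encoder(n_genes=50, n_hidden=16, n_latent=2, rng=rng)
mu, log_var = encode(params, counts)
x = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, the sampled latent coordinates `x` would be passed to the GP kernel, and the KL term would be combined with the expected log-likelihood to form a minibatch estimate of the variational objective.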
