Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

Sarah Zhao; Aditya Ravuri; Vidhi Lalchand; Neil D. Lawrence

TL;DR

This work tackles scalable, interpretable dimensionality reduction for single-cell RNA-seq by advancing Gaussian Process Latent Variable Models (GPLVMs) with amortized stochastic variational inference. It introduces an amortized BGPLVM tailored to scRNA-seq through domain-informed kernels (batch-correction SE-ARD+Linear and cell-cycle PerSE-ARD+Linear) and a data-aware ApproxPoisson likelihood based on library-size normalization, enabling robust clustering and uncertainty quantification. The method achieves performance comparable to the leading scVI approach on synthetic and COVID-19 datasets, while enabling explicit incorporation of prior biological knowledge to obtain more interpretable latent structures. Overall, the framework blends probabilistic modeling with domain knowledge to deliver scalable, interpretable embeddings for large-scale single-cell data, with potential for broader kernel-based customization.
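The TL;DR names two kernel combinations and a library-size-aware likelihood without spelling out their formulas. The NumPy sketch below shows one way such building blocks could look. It is a sketch under stated assumptions, not the paper's implementation. `batch_kernel`, `cell_cycle_kernel` and `approx_poisson_loglik` are hypothetical names. The terms are assumed to be summed rather than multiplied, the linear batch term is assumed to act on one-hot batch indicators, and "ApproxPoisson" is assumed to mean a Gaussian with Poisson-matched moments on counts divided by the library size `s`.

```python
import numpy as np


def se_ard(X1, X2, lengthscales, variance=1.0):
    """Squared-exponential kernel with one lengthscale per input dimension (ARD)."""
    A, B = X1 / lengthscales, X2 / lengthscales
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))  # clip tiny negative round-off


def linear(X1, X2, variance=1.0):
    """Linear (dot-product) kernel."""
    return variance * X1 @ X2.T


def periodic_se(t1, t2, lengthscale=1.0, period=1.0, variance=1.0):
    """Standard periodic (exp-sine-squared) kernel on a 1-D coordinate."""
    d = np.abs(t1[:, None] - t2[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)


def batch_kernel(Z1, B1, Z2, B2, lengthscales):
    """ASSUMED form: SE-ARD on latent coordinates Z plus a linear kernel
    on one-hot batch indicators B (an additive batch term)."""
    return se_ard(Z1, Z2, lengthscales) + linear(B1, B2)


def cell_cycle_kernel(t1, Z1, t2, Z2, lengthscales, period=1.0):
    """ASSUMED form: periodic kernel on a cell-cycle coordinate t plus
    SE-ARD and linear kernels on the remaining latent dimensions Z."""
    return periodic_se(t1, t2, period=period) + se_ard(Z1, Z2, lengthscales) + linear(Z1, Z2)


def approx_poisson_loglik(Y, s, F):
    """HYPOTHETICAL Gaussian stand-in for a Poisson on library-normalised counts.
    If y ~ Poisson(s * lam), then y / s has mean lam and variance lam / s.
    lam = exp(f) comes from one latent function, which sets both moments."""
    lam = np.exp(F)
    var = lam / s
    resid = Y / s - lam
    return -0.5 * (np.log(2.0 * np.pi * var) + resid**2 / var)
```

A sum of kernels corresponds to a sum of independent GP functions, so under this additive assumption each term (batch, cell cycle, remaining latent structure) can be inspected on its own.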

Abstract

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models are not effective at separating cell types into clusters. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored to single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets, and it effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures, as we demonstrate on an innate immunity dataset.
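"Amortized" means that a single encoder network predicts each cell's variational posterior q(x_n) instead of storing separate mean and variance parameters per cell, so the number of variational parameters does not grow with the number of cells and minibatching is straightforward. The NumPy sketch below illustrates that idea under stated assumptions. The architecture (one tanh hidden layer, log1p input transform, softplus standard deviations) and the names `AmortizedEncoder`, `reparam_sample` and `kl_to_standard_normal` are hypothetical, not the paper's encoder. The KL term shown is the latent-prior KL of a standard BGPLVM ELBO with prior N(0, I).

```python
import numpy as np


class AmortizedEncoder:
    """HYPOTHETICAL one-hidden-layer MLP: maps each cell's count vector to the
    mean and standard deviation of a diagonal Gaussian q(x_n) over latents."""

    def __init__(self, n_genes, n_hidden, n_latent, rng):
        self.W1 = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W_mu = rng.normal(0.0, 0.1, size=(n_hidden, n_latent))
        self.W_sd = rng.normal(0.0, 0.1, size=(n_hidden, n_latent))

    def __call__(self, Y):
        h = np.tanh(np.log1p(Y) @ self.W1 + self.b1)  # log1p: assumed input transform
        mu = h @ self.W_mu
        sd = np.logaddexp(0.0, h @ self.W_sd) + 1e-6  # stable softplus keeps sd > 0
        return mu, sd


def reparam_sample(mu, sd, rng):
    """Reparameterised draw x = mu + sd * eps with eps ~ N(0, I); in an autodiff
    framework this lets gradients reach the encoder weights."""
    return mu + sd * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, sd):
    """KL(N(mu, diag(sd^2)) || N(0, I)), one value per cell."""
    return 0.5 * np.sum(mu**2 + sd**2 - 1.0 - 2.0 * np.log(sd), axis=1)
```

For a minibatch of cells, the encoder is called once on the batch, the sampled latents feed the GP decoder's expected log-likelihood, and the per-cell KL values are summed and rescaled by the ratio of dataset size to batch size.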

Paper Structure

This paper contains 29 sections, 26 equations, 7 figures, and 3 tables.

Introduction
Background
Amortized Stochastic Variational Bayesian GPLVM
Encoding Domain Knowledge through Kernels
Batch correction kernel formulation
Cell-cycle phase kernel
Our model
Pre-Processing and Likelihood
Encoder
Results and Discussion
Each component is crucial to modified model performance
Modified model achieves significant improvements over standard Bayesian GPLVM and is comparable to scVI
Consistency of Latent space with biological factors
Conclusion
Baseline Models

Figures (7)

Figure 1: Overview of Modified BGPLVM Model
Figure 2: Ablation study of the proposed BGPLVM on the simulated dataset, changing one component at a time (labeled in each subfigure) and visualizing the resulting UMAPs. The top row is colored by cell type and the bottom row by batch.
Figure 3: UMAPs generated from the latent spaces of four models on the COVID dataset: an implementation of the original BGPLVM, the modified BGPLVM for scRNA-seq data, scVI, and a linear-decoder scVI (LDVAE). The top row is colored by cell type and the bottom row by batch.
Figure 4: (Top row) Log means and log variances (both parametrized by the same GP) plotted against the learned cell-cycle pseudotime dimension for three genes (UBE2C, CDC6, FN1). Squares depict log variances and circles depict log means of the library-normalized data, both colored by the phases annotated in kumasaka2021mapping_innateimmunity. Our model's learned cell-cycle phases correspond roughly to these annotated phases. (Bottom row) UMAP plots of our model's learned latent space, excluding directions identified with hidden technical effects (e.g. batch and plate-border effects). Cells are colored by treatment condition (left), primary pseudotime direction (middle), and secondary pseudotime direction (right).
Figure 5: Overview of the scVI architecture adapted from lopez2018deep_scvi.
