Benchmarking and optimizing organism wide single-cell RNA alignment methods

Juan Javier Diaz-Mejia, Elias Williams, Octavian Focsa, Dylan Mendonca, Swechha Singh, Brendan Innes, Sam Cooper

TL;DR

This work tackles the problem of evaluating and scaling organism-wide scRNA alignment by introducing the K-Neighbors Intersection (KNI) score, which jointly assesses batch-effect removal and cross-dataset cell-type label accuracy. It couples this metric with two curated benchmarks, the small scMARK (11 studies) and the large scREF (46 studies), and presents BA-scVI, a Batch Adversarial single-cell Variational Inference model that improves alignment by adversarially penalizing batch effects. BA-scVI outperforms the other methods evaluated and preserves cell-type granularity across tissues and technologies, which supports building comprehensive organism-wide maps from a single model. The authors treat standardized author labels as an approximation of ground-truth cell-type structure and show that KNI correlates with meaningful biological generalization, offering a practical framework for scalable, reference-based scRNA integration and atlas construction. The score, $\text{KNI} = \frac{1}{n}\sum_{k=1}^n \mathbf{1}[L(c_k) = t_k]$, captures both batch correction and biological signal in a single number, reinforcing its utility for benchmarking unsupervised scRNA alignment models.

Abstract

Many methods have been proposed for removing batch effects and aligning single-cell RNA (scRNA) datasets. However, performance is typically evaluated based on multiple parameters and few datasets, creating challenges in assessing which method is best for aligning data at scale. Here, we introduce the K-Neighbors Intersection (KNI) score, a single score that both penalizes batch effects and measures accuracy at cross-dataset cell-type label prediction, alongside carefully curated small (scMARK) and large (scREF) benchmarks comprising 11 and 46 human scRNA studies, respectively, for which we have standardized author labels. Using the KNI score, we evaluate and optimize approaches for cross-dataset single-cell RNA integration. We introduce Batch Adversarial single-cell Variational Inference (BA-scVI), a new variant of scVI that uses adversarial training to penalize batch effects in the encoder and decoder, and show that this approach outperforms other methods. In the resulting aligned space, we find that the granularity of cell-type groupings is conserved, supporting the notion that whole-organism cell-type maps can be created by a single model without loss of information.
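To make the idea behind the KNI score concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a cell receives a null prediction when more than `tau` of its `k` nearest neighbors come from its own batch, and that otherwise its label is predicted by majority vote among the neighbors from other batches; the score is the fraction of cells predicted correctly. The function name `kni_score`, the brute-force Euclidean search, the voting rule, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def kni_score(embeddings, labels, batches, k=3, tau=2):
    """Fraction of cells whose cross-batch neighbor vote matches their label.

    Assumed rule (illustrative only): a cell is predicted as null, and so
    counted as wrong, when more than `tau` of its `k` nearest neighbors
    share its batch. Otherwise the prediction is the majority label among
    the neighbors that come from other batches.
    """
    n = len(embeddings)
    correct = 0
    for i in range(n):
        # Brute-force Euclidean distances to every other cell.
        dists = sorted(
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        same_batch = sum(batches[j] == batches[i] for j in neighbors)
        if same_batch > tau:
            continue  # null prediction: neighborhood dominated by own batch
        votes = Counter(
            labels[j] for j in neighbors if batches[j] != batches[i]
        )
        if votes and votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / n


# Toy data: two cell types ("T", "B") far apart on x, two alternating batches.
labels = ["T"] * 6 + ["B"] * 6
batches = [i % 2 for i in range(12)]
xs = [float(i) for i in range(6)] + [100.0 + i for i in range(6)]

# Well-mixed batches versus a large batch shift along y.
mixed = [(x, 0.0) for x in xs]
shifted = [(x, 1000.0 * b) for x, b in zip(xs, batches)]

mixed_score = kni_score(mixed, labels, batches)
shifted_score = kni_score(shifted, labels, batches)
print(mixed_score, shifted_score)
```

With these toy settings the well-mixed case scores 1.0 and the batch-shifted case scores 0.0, which illustrates how a single number can penalize batch effects and reward correct cross-dataset labels at the same time. For real data a tree-based or approximate nearest-neighbor search would replace the quadratic loop used here.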

Paper Structure

This paper contains 18 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Comparison of Model Performance on scMARK: a) The KNI score combines accuracy at cell-type labeling with batch-effect correction (kBET) at the data-point level; b) The KNI score is used to assess how well scRNA alignment models align the 11 datasets in scMARK. KNI scores are plotted for each of the 11 datasets alongside the dataset mean and standard deviation. A perfect score is 1; c) Comparison of the scVI and BA-scVI architectures; d) UMAP projections of alignments produced by the eight different methods, where cells are colored by author-provided ‘ground-truth’ cell-type label; e) UMAP projections as in (d), but colored by study. The NKT-cell grouping is highlighted to show variation in cell-type alignment quality between the methods; notably, the Azizi2018-wb study (dark blue) is performed on the in-drop.
  • Figure 2: Comparison of Model Performance on scREF: a) KNI scores were determined for alignment of the 46-study scREF benchmark. Data points correspond to the average score achieved by the model on a study. The average score obtained on the entire benchmark is plotted as a line; b) UMAP projections of the BA-scVI aligned scREF benchmark (n=1.27m), colored by ‘ground-truth’ standardized author cell-type label. The legend is omitted for brevity (coloring is the same as ); boxes show major cell-type groupings; c) Same projection as (b), colored by study name; the legend is omitted; d) UMAP projections of scREF embedding spaces for the set of models presented, colored by standardized author cell-type labels (left) and by study (right).
  • Figure 3: BA-scVI scREF maintains cell-type granularity on alignment: a, b and c) 10-dimensional scRNA embeddings from BA-scVI corresponding to (a) Breast (n=0.4m cells), (b) Brain (n=4.8m cells), and (c) Blood (n=1.6m cells) tissue types were projected into a 2-dimensional space with UMAP. Cells are colored by study name; d, e and f) The same UMAP projections, but colored by original author labels for one example study from each of the 3 tissue types, namely (d) Breast Reed2024-ml (n=0.3m cells), (e) Brain Gabitto2024-pm (n=0.8m cells), and (f) Blood Kock2024-vi (n=1m cells). The cell-type and study legends are omitted for brevity; major groupings are in boxes.
  • Figure 4: Comparison of evaluation metrics on the scMARK and scREF benchmarks: a) Correlation of the KNI scores (x axis) with the KNN classifier scores (y axis) achieved on the studies in the scMARK benchmark by the set of models tested in this paper. Here, the KNN classifier score is calculated using the nearest 50 neighbors from held-out datasets; b) Correlation of the KNI score (x axis) with the kBET score (y axis) (Buttner2019-eu); c) Correlation of the kBET score (y axis) with the KNN classifier score (x axis); d, e, and f) The same charts as (a, b, and c), but for models tested on studies in the scREF benchmark; to enable computation, the KNN classifier score is here calculated from the nearest 50 neighbors, less those from the same dataset.
  • Figure 5: Analysis of metric behavior on theoretical example: a) Scatter plots of the three test cases used to compare candidate metrics for evaluating scRNA alignments. The key parameters are the separation of the two cell types by $\phi$ and the separation of the two batch effects by $\omega$; b) Under the KNI score, cells that are surrounded by more than $\tau$ cells from the same batch are classified as $null$. This example demonstrates the effect by considering two batches separated by a batch effect $\omega=4$. Cells that are labeled as null are shown in blue, and those that would be tested for label accuracy in red. Two regimes are considered, $\tau = 15$ and $\tau = 20$; c) The KmMI score on the ideal case $\phi = 4,\omega = 0$ vs. the batch-effect case $\phi = 4,\omega = 2$, varying the cluster number; f) Same examples as (c), but considering the KNI score and varying the cutoff parameter $\tau$ and the number of nearest neighbors $k$. The score difference between the two test cases for the two parameters is also plotted (left); g) Same as (f), but for the RbNI, where the cutoff percent $\tau*$ and radius $r$ are varied.
  • ...and 2 more figures