Benchmarking and optimizing organism wide single-cell RNA alignment methods

Juan Javier Diaz-Mejia; Elias Williams; Octavian Focsa; Dylan Mendonca; Swechha Singh; Brendan Innes; Sam Cooper

TL;DR

This work tackles the problem of evaluating and scaling organism-wide scRNA alignment by introducing the K-Neighbors Intersection (KNI) score, which jointly assesses batch-effect removal and cross-dataset cell-type label accuracy. It couples this metric with two curated benchmarks, the smaller scMARK (11 studies) and the larger scREF (46 studies), and presents BA-scVI, a Batch Adversarial single-cell Variational Inference model that improves alignment by adversarially penalizing batch effects in its encoder and decoder. BA-scVI outperforms the other methods evaluated and preserves cell-type granularity across tissues and technologies, which supports building comprehensive organism-wide maps from a single model. The authors show that standardized author labels approximate ground-truth cell-type structure and that KNI correlates with meaningful biological generalization, offering a practical framework for scalable, reference-based scRNA integration and atlas construction. The score averages a per-cell indicator, $\text{KNI} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[L(c_i) = t_i]$, where $c_i$ is the $i$-th of $n$ cells, $L(c_i)$ is the cell-type label predicted for it from its neighbors in other datasets, and $t_i$ is its standardized author label. Because it captures both batch correction and biological signal in a single number, KNI is well suited to benchmarking unsupervised scRNA alignment models.
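The indicator-average form of KNI above does not spell out how the batch penalty enters. The following Python sketch shows one way a KNI-style score could be computed. The neighbor rule (a cell fails if more than half of its K nearest neighbors come from its own study; otherwise its label is predicted by majority vote over neighbors from other studies), the brute-force Euclidean search, and the function name `kni_score` are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def kni_score(
    embedding: Sequence[Sequence[float]],
    batches: Sequence[Hashable],
    labels: Sequence[Hashable],
    k: int = 5,
) -> float:
    """Hypothetical KNI-style score (assumed rule, not the published code).

    For each cell:
      * find its k nearest neighbors (Euclidean, brute force, self excluded);
      * if more than half of them share the cell's batch, count a failure,
        since the cell is still grouped by study (a batch effect);
      * otherwise predict its label by majority vote over the neighbors from
        other batches and count a hit if it matches the cell's own label.
    Returns hits / n.
    """
    n = len(embedding)
    if not (n == len(batches) == len(labels)):
        raise ValueError("embedding, batches and labels must have equal length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    hits = 0
    for i in range(n):
        # Sort all other cells by distance to cell i; ties fall back to index order.
        others = sorted(
            (math.dist(embedding[i], embedding[j]), j) for j in range(n) if j != i
        )
        neighbors = [j for _, j in others[:k]]

        # Batch penalty: a neighborhood dominated by the cell's own study fails.
        same_batch = sum(batches[j] == batches[i] for j in neighbors)
        if same_batch > k / 2:
            continue

        # Cross-dataset label prediction from other-batch neighbors only
        # (non-empty here, because at most half the neighbors share the batch).
        cross = [labels[j] for j in neighbors if batches[j] != batches[i]]
        predicted, _ = Counter(cross).most_common(1)[0]
        hits += predicted == labels[i]

    return hits / n


if __name__ == "__main__":
    # Toy data: two studies ("A", "B") whose T and B cells are well mixed.
    emb = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    batch = ["A", "B", "A", "B", "A", "B", "A", "B"]
    label = ["T", "T", "T", "T", "B", "B", "B", "B"]
    print(kni_score(emb, batch, label, k=3))  # → 1.0
```

In the toy example every cell sits next to cells of the same type from the other study, so all cells score a hit. If the same cells were instead embedded by study (all "A" cells in one cluster, all "B" cells in another), every neighborhood would be dominated by its own batch and the score would drop to 0.0, which is the single-number behavior the TL;DR attributes to KNI.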

Abstract

Many methods have been proposed for removing batch effects and aligning single-cell RNA (scRNA) datasets. However, performance is typically evaluated with multiple metrics on few datasets, making it difficult to assess which method is best for aligning data at scale. Here, we introduce the K-Neighbors Intersection (KNI) score, a single score that both penalizes batch effects and measures accuracy at cross-dataset cell-type label prediction. We pair it with carefully curated small (scMARK) and large (scREF) benchmarks comprising 11 and 46 human scRNA studies, respectively, in which we have standardized the author-provided cell-type labels. Using the KNI score, we evaluate and optimize approaches for cross-dataset scRNA integration. We introduce Batch Adversarial single-cell Variational Inference (BA-scVI), a new variant of scVI that uses adversarial training to penalize batch effects in the encoder and decoder, and show that this approach outperforms other methods. In the resulting aligned space, we find that the granularity of cell-type groupings is conserved, supporting the notion that whole-organism cell-type maps can be created by a single model without loss of information.
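The abstract does not give the BA-scVI training objective. As a hedged sketch, a generic batch-adversarial setup in the style of domain-adversarial training pairs the scVI evidence lower bound with a discriminator $D_\psi$ that tries to recover the batch $s_i$ of cell $i$ from its latent code $z_i$, while the encoder and decoder are trained to defeat it. All symbols here ($\phi$, $\theta$, $\psi$, $D_\psi$, $\lambda$) are illustrative assumptions rather than the paper's notation, and only the latent (encoder-side) adversarial term is written out, although the abstract states the penalty is applied in both the encoder and the decoder.

```latex
\begin{aligned}
\mathrm{ELBO}(\phi,\theta) &= \frac{1}{n}\sum_{i=1}^{n}\Big(
  \mathbb{E}_{q_\phi(z \mid x_i, s_i)}\big[\log p_\theta(x_i \mid z, s_i)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x_i, s_i)\,\|\,p(z)\big)\Big),\\
\mathcal{L}_{D}(\phi,\psi) &= -\frac{1}{n}\sum_{i=1}^{n} \log D_\psi(s_i \mid z_i),
  \qquad z_i \sim q_\phi(z \mid x_i, s_i),\\
&\min_{\phi,\theta}\;\max_{\psi}\;\; -\mathrm{ELBO}(\phi,\theta) \;-\; \lambda\,\mathcal{L}_{D}(\phi,\psi).
\end{aligned}
```

Maximizing over $\psi$ trains the discriminator to minimize its cross-entropy $\mathcal{L}_D$, while minimizing over $\phi$ pushes the encoder toward latent codes from which the batch cannot be predicted. With $B$ batches, a discriminator at chance level outputs $1/B$ for every batch, giving $\mathcal{L}_D = \log B$; the weight $\lambda$ trades reconstruction quality against how strongly batch identity is removed.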
