scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data
Moritz Vandenhirtz, Florian Barkmann, Laura Manduchi, Julia E. Vogt, Valentina Boeva
TL;DR
This work tackles the challenge of uncovering cellular hierarchies in scRNA-seq data when batch effects obscure the true structure. It introduces scTree, an extension of TreeVAE that jointly learns a binary-tree latent space and batch-corrected representations in an end-to-end framework, using leaf-specific decoders and a batch offset to model batch effects. A splitting rule based on reconstruction loss allows the tree to detect imbalanced, rare cell types and thus to represent finer-grained hierarchies. Across seven datasets, scTree achieves competitive or superior clustering and hierarchy quality, particularly on datasets with strong batch effects, and discovers biologically plausible hierarchical structures. Code is provided for reproducibility and reuse.
Abstract
We propose a novel method, scTree, for single-cell Tree Variational Autoencoders, extending a hierarchical clustering approach to single-cell RNA sequencing data. scTree corrects for batch effects while simultaneously learning a tree-structured data representation. This VAE-based method allows for a more in-depth understanding of complex cellular landscapes independently of the biasing effects of batches. We show empirically on seven datasets that scTree discovers the underlying clusters of the data and the hierarchical relations between them, and that it outperforms established baseline methods across these datasets. Additionally, we analyze the learned hierarchy to understand its biological relevance, thus underpinning the importance of integrating batch correction directly into the clustering procedure.
