Deep Learning in Single-Cell and Spatial Transcriptomics Data Analysis: Advances and Challenges from a Data Science Perspective
Shuang Ge, Shuqing Sun, Huan Xu, Qiang Cheng, Zhixiang Ren
TL;DR
This paper surveys deep learning approaches for single-cell and spatial transcriptomics from a data science perspective, organized around four core challenges: data sparsity, diversity, scarcity, and correlation. It analyzes deep learning methods across data representations, multimodal and multi-source integration, data generation, and prior-knowledge incorporation, and evaluates 58 computational methods on 21 datasets drawn from 9 benchmarks. The authors also curate datasets and propose evaluation strategies, highlighting gaps in benchmark design and the need for biologically relevant metrics. They anticipate advances from novel AI paradigms (foundation models, self-supervised learning) and improved benchmarks, with practical impact on biology and precision medicine.
Abstract
The development of single-cell and spatial transcriptomics has revolutionized our capacity to investigate cellular properties, functions, and interactions in both cellular and spatial contexts. However, the analysis of single-cell and spatial omics data remains challenging. First, single-cell sequencing data are high-dimensional and sparse, and are often contaminated by noise and uncertainty that obscure the underlying biological signals. Second, these data often encompass multiple modalities, including gene expression, epigenetic modifications, and spatial locations. Integrating these diverse modalities is crucial for enhancing prediction accuracy and biological interpretability. Third, while the scale of single-cell sequencing has expanded to millions of cells, high-quality annotated datasets remain limited. Fourth, the complex correlations within biological tissues make it difficult to accurately reconstruct cellular states and spatial contexts. Traditional analysis methods based on feature engineering struggle to cope with the challenges presented by intricate biological networks. Deep learning has emerged as a powerful tool capable of handling high-dimensional, complex data and automatically identifying meaningful patterns, offering significant promise in addressing these challenges. This review systematically analyzes these challenges and discusses related deep learning approaches. Moreover, we have curated 21 datasets from 9 benchmarks, encompassing 58 computational methods, and evaluated their performance on the respective modeling tasks. Finally, we highlight three areas for future development from technical, dataset, and application perspectives. This work will serve as a valuable resource for understanding how deep learning can be effectively utilized in single-cell and spatial transcriptomics analyses, while inspiring novel approaches to address emerging challenges.
