Parameter-free representations outperform single-cell foundation models on downstream benchmarks

Huan Souza, Pankaj Mehta

TL;DR

Simple, interpretable pipelines built on careful normalization and linear methods achieve state-of-the-art (SOTA) or near-SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models. They also outperform foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data.

Abstract

Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near-SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, and we outperform foundation models on out-of-distribution tasks involving novel cell types and organisms absent from the training data. Our findings highlight the need for rigorous benchmarking and suggest that the biology of cell identity can be captured by simple linear representations of single-cell gene expression data.

Paper Structure (33 sections, 50 equations, 20 figures)

Figures (20)

  • Figure 1: Downstream Tasks/Benchmarks analyzed in this paper. A. Cross species cell annotation. The goal of this task is to use labeled cells from one species (e.g. humans) to annotate cell types in another species. B. Discrimination between healthy and infected cells. C. Cell type classification. D. Extracting biological context (i.e. gene-TF interactions) from data.
  • Figure 2: Cross-species transfer learning on novel organisms and cell types. A. Pipeline used by the parameter-free, linear algebra-based method scTOP yampolskaya2023sctop to perform cross-species annotation on the spermatogenesis dataset murat2023molecular, pearce2025cross. B. Transfer matrix of macro F1 scores for testis cell type classification across mammals for the TranscriptFormer foundation models TF-Exemplar and TF-Metazoa, as reported in pearce2025cross. C. Transfer matrix of macro F1 scores for testis cell type classification using scTOP. D. Comparison between scTOP and foundation models on the cross-species annotation task (F1 scores for foundation models are as reported in pearce2025cross).
  • Figure 3: Biological context from data. Cosine similarities for embeddings from TF-Metazoa and scTOP for different male germline developmental lineages and species. A. TranscriptFormer germline; B. TranscriptFormer species; C. scTOP germline; and D. scTOP species. E. Cosine similarity from scTOP between humans and indicated species as a function of evolutionary distance.
  • Figure 4: Tabula Sapiens 2.0 cell type classification task. A. Overview of our pipeline. B. Per-tissue results of our pipeline on the Tabula Sapiens 2.0 classification task. C. Detailed results and comparisons between our pipeline and foundation models as reported in CZbenchmarks. D. Average cell type classification results per tissue and per cell type for Tabula Sapiens 2.0. E. Distribution of F1 scores per cell type for our pipeline.
  • Figure 5: Identifying cells infected by SARS-CoV-2. A. Schematic illustrating why local classifiers are necessary for this task. B. Comparison between our pipeline (yellow) and foundation models (as reported in pearce2025cross) at classifying SARS-CoV-2 infected and uninfected cells from four distinct donors (data from wu2024interstitial). C. Comparison of average disease state prediction F1 scores of uninfected and infected cells across all tissues and donors for our pipeline and foundation models.
  • ...and 15 more figures
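
The captions above report two quantities: macro F1 scores (Figures 2 and 5) and cosine similarities between embeddings (Figure 3). As a reading aid, the following is a minimal sketch of how these two standard metrics are usually computed. It is not the authors' evaluation code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the cell was not p
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy labels, not data from the paper.
true_labels = ["spermatogonia", "spermatogonia", "sertoli", "sertoli"]
pred_labels = ["spermatogonia", "sertoli", "sertoli", "sertoli"]
score = macro_f1(true_labels, pred_labels)
```

Because macro F1 averages the per-class scores without weighting by class size, rare cell types count as much as abundant ones. In a transfer matrix such as those in Figure 2, each entry would be one such score for a single pair of reference and target species.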