GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation

Tianxiang Hu, Chenyi Zhou, Jiaxiang Liu, Jiongxin Wang, Ruizhe Chen, Haoxiang Xia, Gaoang Wang, Jian Wu, Zuozhu Liu

TL;DR

GRIT provides a training-free, inference-time refinement for zero-shot cell type annotation by enforcing local consistency of CLIP-style logits on a PCA-based $k$-NN graph. The method solves a convex graph-regularized objective with closed-form $\hat{P}_{\lambda} = (I + \lambda L)^{-1} P_0$, blending scalable foundation-model predictions with graph-structured robustness. Across 14 scRNA-seq datasets (11 Tabula Sapiens organs plus PBMCs and Peripheral Cortex), GRIT yields consistent accuracy gains up to $\approx 10\%$ and macro F1 improvements, while remaining robust to hyperparameters and graph choices. This lightweight, model-agnostic post-processing step enhances zero-shot annotation without additional training, offering a practical plug-in for scalable cell type inference in single-cell analyses.
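The closed form above can be recovered in two lines. The following is a sketch that assumes the objective takes the standard graph-regularized least-squares form (a fidelity term to $P_0$ plus a Laplacian smoothness penalty); the exact scaling of the terms in the paper may differ, but any objective of this shape yields the same solution.

$$
\hat{P}_{\lambda} = \arg\min_{P \in \mathbb{R}^{n \times c}} \; \|P - P_0\|_F^2 + \lambda \, \mathrm{tr}\!\left(P^\top L P\right)
$$

$$
\nabla_P = 2\,(P - P_0) + 2\lambda L P = 0 \;\;\Longrightarrow\;\; (I + \lambda L)\,\hat{P}_{\lambda} = P_0 \;\;\Longrightarrow\;\; \hat{P}_{\lambda} = (I + \lambda L)^{-1} P_0
$$

Because a symmetric graph Laplacian $L$ is positive semidefinite, $I + \lambda L$ is positive definite for every $\lambda \ge 0$, so the inverse exists and the minimizer is unique. Setting $\lambda = 0$ returns the original logits $P_0$.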

Abstract

Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. While LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal. In this paper, we propose GRIT, a principled inference-time paradigm for zero-shot cell type annotation that bridges the scalability of pre-trained foundation models with the structural robustness relied upon in human expert annotation workflows. Specifically, we enforce local consistency of the zero-shot CLIP logits over the task-specific PCA-based $k$-NN graph. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, with gains of up to 10%. Further analysis shows the mechanism by which GRIT propagates correct signals through the graph, pulling mislabeled cells back toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing zero-shot cell type annotation.
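The pipeline described in the abstract (PCA, then a $k$-NN graph, then graph-regularized refinement of the logits) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`pca_embed`, `knn_adjacency`, `refine_logits`, `smoothness`), the unnormalized Laplacian $L = D - A$, the binary edge weights, and the synthetic stand-in logits are all assumptions made for the example. Real zero-shot logits would come from a model such as LangCell.

```python
import numpy as np

def pca_embed(X, n_components=10):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, S.shape[0])
    return U[:, :k] * S[:k]

def knn_adjacency(Z, k=5):
    """Symmetric 0/1 adjacency of a brute-force k-NN graph (O(n^2) memory)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T  # squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]             # k nearest per row
    n = Z.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                       # symmetrize

def laplacian(A):
    """Unnormalized Laplacian L = D - A (an assumption for this sketch)."""
    return np.diag(A.sum(axis=1)) - A

def refine_logits(P0, A, lam=1.0):
    """Solve (I + lam * L) P = P0 instead of forming the inverse explicitly."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) + lam * laplacian(A), P0)

def smoothness(P, A):
    """Graph smoothness penalty tr(P^T L P); lower means more locally consistent."""
    return float(np.trace(P.T @ laplacian(A) @ P))

# Synthetic stand-in data: two well-separated clusters with noisy "logits".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 20)), rng.normal(4.0, 1.0, (30, 20))])
y = np.repeat([0, 1], 30)
P0 = 2.0 * np.eye(2)[y] + rng.normal(0.0, 1.5, (60, 2))

A = knn_adjacency(pca_embed(X, n_components=10), k=5)
P_hat = refine_logits(P0, A, lam=1.0)

print("baseline accuracy:", np.mean(P0.argmax(axis=1) == y))
print("refined accuracy: ", np.mean(P_hat.argmax(axis=1) == y))
print("smoothness before/after:", smoothness(P0, A), smoothness(P_hat, A))
```

The dense solve costs $O(n^3)$ and is only practical for small $n$; for datasets with tens of thousands of cells, a sparse Laplacian with a sparse or iterative linear solver would be the natural substitute.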

Paper Structure

This paper contains 19 sections, 1 theorem, 3 equations, 6 figures, 4 tables.

Key Result

Theorem 1

Given a symmetric graph Laplacian $L \in \mathbb{R}^{n \times n}$ constructed from an adjacency matrix $A \in \mathbb{R}^{n \times n}$, let $P_0 \in \mathbb{R}^{n \times c}$ denote the initial class logits over the $n$ nodes, and let $P^* \in \mathbb{R}^{n \times c}$ denote the ground-truth logits. Suppose the following condition holds: Then there exists a sufficiently small regularization param

Figures (6)

  • Figure 1: GRIT Overview. (a,b) Existing approaches rely on expert-driven labeling or deep learning models, which can be labor-intensive or imprecise. (c) Construct a PCA-based $k$-NN graph, with each node initialized by the logits predicted by a deep learning model. (d) GRIT refines these initial predictions by solving a graph-regularized optimization problem that promotes local consistency across the $k$-NN graph.
  • Figure 2: Performance of GRIT on zero-shot cell type annotation across 11 organ-specific scRNA-seq datasets. Each segment shows the accuracy achieved by GRIT (blue) and the baseline LangCell (green), along with the accuracy gain labeled in red. GRIT consistently improves performance over LangCell across all datasets, with accuracy gains up to 10%. Detailed results and analysis are provided in the main results section.
  • Figure 3: Overview of the human scRNA-seq datasets used in our main experiment. They span 11 human organs, 76 annotated cell types, and over 171,000 single cells. The anatomical illustration summarizes cell type diversity across organs. The bar chart reports single-cell counts per organ. The circular plot visualizes the distribution of all cell types, indexed from 1 to 76 for clarity. Full cell type names corresponding to these indices are listed in the Appendix.
  • Figure 4: Investigation of GRIT performance in the right-hand neighborhood of $\lambda = 0$ across all 11 organs from the Tabula Sapiens project and their average. The x-axis denotes $\lambda$ values, and the y-axis reports logit performance measured by accuracy. Dashed lines indicate baseline accuracies achieved by LangCell.
  • Figure 5: UMAP visualization of scRNA-seq data for the uterus (left), muscle (middle), and kidney (right). Each point represents a cell, colored by prediction correctness: gray indicates correct predictions, and green indicates incorrect ones. For each organ, the top panel shows LangCell zero-shot predictions and the bottom panel shows refined predictions from GRIT. Orange boxes indicate representative regions where GRIT provides clear improvements.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Theorem 1: Graph Regularized Logit Refinement Improves Predictions
  • Remark 1