Systematic Evaluation of Single-Cell Foundation Model Interpretability Reveals Attention Captures Co-Expression Rather Than Unique Regulatory Signal

Ihor Kendiukhov

TL;DR

Cell-State Stratified Interpretability (CSSI) addresses an attention-specific scaling failure, improving GRN recovery by up to 1.85x, and the evaluation framework establishes reusable quality-control standards for the field.

Abstract

We present a systematic evaluation framework (thirty-seven analyses, 153 statistical tests, four cell types, two perturbation modalities) for assessing mechanistic interpretability in single-cell foundation models. Applying this framework to scGPT and Geneformer, we find that attention patterns encode structured biological information with layer-specific organisation (protein-protein interactions in early layers, transcriptional regulation in late layers), but this structure provides no incremental value for perturbation prediction: trivial gene-level baselines outperform both attention and correlation edges (AUROC 0.81-0.88 versus 0.70), pairwise edge scores add zero predictive contribution, and causal ablation of regulatory heads produces no degradation. These findings generalise from K562 to RPE1 cells; the attention-correlation relationship is context-dependent, but gene-level dominance is universal. Cell-State Stratified Interpretability (CSSI) addresses an attention-specific scaling failure, improving GRN recovery by up to 1.85x. The framework establishes reusable quality-control standards for the field.

Paper Structure (63 sections, 3 equations, 45 figures, 12 tables)

Figures (45)

  • Figure 1: CSSI improves attention-derived GRN recovery. CSSI-max TRRUST F1 versus number of strata ($K$) on DLPFC brain data. Stratified scoring improves recovery up to $1.85\times$ over unstratified baselines, with optimal $K = 5$--$7$.
  • Figure 2: Gene-level baselines outperform pairwise edge scores. AUROC for predicting CRISPRi perturbation targets using gene-level features (variance, mean expression, dropout rate) versus correlation-based edge scores ($n = 151$ perturbations; attention comparison in text). All gene-level baselines significantly outperform both edge types ($p < 10^{-12}$).
  • Figure 3: Pairwise edge scores provide no incremental value beyond gene-level features. (A) Cross-validated AUROC for five model configurations under GroupKFold by perturbation: gene-level features alone, attention only, correlation only, gene-level plus attention, and gene-level plus correlation. Adding pairwise edges provides no improvement. (B) Bootstrap $\Delta$AUROC distribution ($n = 100$) for gene+attention minus gene-only (purple) and gene+correlation minus gene-only (pink); both centred at zero. (C) Stratification by TF versus non-TF perturbation genes shows identical patterns.
  • Figure 4: Causal ablation dose-response. (A) AUROC versus number of ablated heads for TRRUST-ranked (red), composite-ranked (green), and random (blue) ablation. Regulatory heads can be ablated up to $k = 50$ with no significant degradation, while random ablation causes significant drops. (B) All ablation conditions: mean AUROC across 13 conditions versus baseline. (C) Regulatory versus random drop magnitude at each dose level $k$. (D) Cohen's $d$ effect sizes (baseline minus ablation) for all conditions; most are indistinguishable from zero.
  • Figure 5: Context-dependent attention--correlation relationship. Perturbation-first AUROC comparison between Geneformer V2-316M attention and Spearman correlation across cell types. In RPE1 ($n = 1{,}251$), attention significantly outperforms correlation ($d = 0.47$, $p < 10^{-10}$), contrasting with K562 where they are indistinguishable. Sensitivity to LFC threshold shown for RPE1.
  • ...and 40 more figures
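
The bootstrap $\Delta$AUROC comparison described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the paper's code: the function names `auroc` and `bootstrap_delta_auroc`, the choice to resample individual examples with replacement, and the toy inputs are all assumptions made for illustration. Only the idea of comparing two scorers by the bootstrap distribution of their AUROC difference, and the default of 100 resamples, come from the caption.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_delta_auroc(labels, scores_a, scores_b, n_boot=100, seed=0):
    """Bootstrap distribution of AUROC(scores_a) - AUROC(scores_b).

    Assumption: examples are resampled with replacement, and both scorers
    are evaluated on the same resample so the difference is paired.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that lost one of the two classes
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        deltas.append(a - b)
    return deltas


# Toy data (hypothetical): 1 marks a perturbation target, 0 a non-target.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
gene_only = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9]
gene_plus_edges = list(gene_only)  # identical scores, so every delta is zero
deltas = bootstrap_delta_auroc(labels, gene_plus_edges, gene_only)
```

A distribution of `deltas` centred at zero, as in this degenerate toy case where both scorers are identical, is the pattern the caption reports for gene+attention and gene+correlation relative to gene-only. When resampling units are grouped (for example by perturbation, as in the GroupKFold setup), resampling whole groups rather than individual examples would be the more conservative choice.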