Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models

Charlotte Claye, Pierre Marschall, Wassila Ouerdane, Céline Hudelot, Julien Duquesne

TL;DR

This work tackles the opacity of single-cell RNA-seq foundation models by proposing a concept-based interpretability framework that decomposes latent representations via Top-K Sparse Auto-Encoders. It introduces attribution with counterfactual perturbations to identify genes driving concept activation, coupled with expert visualization and attribution-based GSEA to map concepts to biology. The framework yields concepts that are more interpretable than individual neurons while preserving biological signal, with some concepts stable across datasets and useful for downstream tasks such as cell type classification. This approach enables hypothesis generation and discovery by linking latent model knowledge to interpretable biological signals and pathways.
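To make the decomposition step concrete, here is a minimal sketch of a Top-K Sparse Auto-Encoder forward pass in NumPy. It is an illustration, not the authors' implementation: the function name `topk_sae_forward`, the weight shapes, the subtraction of the decoder bias before encoding, and the random toy weights are all assumptions. The only idea taken from the text is that a cell embedding is encoded into a wide concept layer in which only the k largest activations are kept.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Encode embeddings, keep the k largest pre-activations per cell, decode.

    x: (n_cells, d_embed) cell embeddings from a frozen model (toy data here).
    Returns the sparse concept activations z and the reconstruction x_hat.
    """
    pre = (x - b_dec) @ W_enc + b_enc                  # (n_cells, n_concepts)
    # Column indices of the k largest pre-activations in each row.
    top_idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    rows = np.arange(pre.shape[0])[:, None]
    z = np.zeros_like(pre)
    # ReLU on the surviving entries; every other concept stays at zero.
    z[rows, top_idx] = np.maximum(pre[rows, top_idx], 0.0)
    x_hat = z @ W_dec + b_dec                          # linear decoder
    return z, x_hat

# Toy example with random (untrained) weights; all sizes are hypothetical.
rng = np.random.default_rng(0)
n_cells, d_embed, n_concepts, k = 8, 16, 64, 4
x = rng.normal(size=(n_cells, d_embed))
W_enc = rng.normal(scale=0.1, size=(d_embed, n_concepts))
W_dec = rng.normal(scale=0.1, size=(n_concepts, d_embed))
b_enc = np.zeros(n_concepts)
b_dec = np.zeros(d_embed)

z, x_hat = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
```

Each row of `z` has at most `k` non-zero entries, so every cell is described by a handful of active concepts, and each row of `W_dec` can be read as that concept's direction in embedding space.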

Abstract

Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that sparse dictionary learning can extract concepts from deep learning models, with promising applications in biomedical imaging and protein models. However, interpreting biological concepts remains challenging, as biological sequences are not inherently human-interpretable. We introduce a novel concept-based interpretability framework for single-cell RNA-seq models with a focus on concept interpretation and evaluation. We propose an attribution method with counterfactual perturbations that identifies genes that influence concept activation, moving beyond correlational approaches like differential expression analysis. We then provide two complementary interpretation approaches: an expert-driven analysis facilitated by an interactive interface and an ontology-driven method with attribution-based biological pathway enrichment. Applying our framework to two well-known single-cell RNA-seq models from the literature, we interpret concepts extracted by Top-K Sparse Auto-Encoders trained on two immune cell datasets. With a domain expert in immunology, we show that concepts improve interpretability compared to individual neurons while preserving the richness and informativeness of the latent representations. This work provides a principled framework for interpreting what biological knowledge foundation models have encoded, paving the way for their use for hypothesis generation and discovery.
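The abstract does not spell out the perturbation scheme, so the following is only a generic sketch of perturbation-based attribution, not the authors' method. It assumes that each gene is perturbed one at a time by swapping in a reference value (zero here, meaning "gene switched off") and that the attribution score is the resulting drop in concept activation. The names `perturbation_attribution` and `concept_fn`, and the linear toy model standing in for "foundation model plus SAE concept", are hypothetical.

```python
import numpy as np

def perturbation_attribution(expr, concept_fn, reference):
    """Score each gene by how much a counterfactual value changes the concept.

    expr:       (n_genes,) expression profile of one cell
    concept_fn: maps an expression profile to a scalar concept activation
    reference:  (n_genes,) counterfactual expression values (assumed choice)
    """
    base = concept_fn(expr)
    scores = np.zeros(expr.shape[0])
    for g in range(expr.shape[0]):
        perturbed = expr.copy()
        perturbed[g] = reference[g]            # change gene g only
        # Positive score: the observed value of gene g raises the activation.
        scores[g] = base - concept_fn(perturbed)
    return scores

# Toy stand-in for the model and concept: a fixed linear readout with ReLU.
weights = np.array([2.0, 0.0, -1.0, 0.5])

def concept_fn(e):
    return max(float(weights @ e), 0.0)

expr = np.array([1.0, 3.0, 0.5, 2.0])
reference = np.zeros(4)                        # assumption: gene switched off
scores = perturbation_attribution(expr, concept_fn, reference)
```

In this toy example the second gene is highly expressed but receives a score of zero because the concept ignores it. This is the kind of distinction between expression level and influence on the concept that motivates moving beyond differential expression.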

Paper Structure

This paper contains 36 sections, 3 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Illustration of the methodology to extract and interpret biological concepts from scRNA-seq models. (1) Concepts are extracted by training Top-K SAEs on two scRNA-seq datasets. (2) We introduce a set of methods to biologically interpret concepts. (A) Characteristics of the cell population that activates a concept, based on the metadata available for each cell. (B) Attribution method based on counterfactual perturbations to score genes according to their importance for concept activation. (C) Expert interpretation of the concept based on the gene attribution results and prior knowledge. We developed and deployed a visualization tool to facilitate manual interpretation. (D) Attribution-based pathway enrichment detects pathways enriched with genes that influence concept activation.
  • Figure 2: Evaluation of Top-K SAEs trained at different scales. Results for SAEs trained on the Tabula Sapiens Immune dataset, for scGPT (top) and scVI (bottom). (A) Cell embedding reconstruction quality as measured by the $R^2$ score. (B) Concept characteristics at the cell level based on metadata. (C) Gene set characteristics based on attribution results ($attribution>0.05$). "Not enough data" means that fewer than 100 cells activate the concept (we do not expect a biological signal to appear in such a small portion of the dataset).
  • Figure 3: Concept interpretation results. (A) Interpretability of concepts compared to neurons. (a) Interpretations of neurons and concepts by a domain expert; (b) Interpretation of neurons and concepts with attribution-based GSEA. Strong annotation corresponds to enriched pathways with p-value $\leq$ 5e-5 (p-value $\leq$ 5e-3 for weak annotations). (B) Examples of interpreted concepts.
  • Figure 4: Stability of SAEs trained on different datasets. (A) Reconstruction error of cell embeddings from an external dataset compared to training samples. (B) Cosine similarity of matched concept vectors from the SAE trained on Tabula Sapiens Immune and the SAE trained on the Cross-tissue Immune Cell Atlas, after finding the best alignment via the Hungarian algorithm as proposed by Fel et al. (2025). (C) Examples of matching concepts with their most important genes. For each pair, the concept on the left is from the SAE trained on Tabula Sapiens Immune, and the concept on the right is from the SAE trained on the Cross-tissue Immune Cell Atlas.
  • Figure 5: Interpretation of cell cycle phase classification. (A) Key concepts contributing to predictions based on concepts. (B) Key neurons contributing to predictions based on neurons.
  • ...and 5 more figures
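
The concept matching described in the Figure 4 caption (best alignment via the Hungarian algorithm, then cosine similarity of matched pairs) can be sketched as follows. This is a minimal illustration on synthetic data, assuming each SAE's concepts are stored as rows of a decoder matrix. The helper name `match_concepts` and the toy dictionaries are hypothetical; `scipy.optimize.linear_sum_assignment` is SciPy's solver for the assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_concepts(D_a, D_b):
    """Pair rows of D_a with rows of D_b to maximize total cosine similarity.

    D_a, D_b: (n_concepts, d_embed) concept vectors from two SAEs.
    Returns, for each concept in D_a, the index of its match in D_b and
    the cosine similarity of that pair.
    """
    A = D_a / np.linalg.norm(D_a, axis=1, keepdims=True)
    B = D_b / np.linalg.norm(D_b, axis=1, keepdims=True)
    sim = A @ B.T                                   # pairwise cosine similarity
    row_idx, col_idx = linear_sum_assignment(sim, maximize=True)
    return col_idx, sim[row_idx, col_idx]

# Synthetic check: the second dictionary is a shuffled, slightly noisy copy.
rng = np.random.default_rng(0)
D_a = rng.normal(size=(5, 32))
perm = np.array([3, 0, 4, 1, 2])
D_b = D_a[perm] + 0.01 * rng.normal(size=(5, 32))

matches, sims = match_concepts(D_a, D_b)
```

Here the matching recovers the inverse of the shuffle, and the matched similarities are close to 1. On real SAEs, the distribution of `sims` is what indicates how many concepts are shared between the two training datasets.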