A Hierarchical Sheaf Spectral Embedding Framework for Single-Cell RNA-seq Analysis

Xiang Xiang Wang, Guo-Wei Wei

Abstract

Single-cell RNA-seq data analysis typically requires representations that capture heterogeneous local structure across multiple scales while remaining stable and interpretable. In this work, we propose a hierarchical sheaf spectral embedding (HSSE) framework that constructs informative cell-level features based on persistent sheaf Laplacian analysis. Starting from scale-dependent low-dimensional embeddings, we define cell-centered local neighborhoods at multiple resolutions. For each local neighborhood, we construct a data-driven cellular sheaf that encodes local relationships among cells. We then compute persistent sheaf Laplacians over sampled filtration intervals and extract spectral statistics that summarize the evolution of local relational structure across scales. These spectral descriptors are aggregated into a unified feature vector for each cell and can be directly used in downstream learning tasks without additional model training. We evaluate HSSE on twelve benchmark single-cell RNA-seq datasets covering diverse biological systems and data scales. Under a consistent classification protocol, HSSE achieves competitive or improved performance compared with existing multiscale and classical embedding-based methods across multiple evaluation metrics. The results demonstrate that sheaf spectral representations provide a robust and interpretable approach for single-cell RNA-seq data representation learning.
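The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: PCA at different dimensions stands in for the scale-dependent embeddings, stalks are one-dimensional, the restriction maps are distance-based scalar weights chosen only for illustration, and an ordinary (non-persistent) degree-0 sheaf Laplacian is evaluated at each sampled filtration radius as a simplification of the persistent sheaf Laplacian. All function names and parameters (`pca_embed`, `sheaf_laplacian`, `spectral_stats`, `cell_features`, `scales`, `ks`, `n_radii`) are hypothetical.

```python
import numpy as np

def pca_embed(X, dim):
    """Project centered data onto its top `dim` principal directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :dim] * s[:dim]

def sheaf_laplacian(D, weights, radius):
    """Degree-0 sheaf Laplacian of the graph with an edge wherever D <= radius.

    Stalks are 1-dimensional. The restriction of vertex u into edge {u, v}
    is multiplication by weights[v] (an illustrative choice, not the paper's).
    """
    n = D.shape[0]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if D[u, v] <= radius]
    delta = np.zeros((len(edges), n))      # sheaf coboundary C^0 -> C^1
    for e, (u, v) in enumerate(edges):
        delta[e, u] = -weights[v]
        delta[e, v] = weights[u]
    return delta.T @ delta                 # symmetric positive semidefinite

def spectral_stats(L, tol=1e-8):
    """Return [number of zero eigenvalues, min, mean, max of nonzero eigenvalues]."""
    eig = np.linalg.eigvalsh(L)
    pos = eig[eig > tol]
    if pos.size == 0:
        return np.array([float(eig.size), 0.0, 0.0, 0.0])
    return np.array([float(eig.size - pos.size), pos.min(), pos.mean(), pos.max()])

def cell_features(X, i, scales=(2, 5), ks=(5, 10), n_radii=3):
    """Concatenate spectral statistics for cell i over scales, sizes, and radii."""
    feats = []
    for dim in scales:
        Z = pca_embed(X, dim)              # recomputed per call for simplicity
        dist_to_i = np.linalg.norm(Z - Z[i], axis=1)
        order = np.argsort(dist_to_i)
        for k in ks:
            nbrs = order[: k + 1]          # cell i plus its k nearest neighbors
            P = Z[nbrs]
            D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
            # Positive weights that decay with distance from the center cell.
            weights = np.exp(-dist_to_i[nbrs] / (D.max() + 1e-12))
            for radius in np.linspace(0.0, D.max(), n_radii + 1)[1:]:
                feats.append(spectral_stats(sheaf_laplacian(D, weights, radius)))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))              # toy stand-in for a cells x genes matrix
Z0 = cell_features(X, i=0)
```

With the default arguments the feature vector has 2 scales × 2 neighborhood sizes × 3 radii × 4 statistics = 48 entries. Stacking such vectors over all cells would give a matrix that a standard classifier can consume directly, in line with the training-free use described above.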

Paper Structure

This paper contains 24 sections, 2 theorems, 48 equations, 5 figures, 5 tables, and 1 algorithm.

Key Result

Proposition 3.1

The restriction maps defined above satisfy the axioms of a cellular sheaf. In particular, for all simplices $\rho \le \sigma \le \tau$ in $K_i^{(s,k)}$,
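For reference, the composition axiom that any cellular sheaf must satisfy is commonly written as below. The symbol $\mathcal{F}_{\rho \le \sigma}$, denoting the restriction map from the stalk over $\rho$ to the stalk over $\sigma$, is assumed notation and may differ from the paper's own.

```latex
\mathcal{F}_{\rho \le \tau} = \mathcal{F}_{\sigma \le \tau} \circ \mathcal{F}_{\rho \le \sigma},
\qquad
\mathcal{F}_{\sigma \le \sigma} = \mathrm{id}_{\mathcal{F}(\sigma)} .
```

In words, restricting from $\rho$ to $\tau$ directly must agree with restricting in two steps through any intermediate simplex $\sigma$, and restricting a simplex to itself does nothing.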

Figures (5)

  • Figure 1: Overview of the proposed cell-centered persistent sheaf Laplacian framework for single-cell data analysis. Starting from the single-cell gene expression matrix, a family of scale-dependent low-dimensional representations is constructed to capture cellular structures at different resolutions. Based on these representations, scale-dependent distance matrices are computed to quantify pairwise relationships between cells. For each cell $i$, a cell-centered neighborhood is identified from the distance matrices, and neighborhood-induced simplicial complexes are constructed to encode local topological structures around the cell. On these cell-centered simplicial complexes, persistent sheaf Laplacians are computed via filtration sampling and spectral computation, yielding spectral information that characterizes the underlying sheaf-enhanced topological structures. These spectral features are aggregated to form a feature vector $Z_i$ associated with cell $i$. Collecting the feature vectors for all cells gives $\{Z_i\}_{i=1}^m$, where $m$ denotes the total number of cells, which serves as input for downstream learning tasks, e.g., classification.
  • Figure 2: Dataset-wise performance differences between HSSE and MDG across three evaluation metrics. Each panel shows paired results for HSSE and MDG on the same dataset using a dumbbell plot: (left) Macro-F1, (middle) Macro-Recall, and (right) Macro-AUC (OVR). Horizontal line segments connect MDG (circle) and HSSE (square), and the percentage annotation indicates the relative change of HSSE compared to MDG. Positive values denote improvements achieved by HSSE, while negative values indicate datasets where MDG performs slightly better.
  • Figure 3: Performance gains of HSSE over classical dimensionality reduction methods (PCA, UMAP, NMF, and t-SNE) across twelve benchmark single-cell RNA-seq datasets. The panels report the gain of HSSE in (left) Macro-F1, (middle) Macro-Recall, and (right) Macro-AUC. Positive values indicate that HSSE outperforms the corresponding baseline method.
  • Figure 4: Sensitivity analysis with respect to the number of neighborhood sizes under a fixed scale configuration $S = \{5, 14, 25, 37, 50\}$. Results are shown for four representative datasets: GSE67835, GSE82187, GSE84133mouse1, and GSE94820. The horizontal axis denotes the number of neighborhood sizes used ($|\mathcal{K}|=1$ to $10$). Six evaluation metrics are reported: Accuracy, Macro-F1, Weighted-F1, Macro-Recall, MCC, and Macro-AUC.
  • Figure 5: Effect of the scale configuration size ($|S|$) on classification performance under a fixed neighborhood-size set $\mathcal{K} = \{5, 10, 15, 20, 30, 40, 50, 60, 70, 80\}$. Macro-F1 and Accuracy are reported as $|S|$ increases from 1 to 5 on four representative datasets: GSE67835, GSE82187, GSE84133mouse1, and GSE94820.
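The macro-averaged metrics and the percentage annotations mentioned in the captions above can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code: the helper names `macro_recall_f1` and `relative_change_percent` are hypothetical, classes are taken from the true labels only, and the relative change is assumed to be the usual (new − baseline) / baseline ratio expressed in percent.

```python
import numpy as np

def macro_recall_f1(y_true, y_pred):
    """Unweighted mean of per-class recall and per-class F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls, f1s = [], []
    for c in np.unique(y_true):            # every class here has >= 1 true sample
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        recalls.append(tp / (tp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(recalls)), float(np.mean(f1s))

def relative_change_percent(new, baseline):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Toy labels for three classes (made-up values, for illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
recall, f1 = macro_recall_f1(y_true, y_pred)
```

Macro averaging gives each cell type equal weight regardless of its size, which matters for single-cell datasets with rare populations. A positive value from `relative_change_percent` corresponds to the first argument outperforming the baseline.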

Theorems & Definitions (11)

  • Definition 2.1: Simplicial Complex
  • Definition 2.2: Cellular Sheaf
  • Definition 3.1: Cell-Centered Cellular Sheaf
  • Definition 3.2: Restriction Maps
  • Proposition 3.1: Sheaf consistency
  • proof
  • Remark 3.1: Interpretability
  • Proposition 3.2: Scaling invariance of PSL eigenvectors
  • proof
  • Remark 3.2: Implications for the proposed framework
  • ...and 1 more