
Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang, Lydia Mezrag, Xingzhi Sun, Charles Xu, Kincaid Macdonald, Dhananjay Bhaskar, Smita Krishnaswamy, Guy Wolf, Bastian Rieck

TL;DR

The paper tackles estimating intrinsic curvature on data manifolds derived from large point clouds, notably in single-cell transcriptomics, where data density is highly variable. It introduces Adaptive Local PCA (AdaL-PCA), which combines local PCA for tangent-space estimation with data-driven neighborhood scales. These scales are chosen from the explained variance ratio $\rho(r) = \frac{\sum_{i=1}^{2} \sigma_i(r)^2}{\sum_{i=1}^{3} \sigma_i(r)^2}$, where $\sigma_1(r) \ge \sigma_2(r) \ge \sigma_3(r)$ are the leading singular values of the local PCA at radius $r$, using a threshold $\gamma$, together with a curvature scale $\tau = \arg\min_r \rho(r)$. Key contributions include adaptive parameter selection that removes manual tuning, accurate recovery of Gaussian and mean curvature on canonical manifolds, and biologically meaningful applications to single-cell data, where curvature patterns reveal differentiation trajectories; the method integrates with PHATE for visualization to interpret differentiation dynamics. Overall, AdaL-PCA provides robust, geometrically informed insights into high-dimensional biological datasets, enabling directionality and branching analyses in cellular differentiation.
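The summary does not spell out the exact rule that turns $\rho(r)$ and $\gamma$ into a neighborhood radius. The sketch below illustrates the idea under an assumption of ours: $\epsilon$ is taken as the largest candidate radius whose ratio still reaches $\gamma$, and $\tau$ is the grid radius minimizing $\rho(r)$. The helper names `explained_variance_ratio` and `select_scales`, the radius grid, and $\gamma = 0.95$ are hypothetical and not taken from the paper.

```python
import numpy as np

def explained_variance_ratio(points, center, r):
    """rho(r): share of local variance carried by the top two singular values."""
    nbrs = points[np.linalg.norm(points - center, axis=1) <= r]
    if len(nbrs) < 3:
        return np.nan  # too few neighbors for a 3-D local PCA
    # Singular values of the centered neighborhood, in descending order
    s = np.linalg.svd(nbrs - nbrs.mean(axis=0), compute_uv=False)
    return np.sum(s[:2] ** 2) / np.sum(s[:3] ** 2)

def select_scales(points, center, radii, gamma):
    """Hypothetical scale rule (assumption, not the paper's exact procedure):
    eps = largest radius with rho >= gamma; tau = radius minimizing rho."""
    rhos = np.array([explained_variance_ratio(points, center, r) for r in radii])
    ok = np.flatnonzero(rhos >= gamma)  # NaN entries compare as False
    eps = radii[ok[-1]] if ok.size else None
    tau = radii[np.nanargmin(rhos)]
    return eps, tau, rhos

# Toy check: 20k points sampled uniformly on the unit sphere, probed at the north pole.
rng = np.random.default_rng(0)
pts = rng.normal(size=(20_000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
radii = np.linspace(0.2, 1.5, 14)
eps, tau, rhos = select_scales(pts, np.array([0.0, 0.0, 1.0]), radii, gamma=0.95)
print(f"eps={eps:.2f} tau={tau:.2f}")
```

On the sphere, small caps are nearly planar, so $\rho(r)$ starts close to 1 and decreases as the cap grows; the assumed rule therefore keeps the largest neighborhood that still looks approximately flat.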

Abstract

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that datasets lie on a lower-dimensional manifold. This allows one to study the geometry of point clouds by extracting meaningful descriptors such as curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in cellular differentiation.
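The abstract states that local PCA supplies the tangent spaces but not how principal curvatures are then read off. For illustration only, the sketch below uses a swapped-in, standard technique: a least-squares quadratic height fit over the local PCA frame, followed by the eigenvalues of the shape operator of the fitted graph. It is not claimed to be AdaL-PCA's estimator, and the function name `principal_curvatures` and the fixed radius 0.3 are hypothetical.

```python
import numpy as np

def principal_curvatures(points, center, radius):
    """Hypothetical estimator: local PCA frame + quadratic height fit.

    Assumes the neighborhood of `center` is a graph over the tangent plane
    spanned by the top two principal directions. Returns (k1, k2), k1 <= k2.
    Signs depend on the (arbitrary) orientation of the SVD normal.
    """
    nbrs = points[np.linalg.norm(points - center, axis=1) <= radius]
    if len(nbrs) < 6:
        raise ValueError("need at least 6 neighbors for a quadratic fit")
    _, _, vt = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
    t1, t2, normal = vt[0], vt[1], vt[2]  # tangent basis and estimated normal
    local = nbrs - center
    x, y, h = local @ t1, local @ t2, local @ normal
    # Least-squares fit h ~ a x^2 + b xy + c y^2 + d x + e y + f
    A = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    a, b, c, d, e, _ = np.linalg.lstsq(A, h, rcond=None)[0]
    # First and second fundamental forms of the fitted graph at x = y = 0
    I = np.array([[1 + d**2, d * e], [d * e, 1 + e**2]])
    II = np.array([[2 * a, b], [b, 2 * c]]) / np.sqrt(1 + d**2 + e**2)
    # Principal curvatures = eigenvalues of the shape operator I^{-1} II
    k = np.sort(np.linalg.eigvals(np.linalg.solve(I, II)).real)
    return k[0], k[1]

# Toy check on the unit sphere, where |k1| = |k2| = 1 and K = k1 * k2 = 1.
rng = np.random.default_rng(1)
pts = rng.normal(size=(20_000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
k1, k2 = principal_curvatures(pts, np.array([0.0, 0.0, 1.0]), radius=0.3)
print(k1, k2, k1 * k2)
```

The Gaussian curvature $K = k_1 k_2$ does not depend on the normal's orientation, which makes it a convenient quantity to check; the mean curvature flips sign with the normal.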

Paper Structure

This paper contains 9 sections, 5 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: Comparison of the explained variance ratio of the top two singular values and accuracy (RMSE) of Gaussian curvature estimation w.r.t. increasing radii of $\epsilon$-neighborhood and $\tau$-neighborhood around $p$ on torus.
  • Figure 2: Directional curvatures in an $\epsilon$-PCA neighborhood of $p$.
  • Figure 3: Comparison of AdaL-PCA against ground truth for mean curvature on three toy datasets. Corr stands for Pearson correlation and RMSE stands for the root mean squared error.
  • Figure 4: Gaussian curvature and principal directions of embryonic stem cell differentiation. (A) PHATE visualization of scRNA-seq data color-coded by time intervals. (B) PHATE plot colored by Gaussian curvature values. (C, D) Principal directions at different stages of cell development.
  • Figure 5: Gaussian curvature and principal directions on the iPSC dataset using AdaL-PCA.
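Figures 1 and 3 score curvature estimates against analytic ground truth using RMSE and Pearson correlation. The sketch below shows that kind of evaluation on a torus, using the textbook torus curvatures $k_1 = 1/r$ and $k_2 = \cos v / (R + r\cos v)$. The radii $R = 2$, $r = 1$, the helper names, and the noisy "estimate" are assumptions for illustration: the estimate is synthetic noise added to the ground truth, not output from AdaL-PCA, and no numbers from the paper are reproduced.

```python
import numpy as np

def torus_curvatures(v, R=2.0, r=1.0):
    """Analytic Gaussian and mean curvature of a torus (tube radius r around a
    circle of radius R) at meridian angle v, with v = 0 on the outer rim.
    The sign of the mean curvature depends on the chosen normal orientation."""
    k1 = 1.0 / r                          # curvature of the meridian circle
    k2 = np.cos(v) / (R + r * np.cos(v))  # normal curvature along the parallel
    return k1 * k2, 0.5 * (k1 + k2)

def rmse(est, truth):
    """Root mean squared error between estimated and true curvature values."""
    return float(np.sqrt(np.mean((np.asarray(est) - np.asarray(truth)) ** 2)))

def pearson(est, truth):
    """Pearson correlation between estimated and true curvature values."""
    return float(np.corrcoef(est, truth)[0, 1])

# Stand-in "estimate": ground truth plus synthetic Gaussian noise.
rng = np.random.default_rng(0)
v = rng.uniform(0, 2 * np.pi, size=1_000)
K_true, H_true = torus_curvatures(v)
K_est = K_true + rng.normal(scale=0.05, size=v.shape)
print(f"RMSE={rmse(K_est, K_true):.3f}  Corr={pearson(K_est, K_true):.3f}")
```

Reporting both metrics is useful because Pearson correlation ignores a global scale or offset error in the estimates, while RMSE does not.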