Principal Curvatures Estimation with Applications to Single Cell Data
Yanlei Zhang, Lydia Mezrag, Xingzhi Sun, Charles Xu, Kincaid Macdonald, Dhananjay Bhaskar, Smita Krishnaswamy, Guy Wolf, Bastian Rieck
TL;DR
The paper tackles the estimation of intrinsic curvature on data manifolds derived from large point clouds, notably in single-cell transcriptomics, where data density is highly variable. It introduces Adaptive Local PCA (AdaL-PCA), which combines local PCA for tangent-space estimation with data-driven scale selection: neighborhood scales are chosen from the explained variance ratio $ρ(r) = \frac{\sum_{i=1}^2 σ_i(r)^2}{\sum_{i=1}^3 σ_i(r)^2}$ using a threshold $γ$, and a curvature scale $τ$ is chosen by $τ = \arg\min_r ρ(r)$. Key contributions include adaptive parameter selection that removes manual tuning, accurate recovery of Gaussian and mean curvature on canonical manifolds, and biologically meaningful applications to single-cell data, where curvature patterns reveal differentiation trajectories. The method integrates with PHATE for visualization to help interpret differentiation dynamics. Overall, AdaL-PCA provides robust, geometrically informed insights into high-dimensional biological datasets, enabling directionality and branching analyses in cellular differentiation.
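To make the ratio $ρ(r)$ concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that $σ_i(r)$ are the singular values of the centered neighborhood of a point at radius $r$ in a 3-dimensional ambient space, and that neighborhoods are Euclidean balls. The function name `explained_variance_ratio`, the sphere test data, and the radius grid are hypothetical choices for illustration. How the threshold $γ$ enters is not specified in this summary, so the sketch only evaluates $ρ(r)$ on a grid and takes its argmin as $τ$.

```python
import numpy as np

def explained_variance_ratio(points, center, r):
    """rho(r): share of variance captured by the top 2 of 3 local PCA directions.

    Assumption: sigma_i(r) are the singular values of the centered
    neighborhood of `center` within Euclidean radius r.
    """
    nbrs = points[np.linalg.norm(points - center, axis=1) <= r]
    if len(nbrs) < 3:
        return float("nan")  # too few neighbors for a meaningful local PCA
    centered = nbrs - nbrs.mean(axis=0)
    # Singular values are returned in descending order.
    sigma = np.linalg.svd(centered, compute_uv=False)
    return float((sigma[:2] ** 2).sum() / (sigma[:3] ** 2).sum())

# Hypothetical test data: points sampled uniformly on the unit sphere.
rng = np.random.default_rng(0)
pts = rng.normal(size=(5000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

center = pts[0]
radii = np.linspace(0.2, 1.0, 9)  # illustrative radius grid
rho = np.array([explained_variance_ratio(pts, center, r) for r in radii])

# Curvature scale as described in the text: tau = argmin_r rho(r).
tau = radii[np.nanargmin(rho)]
print(rho, tau)
```

On a curved surface such as the sphere, $ρ(r)$ stays close to 1 for small radii, where the neighborhood is nearly flat, and drops as the radius grows and the normal direction picks up variance.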
Abstract
The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that datasets lie on a lower-dimensional manifold. This makes it possible to study the geometry of point clouds by extracting meaningful descriptors such as curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in cellular differentiation.
