Multiscale Grassmann Manifolds for Single-Cell Data Analysis
Xiang Xiang Wang, Sean Cottrell, Guo-Wei Wei
TL;DR
The paper addresses the challenge that conventional Euclidean representations struggle to capture the intrinsic non-Euclidean geometry and multiscale structure of single-cell data. It introduces a multiscale Grassmann manifolds (MGM) framework that embeds cells as subspaces on the Grassmann manifold $Gr(n,p)$ by aggregating multiple scale embeddings, and uses Grassmann-distance measures to form a global affinity for clustering. A power-based scale sampling function selects scales to balance local and global information, enabling robust, multiscale representations. Experiments across nine public scRNA-seq datasets show that MGM yields stable embeddings and competitive or superior clustering performance, especially for small to medium-sized datasets, highlighting the value of integrating multiscale geometric information on non-Euclidean manifolds.
Abstract
Single-cell data analysis seeks to characterize cellular heterogeneity based on high-dimensional gene expression profiles. Conventional approaches represent each cell as a vector in Euclidean space, which limits their ability to capture intrinsic correlations and multiscale geometric structures. We propose a multiscale framework based on Grassmann manifolds that integrates machine learning with subspace geometry for single-cell data analysis. By generating embeddings under multiple representation scales, the framework combines their features from different geometric views into a unified Grassmann manifold. A power-based scale sampling function is introduced to control the selection of scales and balance information across resolutions. Experiments on nine benchmark single-cell RNA-seq datasets demonstrate that the proposed approach effectively preserves meaningful structures and provides stable clustering performance, particularly for small to medium-sized datasets. These results suggest that Grassmann manifolds offer a coherent and informative foundation for analyzing single-cell data.
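To make the subspace view concrete, the following is a minimal sketch of how a distance between two points on a Grassmann manifold can be computed from principal angles. It is an illustration under stated assumptions, not the authors' implementation: it assumes each cell is summarized by an $n \times p$ matrix whose $p$ columns come from different scales, that $Gr(n,p)$ denotes $p$-dimensional subspaces of $\mathbb{R}^n$, and that the geodesic (arc-length) distance is the measure of interest. The function names `orthonormal_basis` and `grassmann_distance` are hypothetical.

```python
import numpy as np


def orthonormal_basis(features):
    """Orthonormalize the columns of an n x p matrix (assumed full column rank).

    The returned n x p matrix Q satisfies Q^T Q = I and spans the same
    subspace, so it represents a point on the Grassmann manifold.
    """
    q, _ = np.linalg.qr(features)  # reduced QR: q has shape (n, p)
    return q


def grassmann_distance(basis_a, basis_b):
    """Geodesic distance between two subspaces given orthonormal bases.

    The singular values of A^T B are the cosines of the principal angles
    between the subspaces; the geodesic distance is the 2-norm of the angles.
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against round-off pushing values slightly outside [-1, 1].
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example: two cells, each with p = 3 scale embeddings in n = 6 dimensions.
    cell_a = orthonormal_basis(rng.normal(size=(6, 3)))
    cell_b = orthonormal_basis(rng.normal(size=(6, 3)))
    print(grassmann_distance(cell_a, cell_b))
```

Because the distance depends only on the spanned subspace, it does not change when the columns of a basis are rotated or reordered. Pairwise distances computed this way could then be turned into an affinity matrix for clustering, although the specific affinity construction used in the paper is not described in this section.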
