Principal Component Analysis in Space Forms
Puoya Tabaghi, Michael Khanzadeh, Yusu Wang, Siavash Mirarab
TL;DR
This paper advances PCA for non-Euclidean spaces by formulating Space Form PCA (SFPCA), which computes all principal geodesics in closed form by solving eigenequations in space forms, that is, spaces of constant curvature. It develops dedicated, proper distortion costs for spherical and hyperbolic spaces, yielding consistent centroids and nested optimal subspaces while avoiding costly iterative schemes. In spherical space, the optimal subspace aligns with the leading eigenvectors of the second-moment matrix, while in hyperbolic space the solution relies on a Lorentzian ($J_D$) eigenstructure, with projections implemented through tangent-space parametrizations and isometries to lower-dimensional models. Extensive synthetic and real-data experiments show that SFPCA often achieves faster convergence and equal or better accuracy than existing Riemannian PCA approaches, and that it enables practical tasks such as outlier detection via hyperbolic spectrum analysis. Collectively, these results offer a principled, scalable framework for geometry-aware dimensionality reduction on data residing in spherical or hyperbolic manifolds, with clear pathways for applications in hierarchical, cyclical, and phylogenetic data analysis.
Abstract
Principal Component Analysis (PCA) is a workhorse of modern data science. PCA assumes that the data conform to Euclidean geometry, but for some data types, such as hierarchical and cyclic structures, other spaces are more appropriate. We study PCA in space forms, that is, Riemannian manifolds of constant curvature. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. We show that it outperforms alternative methods in estimating true subspaces (in simulated data) with respect to convergence speed or accuracy, often both.
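To make the spherical case concrete, the following is a minimal NumPy sketch of the idea that the optimal subspace is obtained from the leading eigenvectors of the second-moment matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `spherical_pca_sketch`, the choice of `k + 1` eigenvectors (one for the base point plus `k` tangent directions), and the projection step (orthogonal projection followed by renormalization) are all assumptions made here. The hyperbolic case, which relies on the $J_D$ eigenstructure, is not covered.

```python
import numpy as np


def spherical_pca_sketch(X, k):
    """Illustrative spherical PCA (hypothetical helper, not the paper's code).

    X : (N, D+1) array whose rows are unit vectors on the sphere S^D.
    k : dimension of the target spherical subspace.
    Returns an orthonormal basis of shape (D+1, k+1) and the projected points.
    """
    n_points = X.shape[0]
    # Second-moment matrix of the data (no mean subtraction).
    C = X.T @ X / n_points
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    # Assumption: a k-dimensional spherical subspace is the intersection of
    # the sphere with a (k+1)-dimensional linear subspace.
    basis = eigvecs[:, order[: k + 1]]
    # Project onto the linear subspace, then renormalize back onto the sphere.
    # A point exactly orthogonal to the subspace would have zero norm here;
    # this sketch does not handle that degenerate case.
    flat = X @ basis @ basis.T
    projected = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return basis, projected


# Synthetic data: noisy points near a great circle of S^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.stack(
    [np.cos(theta), np.sin(theta), 0.05 * rng.standard_normal(200)], axis=1
)
X /= np.linalg.norm(X, axis=1, keepdims=True)

basis, X_proj = spherical_pca_sketch(X, k=1)
```

Because the basis for dimension `k` is a prefix of the eigenvector ordering used for dimension `k + 1`, the subspaces produced by this sketch are nested by construction, which mirrors property (2) stated in the abstract.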
