Principal Component Analysis in Space Forms

Puoya Tabaghi, Michael Khanzadeh, Yusu Wang, Siavash Mirarab

TL;DR

This paper advances PCA for non-Euclidean spaces by formulating Space Form PCA (SFPCA) that computes all principal geodesics as a closed-form solution via eigenequations in space forms with constant curvature. It develops dedicated, proper distortion costs for spherical and hyperbolic spaces, yielding consistent centroids and nested optimal subspaces, and avoiding costly iterative schemes. In spherical space, the optimal subspace aligns with leading eigenvectors of the second-moment matrix, while in hyperbolic space the solution relies on a Lorentzian ($J_D$) eigenstructure, with projections implemented through tangent-space parametrizations and isometries to lower-dimensional models. Extensive synthetic and real-data experiments show that SFPCA often achieves faster convergence and equal or better accuracy than existing Riemannian PCA approaches, and enables practical tasks such as outlier detection via hyperbolic spectrum analysis. Collectively, these results offer a principled, scalable framework for geometry-aware dimensionality reduction on data residing in spherical or hyperbolic manifolds, with clear pathways for applications in hierarchical, cyclical, and phylogenetic data analysis.

Abstract

Principal Component Analysis (PCA) is a workhorse of modern data science. While PCA assumes the data conforms to Euclidean geometry, for specific data types, such as hierarchical and cyclic data structures, other spaces are more appropriate. We study PCA in space forms; that is, those with constant curvatures. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points with the minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. We show that it outperforms alternative methods in estimating true subspaces (in simulated data) with respect to convergence speed or accuracy, often both.
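
The spherical case summarized in the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes unit curvature ($C = 1$), data stored as unit-norm rows in $\mathbb{R}^{D+1}$, and that the optimal $K$-dimensional spherical affine subspace is the intersection of the sphere with the span of the $K+1$ leading eigenvectors of the second-moment matrix. The function name `spherical_pca_sketch` and its return values are hypothetical, chosen only for this example.

```python
import numpy as np


def spherical_pca_sketch(X, K):
    """Illustrative eigenvector-based PCA on the unit sphere (assumes C = 1).

    X : (N, D+1) array whose rows are unit-norm points on S^D.
    K : dimension of the target spherical subspace.
    Returns (basis, features, projected).
    """
    n_points = X.shape[0]

    # Second-moment matrix of the data (no mean subtraction).
    second_moment = X.T @ X / n_points

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    order = np.argsort(eigvals)[::-1]

    # Assumption: the subspace is spanned by the K+1 leading eigenvectors.
    basis = eigvecs[:, order[: K + 1]]  # shape (D+1, K+1)

    # Coordinates in that span, renormalized to land on S^K.
    # A point exactly orthogonal to the span would give a zero norm here;
    # this sketch does not handle that degenerate case.
    coords = X @ basis
    features = coords / np.linalg.norm(coords, axis=1, keepdims=True)

    # The same points expressed back in the ambient space, on S^D.
    projected = features @ basis.T
    return basis, features, projected
```

Under these assumptions, the nestedness property stated in the abstract is visible directly: the basis returned for `K` consists of the first `K + 1` columns of the basis returned for `K + 1`, because both are read off the same eigendecomposition.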


Paper Structure

This paper contains 55 sections, 17 theorems, 36 equations, 8 figures, and 3 tables.

Key Result

Proposition 1

For any $\mathbb{S}_H^D$ and $x \in \mathbb{S}^D$, we have where $\{h^{\prime}_{k^{\prime}}\}_{k^{\prime} \in [K^{\prime}]}$ are the complete orthogonal basis vectors of $H^{\perp}$. Both $\mathbb{S}_H^D$ and $\mathbb{S}^D$ have a fixed curvature $C > 0$.

Figures (8)

  • Figure 1: $(a, b)$ One- $(a)$ and two-dimensional $(b)$ affine subspaces in $\mathbb{R}^3$. We show subspaces ($H^{\perp}$) at point $p$ instead of the origin. We may define the same Riemannian affine subspace using other base points, e.g., $p^{\prime}$. $(c)$ Two-dimensional affine subspace in a hyperbolic space (Poincaré) where $h^{\prime} \in T_p\mathbb{I}^3 = \mathbb{R}^3$.
  • Figure 2: $(a)$ A set of data points in $\mathbb{S}^D$, where $D=2$. $(b)$ The best estimate for the base point $p$ and the tangent subspace $H = h_1 \in T_p \mathbb{S}^D$ --- the spherical affine subspace $\mathbb{S}^D_H = (p \oplus H) \cap \mathbb{S}^{D}$. $(c)$ The projection of points onto $\mathbb{S}^D_H$ ($H = h_1$). $(d)$ The low-dimensional features in $\mathbb{S}^K$, where $K = \mathrm{dim}( \mathbb{S}^D_H ) = 1$.
  • Figure 3: For each spherical experiment, on the y-axes, we report running time and normalized output error. A dot corresponds to a random trial, and connected circles show the median across all trials. Figures $(a, b, c)$ show the results for $\mathbb{S}(K_1), \mathbb{S}(D_1), \mathbb{S}(N_1)$, respectively. All axes are in logarithmic scale.
  • Figure 4: Spherical experiment $\mathbb{S}(D_2)$. The y-axes show running time and normalized output error. All axes are in logarithmic scale.
  • Figure 5: For each scaled-down hyperbolic experiment, on the y-axes, we report running time and normalized output error in logarithmic scale. A dot corresponds to a random trial, and circles show the median across all trials. Figures in rows $(a), (b)$, and $(c)$ are $\mathbb{H}(K_1), \mathbb{H}(D_1)$, and $\mathbb{H}(N_1)$.
  • ...and 3 more figures

Theorems & Definitions (48)

  • Definition 1
  • Definition 2
  • Example 1
  • Example 2
  • Definition 3
  • Definition 4
  • Remark 1
  • Definition 5
  • Definition 6
  • Claim 1
  • ...and 38 more