Fast $k$-means clustering in Riemannian manifolds via Fréchet maps: Applications to large-dimensional SPD matrices
Ji Shi, Nicolas Charon, Andreas Mang, Demetrio Labate, Robert Azencott
TL;DR
The paper addresses clustering of data on non-Euclidean manifolds: points are embedded into Euclidean space with a $p$-Fréchet map $F^p$, built from a set of reference points, so that fast $k$-means can be applied. Focusing on SPD$(n)$, it develops theoretical results for $p=1,2$, analyzes the local invertibility of $F^p$ and the geometry of its image, and proposes principled strategies for choosing the reference points. Empirically, the proposed method (FMC) runs up to two orders of magnitude faster than intrinsic SPD clustering while maintaining high accuracy, and it competes favorably with log-Euclidean embeddings in challenging configurations. The work highlights the practical potential of Fréchet-map embeddings for scalable manifold clustering and suggests future directions in theoretical generalization and integration with latent-space learning approaches.
Abstract
We introduce a novel, efficient framework for clustering data on high-dimensional, non-Euclidean manifolds that overcomes the computational challenges associated with standard intrinsic methods. The key innovation is the use of the $p$-Fréchet map $F^p : \mathcal{M} \to \mathbb{R}^\ell$, defined on a generic metric space $\mathcal{M}$, which embeds the manifold data into a lower-dimensional Euclidean space $\mathbb{R}^\ell$ using a set of reference points $\{r_i\}_{i=1}^\ell$, $r_i \in \mathcal{M}$. Once the data are embedded, standard Euclidean clustering techniques such as $k$-means can be applied efficiently and accurately. We rigorously analyze the mathematical properties of $F^p$ both in Euclidean space and on the challenging manifold of $n \times n$ symmetric positive definite matrices, $\mathit{SPD}(n)$. Extensive numerical experiments on synthetic and real $\mathit{SPD}(n)$ data demonstrate significant performance gains: our method reduces runtime by up to two orders of magnitude compared to intrinsic manifold-based approaches while maintaining high clustering accuracy, including in scenarios where existing alternative methods struggle or fail.
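The embed-then-cluster pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: that the $p$-Fréchet map takes the form $F^p(x) = (d(x, r_1)^p, \dots, d(x, r_\ell)^p)$, that $d$ is the affine-invariant Riemannian distance on $\mathit{SPD}(n)$, and that the two reference points used here (chosen by hand) are acceptable. The helper names `spd_distance`, `frechet_map`, `kmeans`, and `random_spd` are hypothetical, and the clustering step is a plain Lloyd iteration on synthetic data.

```python
import numpy as np
from scipy.linalg import eigvalsh


def spd_distance(a, b):
    """Affine-invariant distance between SPD matrices a and b (assumed metric)."""
    # Generalized eigenvalues of (b, a) equal the eigenvalues of a^{-1/2} b a^{-1/2}.
    lam = eigvalsh(b, a)
    return np.sqrt(np.sum(np.log(lam) ** 2))


def frechet_map(points, refs, p=2):
    """Assumed form of F^p: p-th powers of the distances to each reference point."""
    return np.array([[spd_distance(x, r) ** p for r in refs] for x in points])


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd iterations in Euclidean space; returns one label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = x[labels == j].mean(axis=0)
    return labels


# Synthetic data (illustrative only): two groups of 3x3 SPD matrices,
# one near the identity and one near 10 times the identity.
rng = np.random.default_rng(0)


def random_spd(scale, n=3):
    g = rng.standard_normal((n, n))
    return scale * (np.eye(n) + 0.005 * g @ g.T)  # identity plus a PSD term is SPD


points = [random_spd(1.0) for _ in range(20)] + [random_spd(10.0) for _ in range(20)]
refs = [np.eye(3), 10.0 * np.eye(3)]          # hand-picked reference points (assumption)
embedded = frechet_map(points, refs, p=2)     # shape (40, 2): one row per matrix
labels = kmeans(embedded, k=2)
print(labels)
```

In this sketch the manifold distance is evaluated only once per pair of data point and reference point. All subsequent $k$-means iterations operate on ordinary vectors in $\mathbb{R}^\ell$, which is where a runtime advantage over intrinsic clustering would be expected to come from.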
