Fast $k$-means clustering in Riemannian manifolds via Fréchet maps: Applications to large-dimensional SPD matrices

Ji Shi, Nicolas Charon, Andreas Mang, Demetrio Labate, Robert Azencott

TL;DR

The paper addresses clustering of data on non-Euclidean manifolds by embedding points into Euclidean space with a $p$-Fréchet map $F^p$, so that standard $k$-means can run quickly on the embedded points. Focusing on $\mathit{SPD}(n)$, it develops theoretical results for $p=1,2$, analyzes local invertibility and the geometry of the image, and proposes principled strategies for choosing reference points. Empirically, the proposed method (FMC) runs up to two orders of magnitude faster than intrinsic SPD clustering while maintaining high accuracy, and it compares favorably with log-Euclidean embeddings in challenging configurations. The work highlights the practical potential of Fréchet-map embeddings for scalable manifold clustering and suggests two future directions: generalizing the theory and integrating the approach with latent-space learning.

Abstract

We introduce a novel, efficient framework for clustering data on high-dimensional, non-Euclidean manifolds that overcomes the computational challenges associated with standard intrinsic methods. The key innovation is the use of the $p$-Fréchet map $F^p : \mathcal{M} \to \mathbb{R}^\ell$, defined on a generic metric space $\mathcal{M}$, which embeds the manifold data into a lower-dimensional Euclidean space $\mathbb{R}^\ell$ using a set of reference points $\{r_i\}_{i=1}^\ell$, $r_i \in \mathcal{M}$. Once embedded, we can efficiently and accurately apply standard Euclidean clustering techniques such as $k$-means. We rigorously analyze the mathematical properties of $F^p$ in Euclidean space and in the challenging manifold of $n \times n$ symmetric positive definite matrices $\mathit{SPD}(n)$. Extensive numerical experiments using synthetic and real $\mathit{SPD}(n)$ data demonstrate significant performance gains: our method reduces runtime by up to two orders of magnitude compared to intrinsic manifold-based approaches, all while maintaining high clustering accuracy, including scenarios where existing alternative methods struggle or fail.
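The embed-then-cluster idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that $F^p$ maps $x$ to $(d(x,r_1)^p,\dots,d(x,r_\ell)^p)$ and that $d$ is the affine-invariant Riemannian distance on $\mathit{SPD}(n)$; neither is spelled out in this excerpt. The names `airm_distance`, `frechet_map`, `kmeans_labels`, and `sample_spd` are hypothetical, as are the toy data and the choice of reference points.

```python
import numpy as np


def airm_distance(a, b):
    """Affine-invariant Riemannian distance between SPD matrices a and b (assumed metric)."""
    # With a = L L^T, the matrix L^{-1} b L^{-T} is SPD and shares its
    # eigenvalues with a^{-1} b.
    chol = np.linalg.cholesky(a)
    half = np.linalg.solve(chol, b)        # L^{-1} b
    m = np.linalg.solve(chol, half.T)      # L^{-1} b L^{-T}, using b = b^T
    eigvals = np.linalg.eigvalsh(0.5 * (m + m.T))
    return np.sqrt(np.sum(np.log(eigvals) ** 2))


def frechet_map(data, refs, p=2):
    """Embed each SPD matrix x as (d(x, r_1)^p, ..., d(x, r_l)^p) (assumed definition)."""
    return np.array([[airm_distance(x, r) ** p for r in refs] for x in data])


def kmeans_labels(points, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def sample_spd(scale, n, rng, noise=0.1):
    """Hypothetical toy sampler: scale * exp(S) for a small random symmetric S."""
    g = noise * rng.normal(size=(n, n))
    w, v = np.linalg.eigh(0.5 * (g + g.T))
    return scale * (v * np.exp(w)) @ v.T


rng = np.random.default_rng(0)
n = 3
# Two synthetic clusters, centered near Id and near 10 * Id.
data = [sample_spd(1.0, n, rng) for _ in range(20)] + \
       [sample_spd(10.0, n, rng) for _ in range(20)]
# Illustrative reference points only; the paper proposes its own selection strategy.
refs = [np.eye(n), 10.0 * np.eye(n)]

embedded = frechet_map(data, refs, p=2)   # array of shape (40, 2)
labels = kmeans_labels(embedded, k=2)     # cluster in R^2; labels index the SPD data
```

Each data point needs only $\ell$ distance evaluations for the embedding, after which every $k$-means iteration is purely Euclidean. An intrinsic method would instead recompute manifold distances and Fréchet means at every iteration, which is where the runtime difference reported in the abstract would come from.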

Paper Structure

This paper contains 35 sections, 12 theorems, 61 equations, 5 figures, 5 tables, 4 algorithms.

Key Result

Proposition 1

Let $F_r^p$, $p \ge 1$, be a $p$-Fréchet map on a metric space $(\mathcal{M},d_\mathcal{M})$ associated with a list of reference points $r = (r_1,\dots,r_\ell)$ in $\mathcal{M}$. For $p=1$, the Fréchet map $F_r^1$ is a globally Lipschitz map on $\mathcal{M}$. If for some $x_0 \in \mathcal{M}$ and …
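The $p=1$ claim can be checked directly under one assumption that this excerpt does not state: that the Fréchet map has components $d_\mathcal{M}(x,r_i)^p$. With that definition and the Euclidean norm on $\mathbb{R}^\ell$, the reverse triangle inequality gives a global Lipschitz bound. The constant $\sqrt{\ell}$ below comes from this particular choice of norm and may differ from the one in the paper.

```latex
% Assumed definition (not given in this excerpt):
F_r^p(x) = \bigl(d_\mathcal{M}(x,r_1)^p,\dots,d_\mathcal{M}(x,r_\ell)^p\bigr).

% Reverse triangle inequality, for each i = 1,\dots,\ell:
\bigl| d_\mathcal{M}(x,r_i) - d_\mathcal{M}(y,r_i) \bigr| \le d_\mathcal{M}(x,y).

% Summing the squares over the \ell components (case p = 1):
\bigl\| F_r^1(x) - F_r^1(y) \bigr\|_2
  = \Bigl( \sum_{i=1}^{\ell} \bigl| d_\mathcal{M}(x,r_i) - d_\mathcal{M}(y,r_i) \bigr|^2 \Bigr)^{1/2}
  \le \sqrt{\ell}\, d_\mathcal{M}(x,y)
  \quad \text{for all } x, y \in \mathcal{M}.
```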

Figures (5)

  • Figure 1: Illustration of the Fréchet map in the Euclidean space $\mathcal{M}=\mathbb{R}^3$ with $\ell=3$ reference points. The left panel (a) shows the two symmetric points that share a given image $d=(d_1,d_2,d_3)$ under the Fréchet map. The right panel shows the images of two disjoint balls in the upper half-space under the Fréchet maps $F_r^2$ and $F_r^1$.
  • Figure 2: Illustration of the Riemannian vs log-Euclidean metrics. We show the tangent space $T_{\text{Id}}\mathcal{M}$ and two points $P$ and $Q$ on the manifold $\mathcal{M}$. Riemannian distances are shown in red. Euclidean distances in $T_{\text{Id}}\mathcal{M}$ are shown in black. We project points $P$ and $Q$ to $T_{\text{Id}}\mathcal{M}$ using a logarithmic map.
  • Figure 3: FMC algorithm. The Fréchet map $F$ takes a finite set $\mathcal{D} \subset \mathit{SPD}(n)$ into $\mathbb{R}^\ell$. Next, the $k$-means algorithm in $\mathbb{R}^\ell$ is applied to partition the set $F(\mathcal{D})$ into $k$ clusters $\{H_1,\dots,H_k\}$. Finally, a simple re-labeling of the data identifies the corresponding clusters $(\mathit{CL}_1,\ldots,\mathit{CL}_k)$ of $\mathcal{D}$ in $\mathit{SPD}(n)$.
  • Figure 4: Illustration of the proposed strategy to select reference points for the Fréchet map $F$. Left: Close case. We select the reference points $R_1$ and $R_2$ outside the segment that connects the Fréchet means $M_1$ and $M_2$ of the clusters $\mathit{CL}_1$ and $\mathit{CL}_2$. Right: Far case. We select the reference points $R_1$ and $R_2$ inside the segment that connects the centers $M_1$ and $M_2$.
  • Figure 5: Representative datasets from the four textures (aluminum foil, cotton, linen, and wood) considered in this study. These images are taken from the KTH-TIPS2b dataset.
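The comparison in Figure 2 can be made concrete with a small numerical sketch. It assumes the Riemannian metric is the affine-invariant one, $d(P,Q)=\|\log(P^{-1/2}QP^{-1/2})\|_F$, which this excerpt does not specify. The log-Euclidean distance is the Frobenius distance between matrix logarithms, that is, between the projections of $P$ and $Q$ onto $T_{\text{Id}}\mathcal{M}$. The function names and test matrices are hypothetical.

```python
import numpy as np


def spd_log(p):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(p)
    return (v * np.log(w)) @ v.T


def log_euclidean_distance(p, q):
    """Euclidean (Frobenius) distance between log(p) and log(q) in the tangent space at Id."""
    return np.linalg.norm(spd_log(p) - spd_log(q), ord="fro")


def riemannian_distance(p, q):
    """Affine-invariant distance ||log(p^{-1/2} q p^{-1/2})||_F (assumed metric)."""
    w, v = np.linalg.eigh(p)
    p_inv_sqrt = (v / np.sqrt(w)) @ v.T
    m = p_inv_sqrt @ q @ p_inv_sqrt
    eigvals = np.linalg.eigvalsh(0.5 * (m + m.T))
    return np.sqrt(np.sum(np.log(eigvals) ** 2))


# Commuting pair (both diagonal): the two distances coincide.
p_comm = np.diag([1.0, 4.0])
q_comm = np.diag([9.0, 2.0])
gap_comm = riemannian_distance(p_comm, q_comm) - log_euclidean_distance(p_comm, q_comm)

# Non-commuting pair: the tangent-space distance underestimates the Riemannian one.
p_gen = np.array([[2.0, 1.0], [1.0, 2.0]])
q_gen = np.diag([1.0, 5.0])
d_riem = riemannian_distance(p_gen, q_gen)
d_le = log_euclidean_distance(p_gen, q_gen)

print(f"commuting pair gap: {gap_comm:.2e}")
print(f"general pair: Riemannian {d_riem:.4f}, log-Euclidean {d_le:.4f}")
```

The two distances agree when $P$ and $Q$ commute, and the log-Euclidean distance is never larger than the affine-invariant one. The gap on non-commuting pairs is the distortion introduced by flattening the manifold onto a single tangent space.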

Theorems & Definitions (23)

  • Remark 1
  • Definition 1
  • Proposition 1
  • proof
  • Lemma 1
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Theorem 3
  • Proposition 2
  • ...and 13 more