Scalable Second-order Riemannian Optimization for $K$-means Clustering

Peng Xu, Chun-Ying Hou, Xiaohui Chen, Richard Y. Zhang

TL;DR

The paper reformulates the $K$-means clustering problem as a smooth unconstrained optimization over a submanifold and characterizes its Riemannian structure, so that the problem can be solved with a second-order cubic-regularized Riemannian Newton algorithm.

Abstract

Clustering is a hard discrete optimization problem. Nonconvex approaches such as low-rank semidefinite programming (SDP) have recently demonstrated promising statistical and local algorithmic guarantees for cluster recovery. Due to the combinatorial structure of the $K$-means clustering problem, current relaxation algorithms struggle to balance their constraint feasibility and objective optimality, presenting tremendous challenges in computing the second-order critical points with rigorous guarantees. In this paper, we provide a new formulation of the $K$-means problem as a smooth unconstrained optimization over a submanifold and characterize its Riemannian structures to allow it to be solved using a second-order cubic-regularized Riemannian Newton algorithm. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved in linear time. Our numerical experiments show that the proposed method converges significantly faster than the state-of-the-art first-order nonnegative low-rank factorization method, while achieving similarly optimal statistical accuracy.
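To make the link between the discrete clustering problem and its matrix relaxations concrete, the sketch below checks a standard identity that is not spelled out in this summary: for a partition with membership matrix $Z$ ($Z_{ij}=1/\lvert G_k\rvert$ when $i,j$ share cluster $G_k$, and $0$ otherwise), the $K$-means cost equals $\operatorname{tr}(XX^\top)-\langle XX^\top, Z\rangle$, where the rows of $X$ are the data points. The function names and the toy data are hypothetical and chosen only for illustration; this is not the authors' code.

```python
def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for x, k in zip(points, labels):
        clusters.setdefault(k, []).append(x)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        cost += sum((x[d] - centroid[d]) ** 2 for x in members for d in range(dim))
    return cost


def membership_matrix(labels):
    """Z[i][j] = 1/|G_k| if points i and j share cluster k, else 0."""
    sizes = {}
    for k in labels:
        sizes[k] = sizes.get(k, 0) + 1
    return [[1.0 / sizes[a] if a == b else 0.0 for b in labels] for a in labels]


def kmeans_cost_from_Z(points, Z):
    """Evaluate tr(X X^T) - <X X^T, Z> from the Gram matrix of the points."""
    n = len(points)
    gram = [[sum(a * b for a, b in zip(points[i], points[j])) for j in range(n)]
            for i in range(n)]
    trace = sum(gram[i][i] for i in range(n))
    inner = sum(gram[i][j] * Z[i][j] for i in range(n) for j in range(n))
    return trace - inner


# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 1.0), (10.0, 3.0), (13.0, 2.0)]
labels = [0, 0, 1, 1, 1]

direct = kmeans_cost(points, labels)
via_Z = kmeans_cost_from_Z(points, membership_matrix(labels))
print(direct, via_Z)  # the two values agree up to floating-point rounding
```

Because the cost is linear in $Z$, all of the combinatorial difficulty sits in the constraint that $Z$ is a valid membership matrix, which is the part that relaxations and manifold formulations have to handle.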

Paper Structure

This paper contains 42 sections, 8 theorems, 107 equations, 19 figures, 1 algorithm.

Key Result

Lemma 1

Let $Z=Z^\top\in\mathbb{R}^{n\times n}$ be the symmetric block-diagonal matrix defined by $Z_{ij}=1/\lvert G_{k}\rvert$ if $i,j\in G_{k}$, and $Z_{ij}=0$ otherwise. Then for any integer $r\in[K,n]$, there is a unique (up to column permutation) $U\in\mathbb{R}_{+}^{n\times K}$ such that $Z=U U^\top$.
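As an illustration of the factorization in Lemma 1, the following sketch builds $Z$ from a label vector and checks that the scaled indicator matrix with $U_{ik}=1/\sqrt{\lvert G_k\rvert}$ for $i\in G_k$ (and $0$ elsewhere) satisfies $Z=UU^\top$. This explicit form of $U$ is the standard construction and is assumed here to be the factor the lemma refers to; the helper names and the example labels are hypothetical.

```python
import math


def membership_matrix(labels):
    """Z[i][j] = 1/|G_k| if points i and j share cluster k, else 0."""
    sizes = {}
    for k in labels:
        sizes[k] = sizes.get(k, 0) + 1
    return [[1.0 / sizes[a] if a == b else 0.0 for b in labels] for a in labels]


def nonnegative_factor(labels, num_clusters):
    """U[i][k] = 1/sqrt(|G_k|) if point i belongs to cluster k, else 0."""
    sizes = [labels.count(k) for k in range(num_clusters)]
    return [[1.0 / math.sqrt(sizes[k]) if labels[i] == k else 0.0
             for k in range(num_clusters)]
            for i in range(len(labels))]


def outer_gram(U):
    """Return U U^T as a list of lists."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in U] for ri in U]


# Example partition (hypothetical): n = 6 points, K = 3 clusters.
labels = [0, 0, 1, 1, 1, 2]
K = 3

Z = membership_matrix(labels)
U = nonnegative_factor(labels, K)
UUt = outer_gram(U)

# Largest entrywise difference between U U^T and Z.
max_gap = max(abs(UUt[i][j] - Z[i][j])
              for i in range(len(labels)) for j in range(len(labels)))

# Each row of U has a single positive entry, so labels can be read off directly.
recovered = [max(range(K), key=lambda k: row[k]) for row in U]
print(max_gap, recovered)
```

Permuting the columns of `U` leaves `U U^T` unchanged, which matches the "up to column permutation" qualifier in the lemma; it only relabels the clusters.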

Figures (19)

  • Figure 1: Local convergence to second-order critical points yields global optimality. In the GMM setting, where ground-truth partitions can be planted, we consistently observe local convergence to the global optimum, yielding zero clustering error. This provides strong numerical evidence that near-second-order critical points are near-globally optimal, as hypothesized in Assumption \ref{asm:bengin}.
  • Figure 2: Real-world benchmark on CyTOF data. We compared our method to NLR, the previous state-of-the-art, as well as classical benchmarks SC, NMF, and $K$M++. Our method and NLR achieve the most consistently accurate clustering, with the smallest variance and the fewest outliers (left), but we outperform NLR in ground truth recovery (right).
  • Figure 3: Comparison with previous state-of-the-art NLR on GMM. Our second-order method reaches optimality in 152 iterations, while NLR needs 80k. Even though each second-order iteration costs $\approx$ 25--100 NLR steps, the total runtime is still two to four times shorter. (Left and middle) clustering accuracy vs log iterations and linear time. (Right) per-iteration time vs sample size $n$.
  • Figure 4: Comparison with the prior Riemannian $K$-means method of \cite{CarsonMixonVillarWard_manifold-Kmeans} on real-world data. Each run of the prior method is warm-started from the previous one, and the penalty is stepped through $\lambda_i=0,10^4,10^6,10^7$. Even so: (Left) the average mis-clustering rate exceeds 30%; (Middle) the recovery error $\lVert Z-Z^\star\rVert_F$ remains large; (Right) the infeasibility $\lVert U_-\rVert$ never vanishes. Our Riemannian method, shown for reference, enforces $U_-=0$ by design and achieves near-zero error in both metrics.
  • Figure 5: Comparison with classical Riemannian Trust Region (RTR) on GMM. Our method drives both loss and gradient norm to machine precision in around 360 iterations. In contrast, RTR stagnates for over 21k iterations due to the extreme ill-conditioning induced by the log penalty.
  • ...and 14 more figures

Theorems & Definitions (14)

  • Lemma 1
  • Theorem 1: Riemannian cubic-regularized Newton
  • Theorem 2
  • Lemma 2
  • Theorem 3: Average-case phase transition for exact recovery
  • Proof of Lemma \ref{lem:membership2clusterlabel}
  • Proof of Lemma \ref{lem:p_lambda}
  • Proof of Lemma \ref{lem:feasibility}
  • Lemma 3
  • Proof of Lemma \ref{lem:LICQ_submanifold_Kmeans}
  • ...and 4 more