Faster and Simpler Greedy Algorithm for $k$-Median and $k$-Means

Max Dupré la Tour, David Saulpic

TL;DR

This paper tackles fast approximations for $k$-means, $k$-median, and more generally $(k,z)$-clustering by refining a recursive greedy framework originally due to Mettu and Plaxton. It introduces a simplification that replaces the original ball-value with $\mathrm{Value}(B(x,r))$, roughly $r^z \,|B(x,r)|$, and supports approximate ball neighborhoods via $N(x,r)$, enabling near-linear or almost-linear time implementations in Euclidean spaces and sparse graphs. The main contributions are a $\mathrm{poly}(c)$-approximation with explicit running-time bounds, plus practical Euclidean and graph-specific instantiations: near-linear time via quadtrees, constant-factor via LSH, and near-linear ball-counting in graphs using probabilistic partitions and Cohen-style sketches. These results yield scalable, incremental/online seeding procedures that maintain strong guarantees for $(k,z)$-clustering, including an online variant where prefixes provide good approximations. The work thus advances practical greedy approaches for core clustering objectives while clarifying their algorithmic structure and runtime trade-offs.
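To make the simplified ball-value concrete, here is a minimal sketch that evaluates $r^z \,|B(x,r)|$ by brute force on Euclidean points. The function name `ball_value`, the use of a closed ball, and the exact linear scan are assumptions made for illustration; the paper's fast implementations rely on approximate neighborhoods $N(x,r)$ rather than an exact scan.

```python
import math

def ball_value(points, x, r, z):
    """Return r^z * |B(x, r)|, counting the ball exactly by brute force.

    Assumption: B(x, r) is the closed ball {p : dist(p, x) <= r} under
    the Euclidean distance. This O(n) scan is illustrative only.
    """
    ball_size = sum(1 for p in points if math.dist(p, x) <= r)
    return (r ** z) * ball_size

# Hypothetical toy data: three points near the origin and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(ball_value(points, (0.0, 0.0), 1.0, 2))
```

With radius 1 the ball around the origin holds three of the four points; enlarging the radius trades a bigger $r^z$ factor against the number of points captured.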

Abstract

Clustering problems such as $k$-means and $k$-median are staples of unsupervised learning, and many algorithmic techniques have been developed to tackle their numerous aspects. In this paper, we focus on the class of greedy approximation algorithms, which has attracted less attention than its local-search or primal-dual counterparts. In particular, we study the recursive greedy algorithm developed by Mettu and Plaxton [SIAM J. Comp 2003]. We provide a simplification of the algorithm that allows for faster implementations in graph metrics and in Euclidean space, where our algorithm matches or improves the state of the art.
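For readers less familiar with the objectives, the following sketch evaluates the standard $(k,z)$-clustering cost, where each point pays its distance to the nearest center raised to the power $z$ ($z=1$ gives $k$-median, $z=2$ gives $k$-means). The name `kz_cost` and the Euclidean setting are assumptions for illustration; this only scores a given set of centers and is not the greedy algorithm itself.

```python
import math

def kz_cost(points, centers, z):
    """Sum over points of (distance to nearest center) ** z.

    Assumption: Euclidean distance; z=1 is k-median, z=2 is k-means.
    """
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

# Hypothetical toy instance on a line, with k = 2 centers.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
print(kz_cost(points, centers, 1))  # k-median cost
print(kz_cost(points, centers, 2))  # k-means cost
```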

Paper Structure

This paper contains 24 sections, 23 theorems, 37 equations, and 3 algorithms.

Key Result

Theorem 1.1

Let $(P, \text{dist})$ be a metric space with aspect-ratio $\Delta$ (the ratio between the largest distance and the smallest non-zero distance in the metric), and let $c > 5$ be a constant. Suppose there is: […] Then the recursive greedy algorithm can be implemented such that it is a $\mathrm{poly}(c)$-approximation and has running time $T_{\mathrm{Value}}$ …
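The aspect-ratio $\Delta$ used in the theorem can be illustrated on a small point set. The sketch below computes it by brute force over all pairs; the function name `aspect_ratio` and the Euclidean distance are assumptions for illustration, and the quadratic scan is only meant for toy inputs.

```python
import math
from itertools import combinations

def aspect_ratio(points):
    """Largest pairwise distance divided by the smallest non-zero one.

    Assumption: Euclidean distance. Duplicate points (distance 0) are
    ignored when taking the minimum, matching the definition above.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    nonzero = [d for d in dists if d > 0]
    if not nonzero:
        raise ValueError("need at least two distinct points")
    return max(nonzero) / min(nonzero)

# Hypothetical toy data: pairwise distances are 1, 3 and 4.
print(aspect_ratio([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]))
```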

Theorems & Definitions (40)

  • Theorem 1.1: see \ref{thm:correctness} and \ref{thm:runningtime}
  • Corollary 1.2
  • Lemma 2.1
  • proof
  • Theorem 3.1: MPOnlineMedian
  • Theorem 3.2
  • Theorem 3.3
  • proof
  • Lemma 3.5
  • proof
  • ...and 30 more