Faster and Simpler Greedy Algorithm for $k$-Median and $k$-Means
Max Dupré la Tour, David Saulpic
TL;DR
This paper tackles fast approximations for $k$-means, $k$-median, and more generally $(k,z)$-clustering by refining a recursive greedy framework originally due to Mettu and Plaxton. It introduces a simplification that replaces the original ball-value with $\mathrm{Value}(B(x,r)) \approx r^z \,|B(x,r)|$, and supports approximate ball neighborhoods via $N(x,r)$, enabling near-linear or almost-linear time implementations in Euclidean spaces and sparse graphs. The main contributions are a poly$(c)$-approximation with explicit running-time bounds, plus practical Euclidean and graph-specific instantiations: near-linear time via quadtrees, constant-factor via LSH, and near-linear ball-counting in graphs using probabilistic partitions and Cohen-style sketches. These results yield scalable, incremental/online seeding procedures that maintain strong guarantees for $(k,z)$-clustering, including an online variant where prefixes provide good approximations. The work thus advances practical greedy approaches for core clustering objectives while clarifying their algorithmic structure and runtime trade-offs.
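To make the simplified ball value concrete, here is a minimal Python sketch under stated assumptions. The function names `kz_cost` and `ball_value` are hypothetical, the metric is assumed to be Euclidean, and the ball $B(x,r)$ is counted by brute force rather than through the quadtree, LSH, or sketching structures mentioned above. Since the summary only says the value is "roughly" $r^z\,|B(x,r)|$, the exact expression used in the paper may differ by constants or by the approximate neighborhood $N(x,r)$.

```python
import math

def kz_cost(points, centers, z):
    """(k,z)-clustering cost: sum over points of (distance to nearest center)^z.

    z = 1 corresponds to k-median, z = 2 to k-means.
    """
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)

def ball_value(points, x, r, z):
    """Simplified ball value r^z * |B(x, r)| (illustrative, brute force).

    B(x, r) is taken as the set of input points within distance r of x.
    This costs O(n) per query; it is not the fast ball-counting of the paper.
    """
    count = sum(1 for p in points if math.dist(p, x) <= r)
    return (r ** z) * count

# Small usage example on four points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
value = ball_value(points, (0.0, 0.0), 1.0, 2)
cost = kz_cost(points, [(0.0, 0.0), (10.0, 10.0)], 2)
```

In this toy instance the ball of radius 1 around the origin contains three points, so its value for $z = 2$ is $1^2 \cdot 3 = 3$, and opening centers at the origin and at $(10, 10)$ gives a $k$-means cost of 2.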
Abstract
Clustering problems such as $k$-means and $k$-median are staples of unsupervised learning, and many algorithmic techniques have been developed to tackle their numerous aspects. In this paper, we focus on the class of greedy approximation algorithms, which have attracted less attention than their local-search or primal-dual counterparts. In particular, we study the recursive greedy algorithm developed by Mettu and Plaxton [SIAM J. Comp 2003]. We provide a simplification of the algorithm that allows for faster implementations in graph metrics and in Euclidean space, where our algorithm matches or improves the state of the art.
