Fully Dynamic Euclidean k-Means
Sayan Bhattacharya, Martín Costa, Ermiya Farokhnejad, Shaofeng H.-C. Jiang, Yaonan Jin, Jianing Lou
TL;DR
The paper tackles fully dynamic Euclidean $k$-means. Points are inserted into and deleted from the input, and a set of $k$ centers must be maintained so that the clustering cost stays within a bounded factor of the optimum. It advances the state of the art by adapting a near-optimal dynamic framework for general metric spaces to the Euclidean setting and combining it with novel geometric data structures and a consistent hashing scheme. The result is a $\mathrm{poly}(1/\epsilon)$-approximation with $\tilde{O}(k^{\epsilon})$ amortized update time and $\tilde{O}(1)$ amortized recourse. The approach hinges on two core ideas: (i) making the dynamic framework robust to approximate neighborhoods, and (ii) designing fast, adversary-robust data structures for range queries and approximate assignments, built on an efficient consistent hashing scheme with sublinear ($\tilde{O}(n^{\epsilon})$) time per point evaluation. The resulting algorithm is near-optimal in all three metrics (approximation ratio, update time, and recourse) and is specialized to Euclidean space. Along the way, it yields primitives (e.g., coresets for restricted $k$-means and D$^2$-sampling-based augmentation) that may be of independent interest for dynamic clustering and related geometric problems.
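D$^2$-sampling, the primitive behind $k$-means++ seeding, draws each point with probability proportional to its squared distance to the nearest current center. This makes far-away, poorly served points likely candidates for new centers. The sketch below shows only that plain sampling step. How the paper uses it to augment solutions is not described here, and the function name `d2_sample` and its signature are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(points: Sequence[Point], centers: Sequence[Point], m: int,
              rng: random.Random) -> List[Point]:
    """Hypothetical helper: draw m points with replacement, each with
    probability proportional to its squared distance to the nearest center."""
    if not centers:
        # No centers yet: every point is equally "unserved", so sample uniformly.
        return [rng.choice(points) for _ in range(m)]
    # Weight of a point = squared distance to its closest current center.
    weights = [min(sq_dist(p, c) for c in centers) for p in points]
    if sum(weights) == 0:
        # Every point coincides with a center; there is nothing to augment.
        return []
    return rng.choices(points, weights=weights, k=m)
```

For example, with a single center at the origin, a point lying on that center is never drawn, and a point at distance 10 is about 100 times more likely to be drawn than a point at distance 1.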
Abstract
We consider the fundamental Euclidean $k$-means clustering problem in a dynamic setting, where the input $X \subseteq \mathbb{R}^d$ evolves over time via a sequence of point insertions/deletions. We have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ throughout these updates. The goal is to minimize three quantities: the approximation ratio, the update time (the time taken to handle a point insertion/deletion), and the recourse (the number of changes made to the solution $S$). We present a dynamic algorithm for this problem with $\mathrm{poly}(1/\epsilon)$ approximation ratio, $\tilde{O}(k^{\epsilon})$ update time, and $\tilde{O}(1)$ recourse. In the general regime, where the dimension $d$ cannot be assumed to be a fixed constant, our algorithm has almost optimal guarantees across all three of these parameters. Indeed, improving our update time or approximation ratio would imply beating the state-of-the-art static algorithm for this problem (which is widely believed to be the best possible), and the recourse of any dynamic algorithm must be $\Omega(1)$. We obtain our result by building on top of the recent work of [Bhattacharya, Costa, Farokhnejad; STOC'25], which gave a near-optimal dynamic algorithm for $k$-means in general metric spaces (as opposed to the Euclidean setting). Along the way, we design several novel geometric data structures that are of independent interest. Specifically, one of our main contributions is the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Veselý, Yang; FOCS'22] that achieves $\tilde{O}(n^{\epsilon})$ running time per point evaluation with competitive parameters.
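To make the quantities above concrete, here is a minimal sketch that computes the $k$-means cost $\mathrm{cost}(X, S) = \sum_{x \in X} \min_{s \in S} \lVert x - s \rVert^2$ and the recourse between two consecutive solutions. It is not the paper's algorithm, only the quantities the algorithm is measured by. The names `kmeans_cost` and `recourse` are hypothetical. Counting recourse as the symmetric difference $|S \,\triangle\, S'|$, where each added or removed center counts once, is our assumption; other conventions count a swap as a single change.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Iterable[Point], centers: Set[Point]) -> float:
    """Hypothetical helper: each point pays its squared distance to the
    nearest center. Assumes at least one center (k >= 1)."""
    return sum(min(sq_dist(x, s) for s in centers) for x in points)


def recourse(old: Set[Point], new: Set[Point]) -> int:
    """Hypothetical helper: number of centers added or removed between two
    consecutive solutions (assumed convention: |old symmetric-diff new|)."""
    return len(old ^ new)


# One insertion with k = 2: the old solution is now poor, and moving to a
# better one costs some recourse.
X = [(0.0, 0.0), (1.0, 0.0)]
S_old = {(0.0, 0.0), (1.0, 0.0)}
X.append((10.0, 0.0))                      # point insertion
S_new = {(0.5, 0.0), (10.0, 0.0)}          # re-clustered solution
print(kmeans_cost(X, S_old), kmeans_cost(X, S_new), recourse(S_old, S_new))
```

Running this prints `81.0 0.5 4`. After the insertion, keeping `S_old` leaves the new point paying $9^2 = 81$. Switching to `S_new` drops the cost to $0.25 + 0.25 + 0 = 0.5$ but replaces both centers. A low-recourse algorithm has to trade off these two effects.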
