Fully Dynamic Euclidean k-Means

Sayan Bhattacharya, Martín Costa, Ermiya Farokhnejad, Shaofeng H.-C. Jiang, Yaonan Jin, Jianing Lou

TL;DR

The paper tackles fully dynamic Euclidean $k$-means, where points are inserted/deleted and a set of $k$ centers must be maintained with high-quality clustering. It advances the state of the art by combining a Euclidean-tailored adaptation of a near-optimal dynamic framework with novel geometric data structures and a consistent hashing scheme, achieving a $\mathrm{poly}(1/\epsilon)$-approximation, $\tilde{O}(k^{\epsilon})$ amortized update time, and $\tilde{O}(1)$ amortized recourse. The approach hinges on two core ideas: (i) robustifying the dynamic framework to work with approximate neighborhoods and (ii) designing fast, adversary-robust data structures for range queries and approximate assignments using efficient consistent hashing and sublinear per-point evaluations. The resulting algorithm is near-optimal in three metrics and notably specialized to Euclidean space, offering both theoretical insight and practical primitives (e.g., coresets for restricted $k$-means, D$^2$-sampling-based augmentation) that may be of independent interest for dynamic clustering and related geometric problems.
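The D$^2$-sampling primitive mentioned above can be illustrated with a minimal static sketch: a point is drawn with probability proportional to its squared distance to the nearest current center, as in k-means++ seeding. This is not the paper's dynamic data structure; the function names `d2_sample` and `augment` are hypothetical, and each draw here takes linear time per center rather than the sublinear time the paper targets.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def d2_sample(points, centers, rng):
    """Draw one point with probability proportional to its squared
    distance to the nearest center (hypothetical helper, static setting)."""
    weights = [min(sq_dist(p, c) for c in centers) for p in points]
    if sum(weights) == 0:
        # Every point coincides with a center; fall back to a uniform draw.
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]

def augment(points, centers, extra, rng):
    """Add `extra` centers to `centers`, one D^2 sample at a time."""
    centers = list(centers)
    for _ in range(extra):
        centers.append(d2_sample(points, centers, rng))
    return centers
```

Points far from every current center carry most of the sampling mass, so a few draws tend to cover clusters that the current solution serves poorly.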

Abstract

We consider the fundamental Euclidean $k$-means clustering problem in a dynamic setting, where the input $X \subseteq \mathbb{R}^d$ evolves over time via a sequence of point insertions/deletions. We have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ throughout these updates, while minimizing the approximation ratio, the update time (time taken to handle a point insertion/deletion) and the recourse (number of changes made to the solution $S$) of the algorithm. We present a dynamic algorithm for this problem with $\text{poly}(1/\epsilon)$-approximation ratio, $\tilde{O}(k^{\epsilon})$ update time and $\tilde{O}(1)$ recourse. In the general regime, where the dimension $d$ cannot be assumed to be a fixed constant, our algorithm has almost optimal guarantees across all these three parameters. Indeed, improving our update time or approximation ratio would imply beating the state-of-the-art static algorithm for this problem (which is widely believed to be the best possible), and the recourse of any dynamic algorithm must be $\Omega(1)$. We obtain our result by building on top of the recent work of [Bhattacharya, Costa, Farokhnejad; STOC'25], which gave a near-optimal dynamic algorithm for $k$-means in general metric spaces (as opposed to in the Euclidean setting). Along the way, we design several novel geometric data structures that are of independent interest. Specifically, one of our main contributions is designing the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Veselý, Yang; FOCS'22] that achieves $\tilde{O}(n^{\epsilon})$ running time per point evaluation with competitive parameters.
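To make the quantities in the abstract concrete, the sketch below computes the standard $k$-means cost $\sum_{x \in X} \min_{s \in S} \lVert x - s \rVert^2$ and one way of counting recourse between successive solutions. Measuring recourse as the size of the symmetric difference of consecutive center sets is an assumption made for illustration, and the helper names are hypothetical; the paper's algorithm does not recompute these quantities from scratch.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def recourse(old_centers, new_centers):
    """Number of centers removed plus number of centers added
    (assumed convention: size of the symmetric difference).
    Centers must be hashable, e.g. tuples."""
    return len(set(old_centers) ^ set(new_centers))

def amortized_recourse(solutions):
    """Average recourse per update over solutions S_0, S_1, ..., S_T."""
    steps = len(solutions) - 1
    if steps <= 0:
        return 0.0
    total = sum(recourse(a, b) for a, b in zip(solutions, solutions[1:]))
    return total / steps
```

An algorithm is an $\alpha$-approximation if `kmeans_cost(X, S)` stays within a factor $\alpha$ of the optimal cost after every update, and $\tilde{O}(1)$ amortized recourse means that `amortized_recourse` stays polylogarithmic over any update sequence.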

Paper Structure

This paper contains 126 sections, 57 theorems, 223 equations, 8 algorithms.

Key Result

Theorem 1.1

For every sufficiently small $\epsilon > 0$, there is a randomized dynamic algorithm for Euclidean $k$-means with $\text{poly}(1/\epsilon)$-approximation ratio, $\tilde{O}(k^\epsilon)$ amortized update time and $\tilde{O}(1)$ amortized recourse, where $\tilde{O}(\cdot)$ notation hides polylogarithmic …

Theorems & Definitions (160)

  • Theorem 1.1
  • Definition 4.1: Robust Center, Definition 3.2 in the arXiv version of BCF24
  • Definition 4.2: Robust Solution, Definition 3.5 in the arXiv version of BCF24
  • Definition 5.1: Consistent Hashing
  • Definition 5.2: Efficient consistent hashing
  • Lemma 5.3
  • Lemma 5.4
  • Lemma 5.5: $1$-Means Estimation
  • Lemma 5.6: ANN Distance
  • Lemma 5.7
  • ...and 150 more