Fully Dynamic k-Means Coreset in Near-Optimal Update Time

Max Dupré la Tour; Monika Henzinger; David Saulpic

TL;DR

This work addresses maintaining $k$-median and $k$-means clustering under fully dynamic updates (insertions and deletions) by maintaining a compact $\varepsilon$-coreset, a weighted summary that preserves the clustering cost of every set of $k$ centers up to a factor $1 \pm \varepsilon$. The authors develop a dynamic coreset framework built on a refined merge-and-reduce strategy, achieving near-optimal amortized update times of $\tilde{O}(k)$ in general metrics and $\tilde{O}(d)$ in $\mathbb{R}^d$. Querying the coreset takes $O(k^2)$ time in general metrics and $O(kd)$ in $\mathbb{R}^d$; in Euclidean space this yields a constant-factor approximation with query time $\tilde{O}(kd + k^2)$, or an $O(\mathrm{polylog}\, k)$-approximation with query time $\tilde{O}(kd)$. A key contribution is a coreset algorithm that supports insertions and deletions with an amortized cost of $\frac{T(n,k,\varepsilon)}{k}$, where $T$ is the static clustering time; integrating it into the merge-and-reduce framework yields the fast dynamic updates. The paper also discusses implications for Euclidean space and states two conjectures suggesting that both update and query times are near-optimal in that setting.
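To make the merge-and-reduce idea concrete, here is a minimal sketch of the textbook insertion-only version, not the refined fully dynamic variant the paper develops. The class name `MergeAndReduce`, the helper `reduce_uniform`, and the `block_size` parameter are assumptions introduced for illustration. The reduce step is a placeholder that samples uniformly and rescales weights; it preserves total weight but is not claimed to produce an $\varepsilon$-coreset, and deletions are not handled.

```python
import random


def reduce_uniform(weighted_points, size, rng):
    """Placeholder reduce step (assumption, not the paper's method):
    keep `size` points chosen uniformly and rescale their weights so
    that the total weight is unchanged."""
    if len(weighted_points) <= size:
        return list(weighted_points)
    total = sum(w for _, w in weighted_points)
    sample = rng.sample(weighted_points, size)
    scale = total / sum(w for _, w in sample)
    return [(p, w * scale) for p, w in sample]


class MergeAndReduce:
    """Insertion-only merge-and-reduce tree, organized like a binary counter.

    levels[i] is either None or a summary of block_size * 2**i input points.
    """

    def __init__(self, block_size, seed=0):
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.buffer = []   # raw points that do not yet fill a block
        self.levels = []

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) < self.block_size:
            return
        carry, self.buffer = self.buffer, []
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = carry
                return
            # Two summaries at the same level: merge them, reduce the
            # result back to block_size points, and carry it one level up.
            merged = self.levels[level] + carry
            carry = reduce_uniform(merged, self.block_size, self.rng)
            self.levels[level] = None
            level += 1

    def coreset(self):
        """Union of the buffer and all stored summaries."""
        out = list(self.buffer)
        for summary in self.levels:
            if summary is not None:
                out.extend(summary)
        return out


mr = MergeAndReduce(block_size=8, seed=1)
for i in range(1000):
    mr.insert((float(i), 0.0))
summary = mr.coreset()
print(len(summary), sum(w for _, w in summary))
```

Each insertion touches a given level only once every `block_size * 2**level` insertions, which is why the amortized update cost of this scheme is governed by the cost of one reduce step divided by the block size, up to logarithmic factors.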

Abstract

In this paper, we study the problem of maintaining a solution to $k$-median and $k$-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that allows easy computation of a clustering solution at query time. Our coreset algorithm has near-optimal update time of $\tilde O(k)$ in general metric spaces, which reduces to $\tilde O(d)$ in the Euclidean space $\mathbb{R}^d$. The query time is $O(k^2)$ in general metrics, and $O(kd)$ in $\mathbb{R}^d$. To maintain a constant-factor approximation for $k$-median and $k$-means clustering in Euclidean space, this directly leads to an algorithm with update time $\tilde O(d)$ and query time $\tilde O(kd + k^2)$. To maintain an $O(\mathrm{polylog}\, k)$-approximation, the query time is reduced to $\tilde O(kd)$.
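As an illustration of what a coreset has to preserve, the sketch below evaluates the weighted $k$-means cost of a candidate set of centers on the full dataset and on a weighted summary, then checks the usual $\varepsilon$-coreset condition $|\mathrm{cost}(\Omega, C) - \mathrm{cost}(P, C)| \le \varepsilon \cdot \mathrm{cost}(P, C)$ for that one set $C$. The function names are assumptions made for this example, and the toy summary is exact only because the input consists of duplicated points; a real coreset must satisfy the condition for every set of $k$ centers.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(weighted_points, centers):
    """Weighted k-means cost: each point pays its weight times the
    squared distance to its nearest center. (For k-median, the distance
    would be used instead of the squared distance.)"""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in weighted_points)


def within_eps(full_cost, summary_cost, eps):
    """Coreset condition for a single set of centers."""
    return abs(summary_cost - full_cost) <= eps * full_cost


# Toy data: three copies of (0, 0) and two copies of (10, 0).
full = [((0.0, 0.0), 1.0)] * 3 + [((10.0, 0.0), 1.0)] * 2
# Collapsing duplicates into weighted points gives an exact summary.
summary = [((0.0, 0.0), 3.0), ((10.0, 0.0), 2.0)]

centers = [(1.0, 0.0), (9.0, 0.0)]
full_cost = kmeans_cost(full, centers)
summary_cost = kmeans_cost(summary, centers)
print(full_cost, summary_cost, within_eps(full_cost, summary_cost, 0.1))
```

At query time, a clustering algorithm can be run on the small weighted summary rather than on the full dataset, which is where the dependence of the query time on $k$ rather than on $n$ comes from.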

Paper Structure (15 sections, 9 theorems, 7 equations, 1 figure)

Introduction
Our result and techniques
Further related work
Definitions and notations.
Preliminary results
$O(T/k)$ update time via merge-and-reduce tree
Description of the merge-and-reduce algorithm
Our algorithm
An efficient dynamic coreset algorithm
The Algorithm
Running-time Analysis
Correctness analysis
A Note on Euclidean Spaces
Conclusion
Coreset via Uniform Sampling

Key Result

Theorem 1

There exists an algorithm for fully dynamic $k$-median (resp. $k$-means) that maintains an $\varepsilon$-coreset of size $\tilde{O}\left(k \varepsilon^{-2}\right)$ with amortized update and query time $O\left(\frac{T(k \operatorname{polylog}(n), k)}{k}\right)$, where $T(k \operatorname{polylog}(n), k)$ …

Theorems & Definitions (18)

Theorem 1
Corollary 2
Lemma 4
Lemma 5
Lemma 6
proof
proof : Proof of the main theorem (thm:main)
proof
Lemma 8
proof
...and 8 more
