On Optimal Coreset Construction for Euclidean $(k,z)$-Clustering

Lingxiao Huang, Jian Li, Xuan Wu

TL;DR

A new coreset lower bound $\Omega(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq \Omega(k^{-1/(z+2)})$.

Abstract

Constructing small-sized coresets for various clustering problems in different metric spaces has attracted significant attention over the past decade. A central problem in the coreset literature is to understand the best possible coreset size for $(k,z)$-clustering in Euclidean space. While there has been significant progress on the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for $k$-means ($z=2$) is $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ [1,2], while the best known lower bound is $\Omega(k\varepsilon^{-2})$ [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., $\varepsilon, k$), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound $\Omega(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq \Omega(k^{-1/(z+2)})$. In view of the prior upper bound $\tilde{O}_z(k \varepsilon^{-z-2})$ [1], the bound is optimal. The new lower bound also implies improved lower bounds for $(k,z)$-clustering in doubling metrics. (2) For the upper bound, we provide efficient coreset construction algorithms for $(k,z)$-clustering with improved or optimal coreset sizes in several metric spaces. In particular, we provide an $\tilde{O}_z(k^{\frac{2z+2}{z+2}} \varepsilon^{-2})$-sized coreset, with a unified analysis, for $(k,z)$-clustering for all $z\geq 1$ in Euclidean space.

[1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22.
[2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar. NeurIPS'22.
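
To get a feel for how the bounds quoted in the abstract compare, the following is a rough numerical sketch, not code from the paper. It assumes every hidden constant equals 1 and drops the polylogarithmic factors inside $\tilde{O}_z(\cdot)$, so only the orders of magnitude are meaningful. All function names are hypothetical.

```python
# Illustrative only: constants are assumed to be 1 and polylog factors are
# dropped, so these values show the scaling in k and eps, not true sizes.

def prior_upper_kmeans(k: float, eps: float) -> float:
    """Prior k-means (z = 2) upper bound: min{k^{3/2} eps^{-2}, k eps^{-4}}."""
    return min(k ** 1.5 * eps ** -2, k * eps ** -4)

def prior_upper(k: float, eps: float, z: float) -> float:
    """Prior upper bound k * eps^{-(z+2)} for general z."""
    return k * eps ** (-(z + 2))

def new_upper(k: float, eps: float, z: float) -> float:
    """New upper bound k^{(2z+2)/(z+2)} * eps^{-2}."""
    return k ** ((2 * z + 2) / (z + 2)) * eps ** -2

def new_lower(k: float, eps: float, z: float):
    """New lower bound k * eps^{-(z+2)}; it is only claimed when
    eps >= k^{-1/(z+2)} (threshold constant assumed to be 1).
    Returns None outside that regime."""
    if eps < k ** (-1 / (z + 2)):
        return None
    return k * eps ** (-(z + 2))

if __name__ == "__main__":
    k, z = 10_000, 2
    for eps in (0.05, 0.2):
        print(eps, new_lower(k, eps, z), prior_upper(k, eps, z), new_upper(k, eps, z))
```

With these simplifications, for $z=2$ the new upper bound coincides with the $k^{3/2}\varepsilon^{-2}$ term of the prior $k$-means bound, and whenever the lower bound applies it equals the prior upper bound $k\varepsilon^{-z-2}$, which is the sense in which the abstract calls it optimal.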

Paper Structure

This paper contains 51 sections, 23 theorems, 132 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 1.2

For any integer $k\geq 1$, constant $z\geq 1$, and error parameter $\varepsilon \geq \Omega(k^{-1/(z+2)})$, there exists an instance $P$ in Euclidean metric $\mathbb{R}^d$ (for $d = O(k \varepsilon^z)$) such that any $\varepsilon$-coreset of $P$ is of size at least $\Omega(k \varepsilon^{-z-2})$. Mo
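
As a side calculation (not taken from the paper, and ignoring constants and polylogarithmic factors), one can check why the threshold $\varepsilon = k^{-1/(z+2)}$ is natural: at that value the prior upper bound $k\varepsilon^{-z-2}$ and the new upper bound $k^{\frac{2z+2}{z+2}}\varepsilon^{-2}$ stated in the abstract coincide.

$$
k\,\varepsilon^{-(z+2)}\Big|_{\varepsilon = k^{-1/(z+2)}} = k \cdot k = k^{2},
\qquad
k^{\frac{2z+2}{z+2}}\,\varepsilon^{-2}\Big|_{\varepsilon = k^{-1/(z+2)}} = k^{\frac{2z+2}{z+2}} \cdot k^{\frac{2}{z+2}} = k^{2}.
$$

More generally, the ratio of the two bounds is

$$
\frac{k\,\varepsilon^{-(z+2)}}{k^{\frac{2z+2}{z+2}}\,\varepsilon^{-2}} = k^{-\frac{z}{z+2}}\,\varepsilon^{-z} = \left(\frac{k^{-1/(z+2)}}{\varepsilon}\right)^{z},
$$

which is at most $1$ exactly when $\varepsilon \geq k^{-1/(z+2)}$. So in the regime covered by the lower bound, $k\varepsilon^{-z-2}$ is the smaller of the two upper bounds and matches the lower bound; for smaller $\varepsilon$, the $k^{\frac{2z+2}{z+2}}\varepsilon^{-2}$ bound is the smaller one.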

Figures (3)

  • Figure 1: Comparison of prior coreset bounds and the new ones for Euclidean $(k, z)$-Clustering. Note that this figure assumes $z>2$, in which case $k^{-1/z}> k^{-0.5}$.
  • Figure 2: An example of Definition \ref{def:group}
  • Figure 3: An example of Definition \ref{def:main_partition}

Theorems & Definitions (54)

  • Definition 1.1: Coreset [langberg2010universal, feldman2011unified]
  • Theorem 1.2: Coreset lower bound for Euclidean $(k, z)$-Clustering
  • Corollary 1.3: Coreset lower bound for $(k, z)$-Clustering in doubling metrics
  • Theorem 1.4: Improved upper bound for Euclidean $(k, z)$-Clustering; see also Theorem \ref{thm:Euclidean}
  • proof
  • Claim 2.1: Weights of $S_i$
  • proof
  • Claim 2.2: Properties of $C$
  • proof
  • proof
  • ...and 44 more