Coresets for Constrained Clustering: General Assignment Constraints and Improved Size Bounds

Lingxiao Huang, Jian Li, Pinyan Lu, Xuan Wu

TL;DR

The paper introduces a general class of assignment constraints, including capacity constraints on cluster centers and assignment structure constraints for data points (modeled by a convex body $\mathcal{B}$).

Abstract

Designing small-sized \emph{coresets}, which approximately preserve the costs of the solutions for large datasets, has been an important research direction for the past decade. We consider coreset construction for a variety of general constrained clustering problems. We introduce a general class of assignment constraints, including capacity constraints on cluster centers, and assignment structure constraints for data points (modeled by a convex body $\mathcal{B}$). We give coresets for clustering problems with such general assignment constraints that significantly generalize and improve known results. Notable implications include the first $\varepsilon$-coreset for capacitated and fair $k$-Median with $m$ outliers in Euclidean spaces whose size is $\tilde{O}(m + k^2 \varepsilon^{-4})$, generalizing and improving upon the prior bounds in [Braverman et al., FOCS' 22; Huang et al., ICLR' 23] (for capacitated $k$-Median, the coreset size bound obtained in [Braverman et al., FOCS' 22] is $\tilde{O}(k^3 \varepsilon^{-6})$, and for $k$-Median with $m$ outliers, the coreset size bound obtained in [Huang et al., ICLR' 23] is $\tilde{O}(m + k^3 \varepsilon^{-5})$), and the first $\varepsilon$-coreset of size $\mathrm{poly}(k \varepsilon^{-1})$ for fault-tolerant clustering for various types of metric spaces.
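To get a rough feel for the size bounds quoted in the abstract, the following minimal sketch evaluates only their leading terms. It is an illustration under stated assumptions, not part of the paper: the $\tilde{O}(\cdot)$ notation hides constants and polylogarithmic factors, which are simply dropped here, and the function names and sample values of $k$, $\varepsilon$, and $m$ are hypothetical.

```python
# Leading terms of the coreset size bounds quoted in the abstract.
# Assumption: constants and polylog factors hidden by O-tilde are ignored,
# so these numbers only indicate relative growth, not actual coreset sizes.

def new_bound(k, eps, m=0):
    """m + k^2 * eps^-4: bound stated for capacitated and fair k-Median with m outliers."""
    return m + k**2 * eps**-4

def prior_capacitated_bound(k, eps):
    """k^3 * eps^-6: prior bound quoted for capacitated k-Median."""
    return k**3 * eps**-6

def prior_outlier_bound(k, eps, m):
    """m + k^3 * eps^-5: prior bound quoted for k-Median with m outliers."""
    return m + k**3 * eps**-5

if __name__ == "__main__":
    # Hypothetical sample parameters chosen only for illustration.
    k, eps, m = 10, 0.1, 100
    print("new bound (no outliers):  ", new_bound(k, eps))
    print("prior capacitated bound:  ", prior_capacitated_bound(k, eps))
    print("new bound (m outliers):   ", new_bound(k, eps, m))
    print("prior outlier bound:      ", prior_outlier_bound(k, eps, m))
```

On the leading terms alone, the improvement over the prior capacitated bound is a factor of $k \varepsilon^{-2}$, and over the prior outlier bound a factor of roughly $k \varepsilon^{-1}$ once the $m$ term is dominated.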

Paper Structure

This paper contains 56 sections, 35 theorems, 191 equations, 1 table, and 1 algorithm.

Key Result

Theorem 1.1

We consider $(k, z)$-Clustering with capacity upper/lower bound constraint for each center, assignment structure constraint for each point (specified by convex body $\mathcal{B} \subseteq \Delta_k$), and a total capacity constraint $\|\sigma\|_1=n-m$ (i.e., $m$ outliers). For any $0 < \varepsilon <

Theorems & Definitions (108)

  • Theorem 1.1: Informal; see Theorem \ref{thm:coreset}
  • Definition 2.1: Capacity constraint
  • Definition 2.2: Total capacity constraint
  • Definition 2.3: Assignment structure constraint
  • Definition 2.4: Coreset
  • Claim 2.5: Capacitated Clustering
  • proof
  • Claim 2.6: Fair Clustering
  • proof
  • Definition 3.1: $(\alpha,\beta,\gamma)$-Approximation for $(k, z)$-Clustering with $m$ outliers
  • ...and 98 more