The Computational Complexity of Almost Stable Clustering with Penalties

Kamyar Khodamoradi, Farnam Mansouri, Sandra Zilles

TL;DR

The paper maps the complexity landscape of stable clustering by analyzing generalized stability notions for $k$-Means and $k$-Median, including penalized variants, in metrics with bounded doubling dimension. It proves polynomial-time solvability for $(1+\varepsilon')$-stable instances on doubling metrics (with and without penalties) using an enhanced $\rho$-swap local search and penalty augmentation, and it establishes ETH-based super-polynomial lower bounds for almost stable, that is $(\alpha,\beta)$-stable, instances in Euclidean and doubling spaces. The hardness results rely on reductions from Grid Tiling Inequality and Partial Vertex Cover via moment-curve constructions. Together, the results separate the strongly stable regime, where exact solutions can be computed efficiently, from the near-stable regime, where exact algorithms require super-polynomial time under ETH. They refine the understanding of when structure in the input data enables efficient clustering and show the limits of stability-based approaches in broader regimes. The paper closes with an open question: do $(1+\varepsilon)$-approximations exist for almost stable instances?

Abstract

We investigate the complexity of stable (or perturbation-resilient) instances of $k$-Means and $k$-Median clustering problems in metrics with small doubling dimension. While these problems have been extensively studied under multiplicative perturbation resilience in low-dimensional Euclidean spaces (e.g., Friggstad et al., 2019; Cohen-Addad and Schwiegelshohn, 2017), we adopt a more general notion of stability, termed "almost stable", which is closer to the notion of $(\alpha, \varepsilon)$-perturbation resilience introduced by Balcan and Liang (2016). Additionally, we extend our results to $k$-Means/$k$-Median with penalties, where each data point is either assigned to a cluster centre or incurs a penalty. We show that certain special cases of almost stable $k$-Means/$k$-Median (with penalties) are solvable in polynomial time. To complement this, we also examine the hardness of almost stable instances and $(1 + \frac{1}{\mathrm{poly}(n)})$-stable instances of $k$-Means/$k$-Median (with penalties), proving super-polynomial lower bounds on the runtime of any exact algorithm under the widely believed Exponential Time Hypothesis (ETH).
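
To make the penalty variant concrete, the following is a minimal sketch of the objective as it is usually formulated: for a fixed set of centres, each point pays the smaller of its connection cost and its penalty. The function name `penalized_cost`, the `power` parameter, and the Euclidean distance are assumptions made for illustration; the paper's formal definition may differ in detail.

```python
import math
from typing import Sequence

Point = Sequence[float]


def penalized_cost(
    points: Sequence[Point],
    penalties: Sequence[float],
    centres: Sequence[Point],
    power: int = 2,
) -> float:
    """Clustering cost with penalties for a fixed, non-empty set of centres.

    power=2 gives a k-Means-style cost, power=1 a k-Median-style cost.
    Assumption: a point is left unassigned (and pays its penalty) exactly
    when that is cheaper than connecting it to its nearest centre.
    """
    total = 0.0
    for p, penalty in zip(points, penalties):
        # Connection cost to the nearest centre.
        connect = min(math.dist(p, c) ** power for c in centres)
        # Either connect the point or pay its penalty, whichever is cheaper.
        total += min(connect, penalty)
    return total
```

With a single centre at the origin, a far-away point with a small penalty is effectively treated as an outlier, which is the behaviour the penalty term is meant to capture.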

Paper Structure

This paper contains 27 sections, 27 theorems, 30 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1

Fix any $\varepsilon' > 0$. $(1 + \varepsilon')$-stable instances of $k$-Means in doubling metrics can be solved in polynomial time.
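
The TL;DR attributes this result to a $\rho$-swap local search. As a rough illustration of that family of algorithms, here is a sketch of a plain multi-swap local search over a finite candidate set. It is not the paper's enhanced procedure, and it carries none of the paper's guarantees; the names `rho_swap_local_search` and `clustering_cost`, the choice of initial centres, and the restriction to a discrete candidate set are all assumptions.

```python
import itertools
import math


def clustering_cost(points, centres, power=2):
    """Sum over points of the distance to the nearest centre, raised to `power`."""
    return sum(min(math.dist(p, c) ** power for c in centres) for p in points)


def rho_swap_local_search(points, candidates, k, rho=1, power=2):
    """Generic rho-swap local search (illustrative sketch only).

    Starts from the first k candidates and repeatedly exchanges up to `rho`
    centres for the same number of unused candidates while the cost strictly
    decreases. Candidates are assumed to be distinct tuples.
    """
    centres = list(candidates[:k])
    current = clustering_cost(points, centres, power)
    improved = True
    while improved:
        improved = False
        outside = [c for c in candidates if c not in centres]
        for size in range(1, rho + 1):
            swaps = itertools.product(
                itertools.combinations(centres, size),
                itertools.combinations(outside, size),
            )
            for removed, added in swaps:
                trial = [c for c in centres if c not in removed] + list(added)
                trial_cost = clustering_cost(points, trial, power)
                if trial_cost < current - 1e-12:  # accept strict improvements only
                    centres, current = trial, trial_cost
                    improved = True
                    break
            if improved:
                break
    return centres, current
```

For example, with points `(0,)`, `(1,)`, `(10,)`, `(11,)` on a line, the points themselves as candidates, `k=2`, and `rho=1`, the search leaves the poor initial pair `(0,)`, `(1,)` after one swap and stops at a solution of cost 2. Each accepted swap strictly lowers the cost, so the loop terminates, but the number of iterations is not bounded polynomially in this naive form.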

Figures (1)

  • Figure 1: In $\mathbb{R}^4$ (left), the $3$-sphere that goes through the points $p_1, p_2, p_3$ on the moment curve and is tangent to the moment curve at $p_2$ and $p_3$ has no other intersections with the moment curve after the origin. In $\mathbb{R}^3$ (right), the sphere that is tangent to the moment curve at $p_1$ and $p_2$ has no other intersections with the moment curve.

Theorems & Definitions (32)

  • Theorem 1: Exact Solution for Stable
  • Theorem 2: Exact Solution for Stable Penalty
  • Theorem 3: Hardness of Almost Stable Penalty
  • Theorem 4: Hardness of Almost Stable
  • Theorem 5: Hardness of $(1 + \frac{1}{poly(n)})$-Stable Penalty
  • Theorem 6: Hardness of $(1 + \frac{1}{poly(n)})$-Stable
  • Definition 7
  • Theorem 8: Theorem 5 in Friggstad et al. (2019)
  • Theorem 9: Theorem 6 in Friggstad et al. (2019)
  • Lemma 10
  • ...and 22 more