Near-Optimal Bounds for Parameterized Euclidean k-means

Vincent Cohen-Addad, Karthik C. S., David Saulpic, Chris Schwiegelshohn

Abstract

The $k$-means problem is a classic objective for modeling clustering in a metric space. Given a set of points in a metric space, the goal is to find $k$ representative points so as to minimize the sum of the squared distances from each point to its closest representative. In this work, we study the approximability of $k$-means in Euclidean spaces parameterized by the number of clusters, $k$. In seminal works, de la Vega, Karpinski, Kenyon, and Rabani [STOC'03] and Kumar, Sabharwal, and Sen [JACM'10] showed how to obtain a $(1+\varepsilon)$-approximation for high-dimensional Euclidean $k$-means in time $2^{(k/\varepsilon)^{O(1)}} \cdot dn^{O(1)}$. We introduce a new fine-grained hypothesis called the Exponential Time for Expanders Hypothesis (XXH), which roughly asserts that there are no non-trivial exponential-time approximation algorithms for the vertex cover problem on near-perfect vertex expanders. Assuming XXH, we close the above long line of work on approximating Euclidean $k$-means by showing that there is no $2^{(k/\varepsilon)^{1-o(1)}} \cdot n^{O(1)}$ time algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means in Euclidean space. This lower bound is tight, as it matches the algorithm given by Feldman, Monemizadeh, and Sohler [SoCG'07], whose runtime is $2^{\tilde{O}(k/\varepsilon)} + O(ndk)$. Furthermore, assuming XXH, we show that the seminal $O(n^{kd+1})$-time exact algorithm of Inaba, Katoh, and Imai [SoCG'94] for $k$-means is optimal for small values of $k$.
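As a concrete reading of the objective described in the abstract, the following minimal sketch evaluates the $k$-means cost of a fixed set of candidate centers in Euclidean space: each point contributes its squared distance to the closest center. The function names `squared_distance` and `kmeans_cost` are illustrative assumptions and do not come from the paper. The sketch only evaluates the objective; it does not search for good centers.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, c: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    if len(p) != len(c):
        raise ValueError("points must have the same dimension")
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the squared distance to the closest center.

    Hypothetical helper: it evaluates the k-means objective for given
    centers and makes no attempt to optimize them.
    """
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 0)]
    ctrs = [(1, 0), (10, 0)]  # k = 2 candidate centers
    print(kmeans_cost(pts, ctrs))
```

A $(1+\varepsilon)$-approximation in the sense of the abstract is a set of $k$ centers whose cost under this objective is at most $(1+\varepsilon)$ times the minimum achievable cost.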

Paper Structure

This paper contains 51 sections, 26 theorems, 87 equations.

Key Result

Theorem 1.2

Assuming XXH, for every $\beta>0$, there is no randomized algorithm running in time $2^{(k/\varepsilon)^{1-\beta}} \cdot \mathrm{poly}(n,d)$ that can $(1+\varepsilon)$-approximate the Euclidean $k$-means problem whenever $k\gg 1/\varepsilon$.

Theorems & Definitions (52)

  • Theorem 1.2: Answer to \ref{q:approx}; informal statement
  • Corollary 1.3
  • Lemma 2.1: Informal version of Lemma \ref{lem:costCluster}
  • Lemma 2.2: Informal version of Lemma \ref{lem:monochrome}
  • Lemma 2.3: Informal version of Lemma \ref{lem:avgStruct}
  • Definition 3.1: Continuous $k$-means
  • Lemma 3.2
  • Theorem 3.3: Chernoff bounds
  • Definition 4.1: Small Set Vertex Expanders
  • Theorem 4.3: Theorem 1 from AKS11
  • ...and 42 more