Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

Vincent Cohen-Addad, Karthik C. S., David Saulpic, Chris Schwiegelshohn

TL;DR

Cohen-Addad, Feldmann, and Saulpic [JACM'21] showed how to obtain a $(1+\varepsilon)$-factor approximation in low-dimensional Euclidean metrics for both the $k$-median and $k$-means problems in near-linear time.

Abstract

The $k$-median and $k$-means clustering objectives are classic objectives for modeling clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative points so as to minimize the sum of the distances (resp. sum of squared distances) from each point to its closest representative. Cohen-Addad, Feldmann, and Saulpic [JACM'21] showed how to obtain a $(1+\varepsilon)$-factor approximation in low-dimensional Euclidean metric for both the $k$-median and $k$-means problems in near-linear time $2^{(1/\varepsilon)^{O(d^2)}} n \cdot \text{polylog}(n)$ (where $d$ is the dimension and $n$ is the number of input points). We improve this running time to $2^{\tilde{O}(1/\varepsilon)^{d-1}} \cdot n \cdot \text{polylog}(n)$, and show an almost matching lower bound: under the Gap Exponential Time Hypothesis for 3-SAT, there is no $2^{{o}(1/\varepsilon^{d-1})} n^{O(1)}$ algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means.
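To make the two objectives concrete, here is a minimal Python sketch that evaluates the $k$-median and $k$-means cost of a fixed set of representatives. The function name `clustering_cost`, the `power` parameter, and the toy points are illustrative assumptions, not taken from the paper; the sketch only evaluates a candidate solution and does not implement the approximation algorithm.

```python
import math

def clustering_cost(points, centers, power=1):
    """Sum over all points of (distance to the closest center) ** power.

    power=1 gives the k-median cost, power=2 gives the k-means cost.
    Hypothetical helper for illustration only.
    """
    return sum(
        min(math.dist(p, c) for c in centers) ** power
        for p in points
    )

# Toy instance in R^2 with k = 2 representatives (made-up data).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centers = [(0.0, 1.0), (10.0, 2.0)]

k_median_cost = clustering_cost(points, centers, power=1)  # 1 + 1 + 2 + 2
k_means_cost = clustering_cost(points, centers, power=2)   # 1 + 1 + 4 + 4
print(k_median_cost, k_means_cost)
```

A $(1+\varepsilon)$-approximation is a choice of $k$ centers whose cost under the relevant objective is at most $(1+\varepsilon)$ times the minimum possible cost.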

Paper Structure (26 sections, 16 theorems, 25 equations, 1 figure)

Key Result

Theorem 1.2

For every $\varepsilon > 0$ and dimension $d$, the $k$-median and $k$-means problems in $\mathbb{R}^d$ can both be approximated to a $(1+\varepsilon)$-factor in time $2^{\widetilde{O}\left(1/\varepsilon^{d-1}\right)} n \cdot \mathrm{polylog}(n)$. The $\widetilde{O}$ notation hides an exponential dep
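As a rough illustration of how the exponent changes between the earlier bound $2^{(1/\varepsilon)^{O(d^2)}}$ and the bound in Theorem 1.2, the sketch below compares $(1/\varepsilon)^{d^2}$ with $(1/\varepsilon)^{d-1}$. It assumes every constant hidden by $O(\cdot)$ is 1 and ignores every factor hidden by $\widetilde{O}(\cdot)$, so the numbers show only the shape of the dependence on $\varepsilon$ and $d$, not actual running times. The function names are hypothetical.

```python
def old_exponent(eps, d):
    # Exponent of 2 in the earlier bound, taking the hidden O(d^2) constant as 1 (assumption).
    return (1.0 / eps) ** (d * d)

def new_exponent(eps, d):
    # Exponent of 2 in Theorem 1.2, ignoring all factors hidden by the tilde (assumption).
    return (1.0 / eps) ** (d - 1)

for eps, d in [(0.5, 2), (0.5, 3), (0.25, 3)]:
    print(f"eps={eps}, d={d}: old ~ {old_exponent(eps, d):g}, new ~ {new_exponent(eps, d):g}")
```

Under these simplifying assumptions the exponent shrinks from growing with $d^2$ in the power of $1/\varepsilon$ to growing with $d-1$, which is the dependence the stated lower bound shows to be nearly unavoidable.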

Figures (1)

  • Figure 1: Illustration of one case of the distinction. $B(p, 3(\mathcal{A}_p + \text{OPT}_p))$ is badly cut by the thick orange-dashed line, so $p$ cannot be connected via portals to $\text{OPT}(p)$. However, $p$ and $\mathcal{A}(p)$ are not badly cut, so $p$ can be connected to $\text{OPT}(\mathcal{A}(p))$ instead, making a detour through the thin orange-dashed line. The cost of this reassignment is charged to $b_2(p)$.

Theorems & Definitions (33)

  • Theorem 1.2
  • Theorem 1.3
  • Remark 2.1
  • Lemma 2.2
  • Definition 2.3
  • Lemma 2.4
  • Definition 2.5
  • Lemma 3.1: See Step 1 and Lemma 4.1 in [JACM'21]
  • Definition 3.2
  • Theorem 3.3: Structure Theorem
  • ...and 23 more