Near-Optimal Bounds for Parameterized Euclidean k-means

Vincent Cohen-Addad; Karthik C. S.; David Saulpic; Chris Schwiegelshohn

Near-Optimal Bounds for Parameterized Euclidean k-means

Vincent Cohen-Addad, Karthik C. S., David Saulpic, Chris Schwiegelshohn

Abstract

The $k$-means problem is a classic objective for modeling clustering in a metric space. Given a set of points in a metric space, the goal is to find $k$ representative points so as to minimize the sum of the squared distances from each point to its closest representative. In this work, we study the approximability of $k$-means in Euclidean spaces parameterized by the number of clusters, $k$. In seminal works, de la Vega, Karpinski, Kenyon, and Rabani [STOC'03] and Kumar, Sabharwal, and Sen [JACM'10] showed how to obtain a $(1+\varepsilon)$-approximation for high-dimensional Euclidean $k$-means in time $2^{(k/\varepsilon)^{O(1)}} \cdot dn^{O(1)}$. In this work, we introduce a new fine-grained hypothesis called Exponential Time for Expanders Hypothesis (XXH) which roughly asserts that there are no non-trivial exponential time approximation algorithms for the vertex cover problem on near perfect vertex expanders. Assuming XXH, we close the above long line of work on approximating Euclidean $k$-means by showing that there is no $2^{(k/\varepsilon)^{1-o(1)}} \cdot n^{O(1)}$ time algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means in Euclidean space. This lower bound is tight as it matches the algorithm given by Feldman, Monemizadeh, and Sohler [SoCG'07] whose runtime is $2^{\tilde{O}(k/\varepsilon)} + O(ndk)$. Furthermore, assuming XXH, we show that the seminal $O(n^{kd+1})$ runtime exact algorithm of Inaba, Katoh, and Imai [SoCG'94] for $k$-means is optimal for small values of $k$.

Near-Optimal Bounds for Parameterized Euclidean k-means

Abstract

The

-means problem is a classic objective for modeling clustering in a metric space. Given a set of points in a metric space, the goal is to find

representative points so as to minimize the sum of the squared distances from each point to its closest representative. In this work, we study the approximability of

-means in Euclidean spaces parameterized by the number of clusters,

. In seminal works, de la Vega, Karpinski, Kenyon, and Rabani [STOC'03] and Kumar, Sabharwal, and Sen [JACM'10] showed how to obtain a

-approximation for high-dimensional Euclidean

-means in time

. In this work, we introduce a new fine-grained hypothesis called Exponential Time for Expanders Hypothesis (XXH) which roughly asserts that there are no non-trivial exponential time approximation algorithms for the vertex cover problem on near perfect vertex expanders. Assuming XXH, we close the above long line of work on approximating Euclidean

-means by showing that there is no

time algorithm achieving a

-approximation for

-means in Euclidean space. This lower bound is tight as it matches the algorithm given by Feldman, Monemizadeh, and Sohler [SoCG'07] whose runtime is

. Furthermore, assuming XXH, we show that the seminal

runtime exact algorithm of Inaba, Katoh, and Imai [SoCG'94] for

-means is optimal for small values of

Near-Optimal Bounds for Parameterized Euclidean k-means

Abstract

Near-Optimal Bounds for Parameterized Euclidean k-means

Abstract

Paper Structure

Table of Contents

Key Result

Theorems & Definitions (52)