Moderate Dimension Reduction for $k$-Center Clustering

Shaofeng H. -C. Jiang; Robert Krauthgamer; Shay Sapir

TL;DR

This work proposes moderate dimension reduction and shows that the famous $k$-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$. This yields a streaming algorithm for $k$-center in dynamic geometric streams that is the first to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$.

Abstract

The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $ε>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+ε$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+ε$. We propose to focus on another regime, of moderate dimension reduction, where a problem's value is preserved within factor $α>1$ using target dimension $\tfrac{\log n}{\mathrm{poly}(α)}$. We establish the viability of this approach and show that the famous $k$-center problem is $α$-approximated when reducing to dimension $O(\tfrac{\log n}{α^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $\mathrm{poly}(α)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams that achieves $O(α)$-approximation using space $\mathrm{poly}(kdn^{1/α^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
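
The link between the target dimension and the streaming space bound can be sketched with a short calculation. This is only one plausible reading of "easily yields", under the assumption (not spelled out in the abstract) that an existing streaming algorithm with space exponential in the dimension is run on the reduced points in dimension $t$:

$$t = O\!\left(\tfrac{\log n}{\alpha^2} + \log k\right) \quad\Longrightarrow\quad \exp(O(t)) = \exp\!\left(O\!\left(\tfrac{\log n}{\alpha^2}\right)\right)\cdot \exp(O(\log k)) = n^{O(1/\alpha^2)}\cdot k^{O(1)} = \mathrm{poly}\!\left(k\, n^{1/\alpha^2}\right).$$

For instance, with $n = 10^6$ and $\alpha = 2$, the factor $n^{1/\alpha^2} = 10^{6/4} = 10^{1.5} \approx 31.6$, which is far below $n$ itself. Under the same assumption, a target dimension of $\Theta(\log n)$ would instead give $\exp(\Theta(\log n)) = n^{\Theta(1)}$, which need not be sublinear in $n$.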

Paper Structure (19 sections, 20 theorems, 36 equations, 2 tables)

This paper contains 19 sections, 20 theorems, 36 equations, 2 tables.

Introduction
Main Results
Application: Dynamic Geometric Streams
Extension: Inputs of Small Doubling Dimension
Technical Overview
Warm Up: the Furthest Point Query Problem
Framework for Problems with Small Witness
Related Work
Streaming Algorithms in High Dimension
Dimension Reduction for Vanilla $k$-Center
Dimension Reduction for $k$-Center with Outliers
Dimension Reduction for $k$-Center with an Assignment Constraint
Streaming Algorithms for Capacitated and Fair $k$-Center
Dimension Reduction for $k$-Center in Doubling Sets
On the Optimality of Theorem 1.1
...and 4 more sections

Key Result

Theorem 1.1

For every $\alpha,d,k$ and $n$, there is a random linear map $G:\mathbb{R}^d\to\mathbb{R}^t$ with target dimension $t=O(\frac{\log n}{\alpha^2} + \log k)$, such that for every set $P\subset \mathbb{R}^d$ of $n$ points, with high probability, $G$ preserves the $k$-center value of $P$ within $O(\alpha)$ ...
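
To make the statement concrete, here is a minimal sketch in Python that projects a point set with a random linear map and compares a $k$-center estimate before and after. Several parts are assumptions rather than the paper's construction: the map is a Gaussian matrix (one standard JL-style choice; this excerpt does not say which map $G$ is), the constant `c` hidden in the $O(\cdot)$ is set to 1, and the $k$-center value is estimated with the greedy farthest-first heuristic (a 2-approximation), not computed exactly. All function names are hypothetical.

```python
import math
import random


def target_dimension(n, alpha, k, c=1.0):
    """Assumed target dimension t = ceil(c * (log n / alpha^2 + log k)).

    The constant c stands in for the unspecified constant in the O(.) bound.
    """
    return max(1, math.ceil(c * (math.log(n) / alpha ** 2 + math.log(k))))


def gaussian_map(d, t, rng):
    """t x d matrix with i.i.d. N(0, 1/t) entries, so E||Gx||^2 = ||x||^2."""
    sigma = 1.0 / math.sqrt(t)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(t)]


def apply_map(G, x):
    """Compute the matrix-vector product G x."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def greedy_k_center(points, k):
    """Farthest-first traversal: returns the radius of a greedy k-center solution."""
    # dist[i] = distance from points[i] to its nearest chosen center so far.
    dist = [math.dist(p, points[0]) for p in points]
    for _ in range(k - 1):
        far = max(range(len(points)), key=dist.__getitem__)
        dist = [min(dq, math.dist(p, points[far])) for dq, p in zip(dist, points)]
    return max(dist)


rng = random.Random(0)
n, d, k, alpha = 300, 50, 3, 2.0
P = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

t = target_dimension(n, alpha, k)
G = gaussian_map(d, t, rng)
P_reduced = [apply_map(G, p) for p in P]

r_original = greedy_k_center(P, k)
r_reduced = greedy_k_center(P_reduced, k)
print(f"t = {t}, radius in R^d = {r_original:.3f}, radius in R^t = {r_reduced:.3f}")
```

Because both radii come from a heuristic, their ratio is only indicative of the distortion; it is not a check of the theorem's guarantee.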

Theorems & Definitions (37)

Theorem 1.1: Main Result, informal
Remark 1.2
Theorem 1.3: Streaming Algorithm for $k$-Center, informal
Remark 1.4
Theorem 1.5: Dimension Reduction for Doubling Sets, informal
Lemma 1.6
Remark 1.7
Theorem 2.1
Proof of Theorem 2.1
Corollary 2.3: Streaming Vanilla $k$-Center
...and 27 more
