Moderate Dimension Reduction for $k$-Center Clustering
Shaofeng H. -C. Jiang, Robert Krauthgamer, Shay Sapir
TL;DR
This work establishes the viability of this approach and shows that the famous $k-center problem is $\alpha$-approximated when reducing to dimension $O(\tfrac{\log n}{\alpha^2}+\log k)$, and is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$.
Abstract
The Johnson-Lindenstrauss (JL) Lemma introduced the concept of dimension reduction via a random linear map, which has become a fundamental technique in many computational settings. For a set of $n$ points in $\mathbb{R}^d$ and any fixed $ε>0$, it reduces the dimension $d$ to $O(\log n)$ while preserving, with high probability, all the pairwise Euclidean distances within factor $1+ε$. Perhaps surprisingly, the target dimension can be lower if one only wishes to preserve the optimal value of a certain problem on the pointset, e.g., Euclidean max-cut or $k$-means. However, for some notorious problems, like diameter (aka furthest pair), dimension reduction via the JL map to below $O(\log n)$ does not preserve the optimal value within factor $1+ε$. We propose to focus on another regime, of \emph{moderate dimension reduction}, where a problem's value is preserved within factor $α>1$ using target dimension $\tfrac{\log n}{poly(α)}$. We establish the viability of this approach and show that the famous $k$-center problem is $α$-approximated when reducing to dimension $O(\tfrac{\log n}{α^2}+\log k)$. Along the way, we address the diameter problem via the special case $k=1$. Our result extends to several important variants of $k$-center (with outliers, capacities, or fairness constraints), and the bound improves further with the input's doubling dimension. While our $poly(α)$-factor improvement in the dimension may seem small, it actually has significant implications for streaming algorithms, and easily yields an algorithm for $k$-center in dynamic geometric streams, that achieves $O(α)$-approximation using space $poly(kdn^{1/α^2})$. This is the first algorithm to beat $O(n)$ space in high dimension $d$, as all previous algorithms require space at least $\exp(d)$. Furthermore, it extends to the $k$-center variants mentioned above.
