Fast approximate $\ell$-center clustering in high dimensional spaces
Mirosław Kowaluk, Andrzej Lingas, Mia Persson
TL;DR
This work introduces a randomized dimension-reduction framework to accelerate high-dimensional $\ell$-center clustering and minimum-diameter $\ell$-clustering in Euclidean and Hamming spaces. It projects the data to $k=O(\log n/\epsilon^2)$ dimensions while approximately preserving pairwise distances. This lets existing solvers run efficiently in the reduced space, and their solutions are then lifted back to the original space with a controlled loss in accuracy. The authors obtain $(2+\epsilon)$-approximation algorithms that are substantially faster than the classic $2$-approximation methods when both $\ell$ and the dimension are super-logarithmic. They extend the framework to fast $O(\alpha)$- and $O(1)$-approximation algorithms, including a variant with outliers, and give time bounds with a substantially reduced dependence on the ambient dimension. They also outline how recent subspace results could further improve the running time at the cost of a modest increase in the approximation factor. Overall, the paper advances scalable clustering in very high dimensions by combining randomized projections with existing approximation techniques.
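The project, solve, and lift pattern can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a dense Gaussian projection with a hypothetical constant `c = 4` in $k = \lceil c \ln n / \epsilon^2 \rceil$. It also uses Gonzalez's farthest-first traversal, the classic 2-approximation for $\ell$-center, as a stand-in for the solver run in the reduced space. All function names are hypothetical. Because the chosen centers are input points, lifting a solution simply means reusing the same point indices in the original space and measuring the radius there.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def random_projection(points, eps, c=4.0, seed=0):
    """Map points to k = ceil(c * ln(n) / eps^2) dimensions with a Gaussian matrix.

    The constant c is an assumption: JL-type bounds fix k only up to a constant.
    If k is not smaller than the original dimension d, points are returned unchanged.
    """
    n, d = len(points), len(points[0])
    k = max(1, math.ceil(c * math.log(max(n, 2)) / eps ** 2))
    if k >= d:
        return [list(p) for p in points]
    rng = random.Random(seed)
    # Entries are N(0, 1/k), so squared distances are preserved in expectation.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r[j] * p[j] for j in range(d)) for r in proj] for p in points]


def farthest_first(points, ell, dist):
    """Gonzalez's farthest-first traversal (classic 2-approximation for ell-center).

    Returns the indices of the chosen centers, which are input points.
    """
    centers = [0]  # start from an arbitrary input point
    nearest = [dist(p, points[0]) for p in points]
    while len(centers) < min(ell, len(points)):
        # The next center is the point farthest from all centers chosen so far.
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(i)
        nearest = [min(nearest[j], dist(points[j], points[i])) for j in range(len(points))]
    return centers


def radius(points, center_idx, dist):
    """ell-center objective: largest distance from a point to its nearest center."""
    return max(min(dist(p, points[c]) for c in center_idx) for p in points)


def reduce_solve_lift(points, ell, eps, seed=0):
    """Solve in the reduced space, then evaluate the same centers in the original space."""
    low = random_projection(points, eps, seed=seed)
    idx = farthest_first(low, ell, euclidean)
    return idx, radius(points, idx, euclidean)


if __name__ == "__main__":
    # Two well-separated pairs of points in 50 dimensions.
    pts = [[0.0] * 50, [0.1] + [0.0] * 49, [10.0] + [0.0] * 49, [10.1] + [0.0] * 49]
    print(reduce_solve_lift(pts, ell=2, eps=0.5))
```

The solver only ever sees $k$-dimensional vectors, so each distance evaluation inside it costs $O(k)$ rather than $O(d)$. With high probability the projection distorts all pairwise distances by at most a $1 \pm \epsilon$ factor, which is what keeps the lifted radius close to the one found in the reduced space.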
Abstract
We study the design of efficient approximation algorithms for the $\ell$-center clustering and minimum-diameter $\ell$-clustering problems in high dimensional Euclidean and Hamming spaces. Our main tool is randomized dimension reduction. First, we present a general method for reducing the dependence of the running time of a hypothetical algorithm for the $\ell$-center problem in a high dimensional Euclidean space on the dimension. Partly using this method, we provide $(2+ε)$-approximation algorithms for the $\ell$-center clustering and minimum-diameter $\ell$-clustering problems in Euclidean and Hamming spaces that are substantially faster than the known $2$-approximation ones when both $\ell$ and the dimension are super-logarithmic. Next, we apply the general method to the recent fast approximation algorithms with higher approximation guarantees for the $\ell$-center clustering problem in a high dimensional Euclidean space. Finally, we provide a speed-up of the known $O(1)$-approximation method for the generalization of the $\ell$-center clustering problem to include $z$ outliers (i.e., $z$ input points can be ignored while computing the maximum distance of an input point to a center) in high dimensional Euclidean and Hamming spaces.
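The outlier objective can be made concrete with a minimal sketch that only evaluates the cost of a given set of centers. It does not find good centers and is not the paper's method. The function names are hypothetical. For a fixed set of centers, the best choice is to ignore the $z$ points that lie farthest from their nearest center. The cost is therefore the $(z+1)$-th largest nearest-center distance, under either the Euclidean or the Hamming metric.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamming(p, q):
    """Hamming distance: number of coordinates where the vectors differ."""
    return sum(a != b for a, b in zip(p, q))


def cost_with_outliers(points, centers, z, dist):
    """ell-center cost when up to z input points may be ignored.

    Ignoring the z points farthest from their nearest center is optimal for a
    fixed set of centers, so the cost is the (z+1)-th largest nearest-center distance.
    """
    if z >= len(points):
        return 0  # every point may be ignored
    nearest = sorted((min(dist(p, c) for c in centers) for p in points), reverse=True)
    return nearest[z]


if __name__ == "__main__":
    bits = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)]
    print(cost_with_outliers(bits, [(0, 0, 0, 0)], z=1, dist=hamming))  # prints 1
```

With $z = 0$ this reduces to the ordinary $\ell$-center radius. In the example, the all-ones vector is discarded as the single outlier.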
