Fully Scalable MPC Algorithms for Euclidean k-Center
Artur Czumaj, Guichen Gao, Mohsen Ghaffari, Shaofeng H.-C. Jiang
TL;DR
The paper tackles Euclidean k-Center in the fully scalable MPC regime, addressing both low- and high-dimensional settings. It reduces k-Center to geometric ruling set (RS) and minimum dominating set (MDS) problems, and then designs constant-round, fully scalable MPC algorithms for them, using geometric hashing to exploit the Euclidean structure. In low dimensions, it achieves constant-round $(2+\varepsilon)$- and $(1+\varepsilon, 1+\varepsilon)$-approximations (the latter using $(1+\varepsilon)k$ centers); in high dimensions, it delivers a constant-round $O\bigl(\frac{\log n}{\log\log n}\bigr)$-approximation. The work advances the state of the art for fully scalable MPC clustering by combining the RS/MDS reductions with novel geometric hashing and one-round Luby-style techniques, providing theoretical guarantees together with memory/round tradeoffs for large-scale Euclidean data.
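The threshold idea behind the ruling-set reduction can be illustrated with a small sequential sketch. Assuming a guessed radius r, it greedily keeps points that are pairwise more than 2r apart (a simple ruling set) and uses a uniform grid of side 2r as a basic geometric hash, so each point is compared only against kept points in the 3^d neighboring cells. The names `cell_of`, `greedy_ruling_set`, and `decide` are hypothetical; this is the classical sequential threshold test, not the paper's MPC algorithm or its hashing scheme, and the 3^d neighborhood scan is only reasonable for small d.

```python
import math
from itertools import product
from typing import Optional

Point = tuple[float, ...]


def cell_of(p: Point, side: float) -> tuple[int, ...]:
    """Hash a point to the integer coordinates of its axis-aligned grid cell."""
    return tuple(math.floor(x / side) for x in p)


def greedy_ruling_set(points: list[Point], r: float) -> list[Point]:
    """Greedily keep points that are pairwise more than 2r apart.

    Every discarded point lies within 2r of some kept point, so the kept
    points form a maximal 2r-separated set. With cells of side 2r, any kept
    point within distance 2r of p lies in a cell whose index differs from
    p's cell by at most 1 in every coordinate (3^d cells in total).
    """
    side = 2 * r
    buckets: dict[tuple[int, ...], list[Point]] = {}
    kept: list[Point] = []
    for p in points:
        c = cell_of(p, side)
        covered = any(
            math.dist(p, q) <= side
            for offset in product((-1, 0, 1), repeat=len(p))
            for q in buckets.get(tuple(ci + oi for ci, oi in zip(c, offset)), [])
        )
        if not covered:
            kept.append(p)
            buckets.setdefault(c, []).append(p)
    return kept


def decide(points: list[Point], k: int, r: float) -> Optional[list[Point]]:
    """Threshold test for a guessed radius r (hypothetical helper).

    Returns at most k centers covering every point within 2r, or None.
    None certifies OPT > r: the k+1 kept points are pairwise > 2r apart,
    so no two of them can share an optimal center of radius r.
    """
    kept = greedy_ruling_set(points, r)
    return kept if len(kept) <= k else None
```

For example, with points (0, 0), (0.5, 0), (10, 0), (10.4, 0.3) and r = 1, the kept set is (0, 0) and (10, 0), so `decide` succeeds for k = 2 and returns None for k = 1. Trying r over a geometric sequence of guesses starting from a lower bound on OPT and keeping the smallest r that succeeds gives a $2(1+\varepsilon)$-approximation in this sequential setting.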
Abstract
The $k$-center problem is a fundamental optimization problem with numerous applications in machine learning, data analysis, data mining, and communication networks. It has been studied extensively in the classical sequential setting for several decades, and more recently there have been efforts to understand the problem in parallel computing, on the Massively Parallel Computation (MPC) model. By now, we have a good understanding of $k$-center in the case where each local MPC machine has enough local memory to store some representatives from each cluster, that is, when one has $\Omega(k)$ local memory per machine. While this setting covers small values of $k$, for a large number of clusters these algorithms require undesirably large local memory, which makes them poorly scalable. The case of large $k$ has been considered only recently, in the fully scalable low-local-memory MPC model, for Euclidean instances of the $k$-center problem. However, those earlier works considered only constant-dimensional Euclidean space, required a super-constant number of rounds, and produced $k(1+o(1))$ centers whose cost is only a super-constant approximation of the optimal $k$-center cost. In this work, we significantly improve upon the earlier results for $k$-center in the fully scalable low-local-memory MPC model. In the low-dimensional Euclidean case in $\mathbb{R}^d$, we present the first constant-round fully scalable MPC algorithm for $(2+\varepsilon)$-approximation. We push the ratio further to a $(1+\varepsilon)$-approximation, albeit using slightly more centers, namely $(1+\varepsilon)k$. All these results naturally extend to slightly super-constant values of $d$. In the high-dimensional regime, we provide the first fully scalable MPC algorithm that achieves an $O(\log n/\log\log n)$-approximation for $k$-center in a constant number of rounds.
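For reference, the $k$-center objective and the classical sequential 2-approximation can be sketched as follows. The sketch uses Gonzalez's farthest-first traversal, a standard technique that is not taken from this paper; the function names `kcenter_cost` and `gonzalez` are hypothetical. Each of the $k$ rounds depends on all previous choices, which illustrates why a direct translation into a constant-round, low-memory MPC algorithm is not straightforward.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def kcenter_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def gonzalez(points: Sequence[Point], k: int) -> list[Point]:
    """Farthest-first traversal (Gonzalez), a sequential 2-approximation.

    Each round adds the input point farthest from the current centers, so
    round i depends on every earlier choice.
    """
    centers = [points[0]]
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[far])
        nearest = [min(d, math.dist(p, points[far])) for d, p in zip(nearest, points)]
    return centers
```

On the five points (0, 0), (1, 0), (10, 0), (11, 0), (5, 0) with k = 2, the traversal picks (0, 0) and then (11, 0), for a cost of 5; the best choice of two input points as centers achieves 4, consistent with the factor-2 guarantee.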
