Fully Scalable MPC Algorithms for Euclidean k-Center

Artur Czumaj, Guichen Gao, Mohsen Ghaffari, Shaofeng H. -C. Jiang

TL;DR

The paper tackles scalable solutions for Euclidean k-Center in the fully scalable MPC regime, addressing both low- and high-dimensional settings. It reduces k-Center to geometric RS and MDS problems, and then designs constant-round, fully scalable MPC algorithms using geometric hashing to exploit Euclidean structure. In low dimensions, it achieves constant-round $(2+\varepsilon)$- and $(1+\varepsilon, 1+\varepsilon)$-approximations (the latter with $(1+\varepsilon)k$ centers); in higher dimensions, it delivers a constant-round $O\bigl(\frac{\log n}{\log\log n}\bigr)$-approximation. The work advances the state of fully scalable MPC for clustering by combining RS/MDS reductions with novel geometric hashing and one-round Luby-style techniques, providing both theoretical guarantees and practical memory-round tradeoffs for large-scale Euclidean data.
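The threshold idea behind such reductions can be pictured sequentially. The sketch below reads "RS" as ruling set, which is an assumption since the summary does not expand the abbreviation. It is a plain greedy pass, not the paper's MPC algorithm: for a guessed radius `tau`, it keeps a point as a center only if it is more than `tau` away from every center kept so far. The function names and the sample data are hypothetical.

```python
import math
import random


def greedy_separated_cover(points, tau):
    """Pick centers that are pairwise more than tau apart and such that
    every input point lies within tau of some chosen center."""
    centers = []
    for p in points:
        # Keep p only if no existing center already covers it.
        if all(math.dist(p, c) > tau for c in centers):
            centers.append(p)
    return centers


def covering_radius(points, centers):
    """k-center cost: the largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


random.seed(0)  # illustrative data only
points = [(random.random(), random.random()) for _ in range(200)]
tau = 0.25
centers = greedy_separated_cover(points, tau)
radius = covering_radius(points, centers)
```

If the optimal k-center radius is at most `tau / 2`, two centers that are more than `tau` apart cannot share an optimal cluster, so at most `k` centers are selected and they cover every point within `tau`. Searching over `tau` then yields a 2-approximation in the sequential setting.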

Abstract

The $k$-center problem is a fundamental optimization problem with numerous applications in machine learning, data analysis, data mining, and communication networks. The $k$-center problem has been extensively studied in the classical sequential setting for several decades, and more recently there have been some efforts in understanding the problem in parallel computing, on the Massively Parallel Computation (MPC) model. For now, we have a good understanding of $k$-center in the case where each local MPC machine has sufficient local memory to store some representatives from each cluster, that is, when one has $\Omega(k)$ local memory per machine. While this setting covers the case of small values of $k$, for a large number of clusters these algorithms require undesirably large local memory, making them poorly scalable. The case of large $k$ has been considered only recently for the fully scalable low-local-memory MPC model for the Euclidean instances of the $k$-center problem. However, the earlier works considered only the constant-dimensional Euclidean space, required a super-constant number of rounds, and produced only $k(1+o(1))$ centers whose cost is a super-constant approximation of $k$-center. In this work, we significantly improve upon the earlier results for the $k$-center problem for the fully scalable low-local-memory MPC model. In the low-dimensional Euclidean case in $\mathbb{R}^d$, we present the first constant-round fully scalable MPC algorithm for $(2+\varepsilon)$-approximation. We push the ratio further to $(1 + \varepsilon)$-approximation, albeit using slightly more centers, namely $(1 + \varepsilon)k$. All these results naturally extend to slightly super-constant values of $d$. In the high-dimensional regime, we provide the first fully scalable MPC algorithm that in a constant number of rounds achieves an $O(\log n/ \log \log n)$-approximation for $k$-center.
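For context on the $(2+\varepsilon)$ guarantee, the classical sequential baseline is the farthest-first traversal of Gonzalez, which gives a 2-approximation. The sketch below is that sequential baseline only, not the MPC algorithm from the paper; the function name and the sample points are hypothetical.

```python
import math


def farthest_first(points, k):
    """Gonzalez-style greedy: repeatedly add the point farthest from the
    current centers. Returns the centers and the resulting k-center cost."""
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)


# Three well-separated groups on a line (illustrative data only).
points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
centers, cost = farthest_first(points, k=3)
```

Each step of this greedy needs the current farthest point, which makes it inherently sequential over $k$ steps. That dependence on $k$ is what becomes costly when $k$ is large and local memory is small.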

Paper Structure

This paper contains 59 sections, 26 theorems, 42 equations, 1 figure, 1 table, 6 algorithms.

Key Result

Theorem 1.1

There exists an MPC algorithm that given $\varepsilon\in (0,1)$, $k \ge 1$, and a dataset $P \subset \mathbb{R}^d$ of $n$ points distributed across MPC machines with local memory $s \geq (\Omega(d \varepsilon^{-1}))^{\Omega(d)} \mathop{\mathrm{poly}}\nolimits\log n$, with probability at least $1-1/n$ ...

Figures (1)

  • Figure 1: A space partition in 2D with $T = 3$ and $\alpha = 5$. The first group is squares with side-length $5\tau/\sqrt{2}$ (the blank space), the second is the red-shaded rectangles, and the third is the cross-like structures in blue shades.
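The side-length in the caption can be checked with a plain uniform grid. The sketch below is only an illustration of that arithmetic under stated assumptions: it buckets points into axis-aligned cells of side $\alpha\tau/\sqrt{d}$, so that each cell has diameter $\alpha\tau$. It does not reproduce the three-group partition shown in the figure, and `grid_buckets` and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def grid_buckets(points, tau, alpha=5):
    """Hash points into a uniform grid whose cells have diameter alpha * tau."""
    d = len(points[0])
    # A d-dimensional cube of this side has diagonal exactly alpha * tau.
    side = alpha * tau / math.sqrt(d)
    buckets = defaultdict(list)
    for p in points:
        cell = tuple(math.floor(x / side) for x in p)
        buckets[cell].append(p)
    return side, dict(buckets)


# Illustrative 2D data; with tau = 1 the cell side is 5 / sqrt(2).
points = [(0.1, 0.2), (0.3, 0.1), (4.0, 4.0), (-0.5, 0.2)]
side, buckets = grid_buckets(points, tau=1.0)
```

Points that land in the same cell are at most $\alpha\tau$ apart, so a machine holding one cell can treat its points as mutually close without communicating. A plain grid can still separate two nearby points across a cell boundary, which is presumably why the figure's partition uses additional groups.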

Theorems & Definitions (51)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Lemma 2.1: Packing property, cf. pollard1990empirical
  • Lemma 3.0
  • proof
  • Definition 3.2: CJKVY22
  • Lemma 3.3: CJKVY22
  • Lemma 4.1
  • Lemma 4.2
  • ...and 41 more