Round-efficient Fully-scalable MPC algorithms for k-Means

Shaofeng H. -C. Jiang, Yaonan Jin, Jianing Lou, Weicheng Wang

Abstract

We study Euclidean $k$-Means under the Massively Parallel Computation (MPC) model, focusing on the \emph{fully-scalable} setting. Our main result is a fully-scalable $O((\log n/\log\log n)^2)$-approximation in $O(1)$ rounds. Previously, fully-scalable algorithms for $k$-Means either run in super-constant $O(\log\log n \cdot \log\log\log n)$ rounds, albeit with a better $O(1)$-approximation [Cohen-Addad et al., SODA'26], or suffer from bicriteria guarantees [Bhaskara and Wijewardena, ICML'18; Czumaj et al., ICALP'24]. Our algorithm also gives an $O(\log n/\log\log n)$-approximation for $k$-Median, which improves a recent $O(\log n)$-approximation [Goranci et al., SODA'26], and this $o(\log n)$ ratio breaks the fundamental barrier of tree embedding methods used therein. Our main technical contribution is a new variant of the MP algorithm [Mettu and Plaxton, SICOMP'03] that works for general metrics, whose new guarantee is the Lagrangian Multiplier Preserving (LMP) property, which, importantly, holds even under arbitrary distance distortions. Allowing distance distortion is crucial for efficient MPC implementations and useful for efficient algorithm design in general, whereas preserving the LMP property under distance distortion is known to be a significant technical challenge. As a byproduct of our techniques, we also obtain an $O(1)$-approximation to the optimal \emph{value} in $O(1)$ rounds, which conceptually suggests that achieving a true $O(1)$-approximation (for the solution) in $O(1)$ rounds may be a sensible goal for future study.
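For context on the technical contribution, the following is an illustrative sketch of the classical Mettu–Plaxton facility-location primitive that the abstract's new variant builds on. The radius equation, the greedy ordering, and the constant factor 2 follow one common presentation of the original SICOMP'03 algorithm; this is not the paper's MPC/LMP variant, and the helper names are ours.

```python
def mp_radius(dists, f):
    """Smallest r with sum_q max(0, r - d(p, q)) = f.

    `dists` lists distances from a fixed point p to all points
    (including d(p, p) = 0); the sum is piecewise linear in r,
    so we solve it segment by segment.
    """
    ds = sorted(dists)
    pref = 0.0
    for i, d in enumerate(ds):
        pref += d                      # sum of the i+1 closest distances
        r = (f + pref) / (i + 1)       # solve (i+1) * r - pref = f
        nxt = ds[i + 1] if i + 1 < len(ds) else float("inf")
        if d <= r <= nxt:              # exactly i+1 points lie within r
            return r
    return float("inf")


def mettu_plaxton(points, dist, f):
    """Greedy MP rule: process points in nondecreasing radius and open a
    facility unless it lies within twice its radius of an opened one."""
    r = [mp_radius([dist(p, q) for q in points], f) for p in points]
    order = sorted(range(len(points)), key=lambda i: r[i])
    opened = []
    for i in order:
        if all(dist(points[i], points[j]) > 2 * r[i] for j in opened):
            opened.append(i)
    return [points[i] for i in opened]
```

On a toy 1D metric with facility cost 1, the two well-separated clusters {0, 1} and {10, 11} each yield a single opened facility. The paper's contribution is, roughly, making a guarantee of this flavor (in its LMP form) survive arbitrary distance distortions, which is what enables the constant-round MPC implementation.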

Paper Structure

This paper contains 55 sections, 34 theorems, 95 equations, 8 algorithms.

Key Result

Theorem 1.1

For any constant $\varepsilon\in (0,1)$, there exists an MPC algorithm for $(k,z)$-Clustering that, for any $n$-point dataset from $\mathbb{R}^{O(\log n)}$ distributed across machines with local memory $s \ge \mathop{\mathrm{polylog}}\nolimits n$, computes an $O_{\varepsilon}((\frac{\log n}{\log\log n})^{z})$-approximation in $O(1)$ rounds.

Theorems & Definitions (97)

  • Theorem 1.1: Implied by Theorem [thm:clustering-solution-formal]
  • Theorem 1.2: Implied by Theorem [thm:clustering-value-formal]
  • Example
  • Lemma 2.2: Integrality Gap for Clustering [CharikarGTS99]
  • Lemma 3.0
  • Corollary 3.1
  • Lemma 3.4
  • ...and 87 more