Streaming and Massively Parallel Algorithms for Euclidean Max-Cut

Nicolas Menand, Erik Waingarten

TL;DR

The paper develops constant-round MPC and low-space streaming algorithms for Euclidean max-cut that, crucially, output an approximately optimal cut (not just its value) in sublinear computation models. It introduces a unified Parallel and Subsampled Greedy framework built on activation timelines and masks, which supports insertion-only streaming, dynamic streaming, and MPC implementations. Central to the results are metric-compatible weights computed via cascaded sketches and geometric sampling, plus a structural theorem ensuring the existence of near-optimal cuts under the Assign procedure. The dynamic extension uses correlated timeline-masks and geometric samples to maintain oracle access to a (1+ε)-approximate cut in dynamic streams. Together, these contributions resolve prior open questions about obtaining both value estimates and explicit near-optimal cuts in sublinear geometric settings, with implications for dense and metric max-cut problems in large-scale data processing.

Abstract

Given a set of vectors $X = \{ x_1,\dots, x_n \} \subset \mathbb{R}^d$, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new algorithms for Euclidean max-cut in models for massive datasets:

  • We give a fully-scalable constant-round MPC algorithm using $O(nd) + n \cdot \mathrm{poly}(\log(n) / \varepsilon)$ total space which gives a $(1+\varepsilon)$-approximate Euclidean max-cut.
  • We give a dynamic streaming algorithm using $\mathrm{poly}(d \log \Delta / \varepsilon)$ space when $X \subseteq [\Delta]^d$, which provides oracle access to a $(1+\varepsilon)$-approximate Euclidean max-cut.

Recently, Chen, Jiang, and Krauthgamer [STOC '23] gave a dynamic streaming algorithm with space $\mathrm{poly}(d \log \Delta / \varepsilon)$ to approximate the value of the Euclidean max-cut, but could not provide oracle access to an approximately optimal cut. This was left open in that work, and we resolve it here. Both algorithms follow from the same framework, which analyzes a "parallel" and "subsampled" (Euclidean) version of a greedy algorithm of Mathieu and Schudy [SODA '08] for dense max-cut.
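To make the objective concrete, the following is a minimal Python sketch of the quantity being maximized: the sum of Euclidean distances over pairs of points that land on opposite sides of the partition. The function names `cut_value` and `brute_force_max_cut` are illustrative and do not come from the paper, and the exhaustive search is only a reference for tiny inputs. It is not one of the paper's algorithms, which are designed to avoid this kind of enumeration.

```python
import itertools
import math


def cut_value(points, side):
    """Sum of Euclidean distances over pairs assigned to opposite sides.

    `points` is a sequence of coordinate tuples; `side[i]` is 0 or 1.
    """
    total = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        if side[i] != side[j]:
            total += math.dist(points[i], points[j])
    return total


def brute_force_max_cut(points):
    """Exact max-cut by enumeration (exponential time; tiny inputs only)."""
    n = len(points)
    if n == 0:
        return 0.0, ()
    best_value, best_side = 0.0, (0,) * n
    # Fix point 0 on side 0: swapping the two sides gives the same cut.
    for rest in itertools.product((0, 1), repeat=n - 1):
        side = (0,) + rest
        value = cut_value(points, side)
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side


# Corners of the unit square: the best cut separates two adjacent corners
# from the other two, so two unit sides and both diagonals cross the cut.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
value, side = brute_force_max_cut(square)
```

A $(1+\varepsilon)$-approximate cut, in the sense used above, is a partition whose `cut_value` is within a $(1+\varepsilon)$ factor of the value returned by the exhaustive search.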

Paper Structure

This paper contains 58 sections, 38 theorems, 155 equations, 12 figures.

Key Result

Theorem 1

There is an $O(1)$-round fully-scalable MPC algorithm which outputs a $(1+\varepsilon)$-approximate Euclidean max-cut using $O(nd) + n \cdot \mathrm{poly}(\log n / \varepsilon)$ total space.

Figures (12)

  • Figure 1: Representation of the timelines for each point. The time axis is on the bottom, and each horizontal dotted line corresponds to the timeline of one of the points $x_1, \dots, x_{10}$. Time $t_0$ appears in the timeline with times up to $t_0$ in gray. For a point $x_i$, a dot on the timeline corresponds to an activation, and a solid dot to an activation which is also kept. Up to time $t_0$, every point which is activated is also kept, but after that, points which are activated at time $t$ are kept with probability $\gamma_t$. A point is assigned greedily whenever it is first activated; the assignment depends on the edge weights to the previously "activated" and "kept" points (i.e., those with solid black dots before the time of first activation). Hence, it suffices for the algorithm to store points which are simultaneously activated and kept (i.e., those with solid dots).
  • Figure 2: Representation of the timelines for the dynamic streaming algorithm. Right above the time axis on the bottom, we sample one mask $\mathbf{K}$ which determines which times will contain points which are simultaneously activated and kept; these are the solid black dots on $\mathbf{K}$. Times before $t_0$ activate and keep a single point, and after $t_0$, points are kept with probability $\gamma_t$. Whenever $\mathbf{K}_t = 1$ (with a solid black dot), the rectangle above it represents a geometric sampling sketch used to determine which point is activated and kept. After geometric sampling sketches are generated, the horizontal dotted lines represent the (remaining) timelines to be generated for each point.
  • Figure 3: The assignment procedure $\textsc{Assign}_{\sigma}(\cdot,\cdot)$
  • Figure 4: MPC Algorithm for Computing Geometric Weights
  • Figure 5: MPC Algorithm for Euclidean Max-Cut.
  • ...and 7 more figures
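The keep-and-assign idea described in the Figure 1 caption can be illustrated with a small sequential toy in Python. This is a sketch under several assumptions that are not taken from the paper: points arrive in a uniformly random order (one activation each), the keep probability after time `t0` is a single constant `gamma` rather than a time-dependent $\gamma_t$, and each kept point is reweighted by the inverse of its keep probability. The name `subsampled_greedy_cut` is hypothetical. It is not the paper's parallel algorithm and carries none of its guarantees.

```python
import math
import random


def subsampled_greedy_cut(points, t0, gamma, seed=0):
    """Toy greedy assignment that only consults a kept subsample.

    Assumptions (not from the paper): uniformly random arrival order,
    a constant keep probability `gamma` after step `t0`, and
    inverse-probability weights on kept points.
    Returns (side, kept_indices) with side[i] in {0, 1}.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)

    side = [0] * len(points)
    kept = []  # pairs (index, inverse keep probability)
    for step, i in enumerate(order):
        # Estimated total distance from point i to each side,
        # computed from kept points only.
        pull = [0.0, 0.0]
        for j, inv_p in kept:
            pull[side[j]] += inv_p * math.dist(points[i], points[j])
        # Greedy rule: join the side opposite the larger estimated
        # distance, so those distances cross the cut. Ties go to side 0.
        side[i] = 0 if pull[1] >= pull[0] else 1

        # Every point is kept up to step t0; afterwards with prob. gamma.
        p = 1.0 if step < t0 else gamma
        if rng.random() < p:
            kept.append((i, 1.0 / p))
    return side, [j for j, _ in kept]


# With t0 = 2 both points are kept, so the second one to arrive sees
# the first and is placed on the opposite side.
pair_side, pair_kept = subsampled_greedy_cut([(0.0, 0.0), (3.0, 4.0)], t0=2, gamma=1.0)
```

Only the `kept` list is consulted when assigning a point, which mirrors the caption's observation that storing the simultaneously activated and kept points suffices.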

Theorems & Definitions (61)

  • Theorem 1: MPC (Informal)
  • Theorem 2: Insertion-Only Streaming (Informal)
  • Theorem 3: Dynamic Streaming (Informal)
  • Definition 2.1: Metric-Compatible Weights
  • Definition 2.2: Activation Timeline
  • Definition 2.3: Masking
  • Definition 2.4: Seed
  • Definition 2.5: Timeline-Mask Summary
  • Theorem 4
  • Theorem 5
  • ...and 51 more