Streaming and Massively Parallel Algorithms for Euclidean Max-Cut
Nicolas Menand, Erik Waingarten
TL;DR
The paper develops constant-round MPC and low-space streaming algorithms for Euclidean max-cut that output an approximately optimal cut, not just its value, in sublinear computation models. It introduces a unified Parallel and Subsampled Greedy framework built on activation timelines and masks, which supports insertion-only streaming, dynamic streaming, and MPC implementations. Central to the results are metric-compatible weights computed via cascaded sketches and geometric sampling, together with a structural theorem ensuring that near-optimal cuts exist under the Assign procedure. The dynamic extension uses correlated timeline masks and geometric samples to maintain oracle access to a (1+ε)-approximate cut in dynamic streams. Together, these contributions resolve the question left open by prior work of obtaining an explicit near-optimal cut, in addition to a value estimate, in sublinear geometric settings, with implications for dense and metric max-cut problems in large-scale data processing.
Abstract
Given a set of vectors $X = \{ x_1,\dots, x_n \} \subset \mathbb{R}^d$, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new algorithms for Euclidean max-cut in models for massive datasets:
$\bullet$ We give a fully-scalable constant-round MPC algorithm using $O(nd) + n \cdot \text{poly}(\log(n)/\varepsilon)$ total space which gives a $(1+\varepsilon)$-approximate Euclidean max-cut.
$\bullet$ We give a dynamic streaming algorithm using $\text{poly}(d \log \Delta/\varepsilon)$ space when $X \subseteq [\Delta]^d$, which provides oracle access to a $(1+\varepsilon)$-approximate Euclidean max-cut.
Recently, Chen, Jiang, and Krauthgamer [STOC '23] gave a dynamic streaming algorithm with space $\text{poly}(d \log \Delta/\varepsilon)$ to approximate the value of the Euclidean max-cut, but could not provide oracle access to an approximately optimal cut. This was left open in that work, and we resolve it here. Both algorithms follow from the same framework, which analyzes a ``parallel'' and ``subsampled'' (Euclidean) version of a greedy algorithm of Mathieu and Schudy [SODA '08] for dense max-cut.
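To make the objective concrete, the following is a minimal sketch of the quantity being maximized, together with a plain sequential greedy baseline. It is an illustration only: the function names (`cut_value`, `brute_force_max_cut`, `greedy_cut`) are hypothetical, and the greedy rule shown is a simplified, offline stand-in that reads every placed point. It is not the parallel, subsampled algorithm of the paper, nor a faithful reproduction of the Mathieu and Schudy analysis.

```python
import itertools
import math
import random


def cut_value(points, sides):
    """Sum of Euclidean distances over pairs that lie on opposite sides."""
    total = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        if sides[i] != sides[j]:
            total += math.dist(points[i], points[j])
    return total


def brute_force_max_cut(points):
    """Exact optimum by enumeration; only feasible for a handful of points."""
    n = len(points)
    best = 0.0
    # Fix point 0 on side 0: swapping the two sides gives the same cut.
    for rest in itertools.product((0, 1), repeat=n - 1):
        best = max(best, cut_value(points, (0,) + rest))
    return best


def greedy_cut(points, seed=0):
    """Illustrative sequential greedy (an assumption, not the paper's method).

    Points are visited in a random order; each one joins the side that
    cuts the larger total distance to the points already placed.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    sides = [None] * len(points)
    placed = []
    for i in order:
        dist_to_side = [0.0, 0.0]
        for j in placed:
            dist_to_side[sides[j]] += math.dist(points[i], points[j])
        # Joining side 0 cuts the edges to side 1, and vice versa.
        sides[i] = 0 if dist_to_side[1] >= dist_to_side[0] else 1
        placed.append(i)
    return sides


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    sides = greedy_cut(square)
    print("greedy cut value:", cut_value(square, sides))
    print("optimal cut value:", brute_force_max_cut(square))
```

This baseline uses quadratic time and reads all of $X$, which is exactly what the MPC and streaming models disallow; the framework described in the abstract replaces the full sum over placed points with a subsampled, parallel version.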
