Min-Max Correlation Clustering via Neighborhood Similarity
Nairen Cao, Steven Roche, Hsin-Hao Su
TL;DR
We address the min-max correlation clustering problem on complete graphs with edge labels $+$/$-$, where the goal is to minimize the maximum vertex disagreement. We introduce a randomized $(3+\epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^{+}|)$, and adapt it to the MPC model with $O(1)$ rounds and sublinear per-machine memory, as well as to the single-pass semi-streaming model with $\tilde{O}(|V|\log n/\epsilon^{2})$ space. The core ideas combine a structural property of optimal instances with neighborhood-similarity testing, using random projection to efficiently identify high-degree clusters and to assign low-degree vertices without inflating the objective. This improves on the previous best 4-approximation and enables scalable clustering of large graphs in parallel and streaming environments, with potential for an exact 3-approximation via additional refinements and trade-offs.
Abstract
We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the $\ell_{\infty}$-norm of the disagreement vector over all vertices. We resolve this problem with an efficient $(3 + \epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^+|)$, where $|E^+|$ denotes the number of positive edges. This improves upon the previous best-known approximation guarantee of 4 by Heidrich, Irmai, and Andres, whose algorithm runs in $O(|V|^2 + |V| D^2)$ time, where $|V|$ is the number of nodes and $D$ is the maximum degree in the graph. Furthermore, we extend our algorithm to the massively parallel computation (MPC) model and the semi-streaming model. In the MPC model, our algorithm runs on machines with memory sublinear in the number of nodes and takes $O(1)$ rounds. In the streaming model, our algorithm requires only $\tilde{O}(|V|)$ space. Our algorithms are purely combinatorial. They are based on a novel structural observation about the optimal min-max instance, which enables the construction of a $(3 + \epsilon)$-approximation algorithm using $O(|E^+|)$ neighborhood similarity queries. By leveraging random projection, we further show that these queries can be computed in nearly linear time.
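To make the objective concrete, the following minimal sketch evaluates the $\ell_{\infty}$-norm of the disagreement vector for a given clustering by brute force over all vertex pairs. It only illustrates the quantity being minimized and is not the approximation algorithm itself. The function name `max_disagreement`, the vertex labeling `0..n-1`, and the list-based cluster assignment are assumptions made for this example.

```python
from itertools import combinations


def max_disagreement(n, positive_edges, cluster_of):
    """Return the l_infinity norm of the disagreement vector.

    n: number of vertices, labeled 0..n-1 (the graph is complete).
    positive_edges: iterable of (u, v) pairs labeled '+'; every other pair is '-'.
    cluster_of: list where cluster_of[v] is the cluster id of vertex v.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    disagreements = [0] * n
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # An edge disagrees if it is '+' across clusters or '-' inside a cluster.
        if same_cluster != is_positive:
            disagreements[u] += 1
            disagreements[v] += 1
    return max(disagreements, default=0)


# A positive triangle {0, 1, 2} with one extra positive edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(max_disagreement(4, edges, [0, 0, 0, 1]))  # triangle plus a singleton
print(max_disagreement(4, edges, [0, 0, 0, 0]))  # one big cluster
print(max_disagreement(4, edges, [0, 1, 2, 3]))  # all singletons
```

On this toy instance, clustering the triangle together and leaving vertex 3 alone gives the smallest maximum disagreement of the three options. The brute-force loop takes $O(|V|^2)$ time, so it serves only as a checker for small inputs.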

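The role of random projection in neighborhood similarity queries can be illustrated with a generic sign-projection sketch. This is a sketch under stated assumptions, not necessarily the procedure used in the paper: it assumes similarity is measured by the size of the symmetric difference of two neighborhoods, and the names `make_sketches` and `estimated_sym_diff`, the parameter `k`, and the fixed seed are hypothetical. Each neighborhood indicator vector is projected onto `k` random $\pm 1$ directions, and the averaged squared difference of two sketches is an unbiased estimate of the symmetric difference size.

```python
import random


def make_sketches(neighborhoods, n, k, seed=0):
    """Project each neighborhood indicator vector onto k random +/-1 directions.

    neighborhoods: dict mapping a vertex to the set of vertices in its neighborhood.
    n: number of vertices, labeled 0..n-1.
    k: number of random projections (more projections give a tighter estimate).
    """
    rng = random.Random(seed)
    # signs[j][w] is the +/-1 coefficient of vertex w in projection j.
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
    return {
        v: [sum(row[w] for w in nbrs) for row in signs]
        for v, nbrs in neighborhoods.items()
    }


def estimated_sym_diff(sketches, u, v):
    """Estimate the size of the symmetric difference of N(u) and N(v)."""
    k = len(sketches[u])
    # Each squared coordinate difference has expectation |N(u) xor N(v)|.
    return sum((a - b) ** 2 for a, b in zip(sketches[u], sketches[v])) / k


# Two neighborhoods that overlap on vertices 10..19, plus a copy of the first.
neighborhoods = {0: set(range(0, 20)), 1: set(range(10, 30)), 2: set(range(0, 20))}
sketches = make_sketches(neighborhoods, n=40, k=1000)
exact = len(neighborhoods[0] ^ neighborhoods[1])
print(exact, estimated_sym_diff(sketches, 0, 1))
print(estimated_sym_diff(sketches, 0, 2))  # identical neighborhoods
```

Building a sketch touches each neighborhood entry once per projection, so the total cost is proportional to `k` times the sum of neighborhood sizes, and each query afterwards costs $O(k)$ regardless of degree. For readability this example materializes the full sign table, which a space-conscious implementation would avoid.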