Min-Max Correlation Clustering via Neighborhood Similarity

Nairen Cao; Steven Roche; Hsin-Hao Su


TL;DR

This paper addresses the min-max correlation clustering problem on complete graphs whose edges are labeled $+$ or $-$, where the goal is to minimize the maximum disagreement at any vertex. The authors introduce a randomized $(3+\epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^{+}|)$, and adapt it to the MPC model with $O(1)$ rounds and sublinear per-machine memory, as well as to the single-pass semi-streaming model with $\tilde{O}(|V|\log n/\epsilon^{2})$ space. The core idea combines a structural property of optimal instances with neighborhood-similarity testing, using random projection to efficiently identify high-degree clusters and to assign low-degree vertices without inflating the objective. This improves on the previous 4-approximation and enables scalable clustering of large graphs in parallel and streaming environments, with potential for an exact 3-approximation via additional refinements and trade-offs.
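To make the objective concrete, the following is a minimal sketch of how the maximum vertex disagreement of a given clustering could be evaluated. It is not the authors' algorithm. It assumes the standard definition of disagreement: a vertex disagrees with each positive edge that leaves its cluster and each negative edge that stays inside it. The names `disagreement_vector`, `max_disagreement`, and `cluster_of` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def disagreement_vector(n, positive_edges, cluster_of):
    """Count disagreeing edges at each vertex of a complete signed graph.

    n              -- number of vertices, labeled 0..n-1
    positive_edges -- iterable of (u, v) pairs; every other pair is negative
    cluster_of     -- list mapping each vertex to a cluster id (assumed input)
    """
    pos_nbrs = defaultdict(set)
    for u, v in positive_edges:
        pos_nbrs[u].add(v)
        pos_nbrs[v].add(u)

    cluster_size = defaultdict(int)
    for v in range(n):
        cluster_size[cluster_of[v]] += 1

    disagreements = []
    for v in range(n):
        # Positive neighbors that share v's cluster agree with the clustering.
        same_pos = sum(1 for u in pos_nbrs[v] if cluster_of[u] == cluster_of[v])
        # Positive edges cut by the clustering.
        cut_pos = len(pos_nbrs[v]) - same_pos
        # Negative edges inside the cluster: cluster mates that are not positive neighbors.
        inside_neg = (cluster_size[cluster_of[v]] - 1) - same_pos
        disagreements.append(cut_pos + inside_neg)
    return disagreements


def max_disagreement(n, positive_edges, cluster_of):
    """The l-infinity norm of the disagreement vector (the min-max objective)."""
    return max(disagreement_vector(n, positive_edges, cluster_of), default=0)


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(max_disagreement(4, edges, [0, 0, 0, 1]))  # clusters {0,1,2} and {3}
print(max_disagreement(4, edges, [0, 0, 0, 0]))  # a single cluster
```

In this toy instance, separating vertex 3 gives a smaller maximum disagreement than placing all four vertices in one cluster, which is the kind of comparison the min-max objective makes.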

Abstract

We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the $\ell_{\infty}$-norm of the disagreement vector over all vertices. We resolve this problem with an efficient $(3 + \epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^+|)$, where $|E^+|$ denotes the number of positive edges. This improves upon the previous best-known approximation guarantee of 4 by Heidrich, Irmai, and Andres, whose algorithm runs in $O(|V|^2 + |V| D^2)$ time, where $|V|$ is the number of nodes and $D$ is the maximum degree in the graph. Furthermore, we extend our algorithm to the massively parallel computation (MPC) model and the semi-streaming model. In the MPC model, our algorithm runs on machines with memory sublinear in the number of nodes and takes $O(1)$ rounds. In the streaming model, our algorithm requires only $\tilde{O}(|V|)$ space, where $|V|$ is the number of vertices in the graph. Our algorithms are purely combinatorial. They are based on a novel structural observation about the optimal min-max instance, which enables the construction of a $(3 + \epsilon)$-approximation algorithm using $O(|E^+|)$ neighborhood similarity queries. By leveraging random projection, we further show these queries can be computed in nearly linear time.
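As an illustration of how random projection can support neighborhood similarity queries, the sketch below estimates the size of the symmetric difference between two closed positive neighborhoods from short random-sign sketches. This is a generic Johnson-Lindenstrauss-style estimator, not the paper's exact procedure. The use of closed neighborhoods, the sketch length `k`, and the function names are all assumptions made for this example, and the edge list is assumed to contain each positive edge once.

```python
import random


def sketch_neighborhoods(n, positive_edges, k, seed=0):
    """Build a length-k sketch of each closed positive neighborhood N[v].

    Each vertex w gets a random +/-1 vector; the sketch of v is the sum of
    these vectors over all w in N[v]. Building all sketches takes
    O(k * (n + |E+|)) time.
    """
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(k)] for _ in range(n)]
    # Start from the vertex's own sign vector (closed neighborhood, an assumption).
    sketches = [list(signs[v]) for v in range(n)]
    for u, v in positive_edges:
        for j in range(k):
            sketches[u][j] += signs[v][j]
            sketches[v][j] += signs[u][j]
    return sketches


def estimate_symmetric_difference(sketch_u, sketch_v):
    """Estimate |N[u] symmetric-difference N[v]| as the mean squared coordinate gap.

    Shared neighbors cancel in the difference, and each coordinate of the
    difference has expected square equal to the symmetric difference size.
    """
    k = len(sketch_u)
    return sum((a - b) ** 2 for a, b in zip(sketch_u, sketch_v)) / k


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
sk = sketch_neighborhoods(4, edges, k=64)
print(estimate_symmetric_difference(sk[0], sk[1]))  # identical neighborhoods
print(estimate_symmetric_difference(sk[0], sk[3]))  # neighborhoods differ in several vertices
```

Each query then costs $O(k)$ time regardless of the vertex degrees, so a sketch length that is polylogarithmic in the graph size keeps $O(|E^+|)$ queries within nearly linear total time. The accuracy needed for the approximation guarantee is established in the paper, not by this sketch.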


