Min-Max Correlation Clustering via Neighborhood Similarity

Nairen Cao, Steven Roche, Hsin-Hao Su

TL;DR

This paper addresses the min-max correlation clustering problem on complete graphs whose edges are labeled $+$ or $-$, where the goal is to minimize the maximum disagreement at any single vertex. The authors introduce a randomized $(3+\epsilon)$-approximation that runs in nearly linear time, $\tilde{O}(|E^{+}|)$, and adapt it to the MPC model with $O(1)$ rounds and sublinear per-machine memory, as well as to the single-pass semi-streaming model with $\tilde{O}(|V|\log n/\epsilon^{2})$ space. The core idea combines a structural property of optimal instances with neighborhood-similarity testing: random projection is used to identify high-degree clusters efficiently and to assign low-degree vertices without inflating the objective. The result improves on the previous 4-approximation and makes min-max clustering practical for large graphs in parallel and streaming settings, with a possible exact 3-approximation via additional refinements and trade-offs.
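As a concrete reading of the objective summarized above, the following minimal Python sketch evaluates the maximum vertex disagreement of a given clustering. It is an illustration only, not code from the paper: the function name `max_disagreement`, the edge-list input, and the `cluster_of` array are assumptions introduced here. A vertex disagrees on every positive edge that leaves its cluster and on every negative edge inside its cluster; because the graph is complete, the second count follows from the cluster size and the number of positive edges inside.

```python
from collections import defaultdict


def max_disagreement(n, positive_edges, cluster_of):
    """Return (max disagreement, per-vertex disagreements).

    n              -- number of vertices, labeled 0..n-1 (assumed labeling)
    positive_edges -- list of (u, v) pairs; every other pair is negative
    cluster_of     -- cluster_of[v] is the cluster id of vertex v
    """
    cluster_size = defaultdict(int)
    for v in range(n):
        cluster_size[cluster_of[v]] += 1

    pos_inside = [0] * n  # positive edges kept inside v's cluster
    pos_cut = [0] * n     # positive edges leaving v's cluster
    for u, v in positive_edges:
        counter = pos_inside if cluster_of[u] == cluster_of[v] else pos_cut
        counter[u] += 1
        counter[v] += 1

    # Negative edges inside the cluster = other cluster members
    # minus the positive neighbors among them.
    disagreements = [
        pos_cut[v] + (cluster_size[cluster_of[v]] - 1 - pos_inside[v])
        for v in range(n)
    ]
    return max(disagreements, default=0), disagreements


# A positive triangle {0, 1, 2} with a pendant positive edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(max_disagreement(4, edges, [0, 0, 0, 1]))  # (1, [0, 0, 1, 1])
```

The evaluation touches each vertex and each positive edge once, so it takes $O(|V| + |E^{+}|)$ time and never enumerates the negative edges explicitly.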

Abstract

We present an efficient algorithm for the min-max correlation clustering problem. The input is a complete graph where edges are labeled as either positive $(+)$ or negative $(-)$, and the objective is to find a clustering that minimizes the $\ell_{\infty}$-norm of the disagreement vector over all vertices. We resolve this problem with an efficient $(3 + \epsilon)$-approximation algorithm that runs in nearly linear time, $\tilde{O}(|E^+|)$, where $|E^+|$ denotes the number of positive edges. This improves upon the previous best-known approximation guarantee of 4 by Heidrich, Irmai, and Andres, whose algorithm runs in $O(|V|^2 + |V| D^2)$ time, where $|V|$ is the number of nodes and $D$ is the maximum degree in the graph. Furthermore, we extend our algorithm to the massively parallel computation (MPC) model and the semi-streaming model. In the MPC model, our algorithm runs on machines with memory sublinear in the number of nodes and takes $O(1)$ rounds. In the streaming model, our algorithm requires only $\tilde{O}(|V|)$ space. Our algorithms are purely combinatorial. They are based on a novel structural observation about the optimal min-max instance, which enables the construction of a $(3 + \epsilon)$-approximation algorithm using $O(|E^+|)$ neighborhood similarity queries. By leveraging random projection, we further show these queries can be computed in nearly linear time.
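The abstract does not spell out how a neighborhood similarity query is answered. One standard way to use random projection for such a query, given here as a hedged illustration rather than the authors' procedure, is a random-sign sketch: each vertex stores $k$ signed sums over its neighborhood, and the mean squared difference of two sketches is an unbiased estimate of the size of the symmetric difference of the two neighborhoods. The function names, the parameter `k`, and the choice of what each neighborhood set contains are assumptions made for this example.

```python
import random


def sketch_neighborhoods(n, neighborhoods, k, seed=0):
    """Project each neighborhood indicator vector onto k random sign vectors.

    neighborhoods[v] is the set of vertices in v's neighborhood
    (whether it includes v itself is left to the caller).
    """
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
    # sketch[v][j] = sum of the j-th sign vector over v's neighborhood
    return [
        [sum(row[u] for u in neighborhoods[v]) for row in signs]
        for v in range(n)
    ]


def estimate_symmetric_difference(sketch_u, sketch_v):
    """Estimate |N(u) symmetric-difference N(v)| from two sketches.

    The indicator difference has entries in {-1, 0, 1}, so its squared
    norm equals the symmetric difference size, and a random sign
    projection preserves that squared norm in expectation.
    """
    k = len(sketch_u)
    return sum((a - b) ** 2 for a, b in zip(sketch_u, sketch_v)) / k


# Two neighborhoods of size 20 that share 10 vertices: true answer is 20.
n = 40
nbrs = [set() for _ in range(n)]
nbrs[0] = set(range(0, 20))
nbrs[1] = set(range(10, 30))
sk = sketch_neighborhoods(n, nbrs, k=2000, seed=1)
print(estimate_symmetric_difference(sk[0], sk[1]))  # close to 20
```

Building the sketches costs time proportional to $k$ times the total neighborhood size, and each query afterwards costs $O(k)$ regardless of the degrees involved. Larger `k` tightens the estimate at the price of more work per vertex.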

Paper Structure

This paper contains 27 sections, 27 theorems, 39 equations, 2 figures, 4 algorithms.

Key Result

theorem 1.1

Let $G = (V, E^+)$ be a min-max correlation clustering instance, $\epsilon > 0$ be a small constant, and $\mathrm{OPT}$ be the value of the optimal solution. In the following models, there exist randomized algorithms that output a clustering $\mathcal{C}$ with $\mathrm{obj}(\mathcal{C}) \leq (3+\eps

Figures (2)

  • Figure 1: A pictorial illustration of the proof of \ref{lem:nostealingeasy} when $\eta = 0$
  • Figure 2: A pictorial illustration of the proof of \ref{lem:largeintersection} when $\eta = 0$

Theorems & Definitions (58)

  • theorem 1.1
  • Corollary 1.1
  • definition 2.1
  • definition 2.2
  • lemma 2.3: Triangle Inequality \cite{halmos1960naive}
  • definition 2.4
  • definition 2.5
  • definition 2.6
  • definition 3.1
  • definition 3.2
  • ...and 48 more