Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models

Mina Dalirrooyfard, Konstantin Makarychev, Slobodan Mitrović

TL;DR

This work introduces Pruned Pivot, a Pivot-like algorithm for correlation clustering on unweighted graphs that achieves a $3+\varepsilon$ approximation while enabling scalable implementations in dynamic, MPC, and LCA models. It develops a depth-bounded recursive formulation whose pruning yields near-optimal tradeoffs, and provides comprehensive analyses (including dangerous and expensive query-path concepts and martingale bounds) that support the approximation and efficiency claims. The paper delivers concrete algorithmic results: a fully dynamic algorithm with expected amortized update time $O(1/\varepsilon)$, an MPC algorithm with $O(\log(1/\varepsilon))$ rounds, an LCA with $O(\Delta/\varepsilon)$ probes, and a CRCW PRAM implementation in $O(1/\varepsilon)$ rounds. Experimental evaluation suggests that exploring only a small number of nodes suffices to achieve near-Pivot performance, highlighting the approach’s practicality for large, evolving graphs. Overall, the method substantially improves the scalability of correlation clustering across dynamic, parallel, and local computation models, while preserving strong approximation guarantees.
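The depth-bounded recursive formulation mentioned above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the paper's algorithm: the function name `pruned_cluster`, the `budget` parameter (loosely standing in for a bound of order $1/\varepsilon$), and the rule "fall back to a singleton once the budget is exhausted" are hypothetical choices made here.

```python
class BudgetExceeded(Exception):
    """Raised when a single top-level query makes too many recursive calls."""


def pruned_cluster(v, adj, rank, budget):
    """Return the cluster id of v under a budget-capped recursive Pivot query.

    adj:    dict mapping each node to the set of its positive neighbors
    rank:   dict mapping each node to its position in a random permutation
    budget: hypothetical cap on recursive calls per top-level query
    """
    calls = 0

    def pivot_of(x):
        # Returns the pivot whose cluster x joins (x itself if x is a pivot).
        nonlocal calls
        calls += 1
        if calls > budget:
            raise BudgetExceeded
        lower = sorted((u for u in adj[x] if rank[u] < rank[x]),
                       key=rank.__getitem__)
        for u in lower:
            if pivot_of(u) == u:  # u is a pivot, so x joins u's cluster
                return u
        return x  # no lower-ranked neighbor is a pivot: x is a pivot

    try:
        return pivot_of(v)
    except BudgetExceeded:
        return v  # assumed pruning rule: v becomes a singleton cluster


# Path graph 0 - 1 - 2 with ranks equal to the node ids.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
rank = {0: 0, 1: 1, 2: 2}
clusters = [pruned_cluster(v, adj, rank, budget=3) for v in (0, 1, 2)]
```

With a generous budget the sketch reproduces the clusters that the unpruned recursion would give; shrinking `budget` makes more nodes fall back to singletons.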

Abstract

Given a graph with positive and negative edge labels, the correlation clustering problem aims to cluster the nodes so as to minimize the total number of between-cluster positive edges and within-cluster negative edges. This problem has many applications in data mining, particularly in unsupervised learning. Inspired by the prevalence of large graphs and constantly changing data in modern applications, we study correlation clustering in dynamic, parallel (MPC), and local computation (LCA) settings. We design an approach that improves state-of-the-art runtime complexities in all these settings. In particular, we provide the first fully dynamic algorithm that runs in expected amortized constant time, without any dependence on the graph size. Moreover, our algorithm essentially matches the approximation guarantee of the celebrated Pivot algorithm.
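To make the objective concrete, here is a minimal sketch of the classic Pivot procedure together with the disagreement count described above. It assumes the unweighted setting in which every pair of nodes not listed as a positive edge is treated as negative; the function names `pivot` and `cost` and the `seed` parameter are illustrative choices, not taken from the paper.

```python
import random
from itertools import combinations


def pivot(nodes, positive_edges, seed=0):
    """Classic Pivot: scan nodes in random order; each still-unclustered node
    becomes a pivot and absorbs its still-unclustered positive neighbors."""
    adj = {v: set() for v in nodes}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    order = list(nodes)
    random.Random(seed).shuffle(order)
    cluster_of = {}
    for v in order:
        if v in cluster_of:
            continue
        cluster_of[v] = v  # v is a pivot; its cluster is labeled by v
        for w in adj[v]:
            if w not in cluster_of:
                cluster_of[w] = v
    return cluster_of


def cost(nodes, positive_edges, cluster_of):
    """Count positive pairs that are split plus negative pairs that are merged.
    Assumption: every pair that is not a positive edge is negative."""
    pos = {frozenset(e) for e in positive_edges}
    total = 0
    for u, v in combinations(nodes, 2):
        same = cluster_of[u] == cluster_of[v]
        if (frozenset((u, v)) in pos) != same:
            total += 1
    return total


# Two disjoint triangles: Pivot recovers them exactly, so the cost is 0.
tri_nodes = list(range(6))
tri_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
tri_cost = cost(tri_nodes, tri_edges, pivot(tri_nodes, tri_edges))

# Path 0 - 1 - 2: every pivot order yields exactly one disagreement.
path_nodes = [0, 1, 2]
path_edges = [(0, 1), (1, 2)]
path_costs = {cost(path_nodes, path_edges, pivot(path_nodes, path_edges, seed=s))
              for s in range(20)}
```

The `cost` function enumerates all pairs and is therefore quadratic in the number of nodes; it is meant for checking small examples, not for large graphs.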

Paper Structure

This paper contains 30 sections, 15 theorems, 35 equations, 8 figures, 7 algorithms.

Key Result

Theorem 1.1

For any $\varepsilon>0$, there is a data structure that maintains a $3+\varepsilon$ approximation of correlation clustering in a fully-dynamic setting with an oblivious adversary. The expected update time is $O(1/\varepsilon)$ per operation.

Figures (8)

  • Figure 1: An extended query path in the recursion tree $\mathcal{T}_v$ for node $v$. The path starts with edge $(a,b)$, goes to the root of the tree, node $v$, and then proceeds to node $w$. The path from $a$ to $v$ is a query path. The path from $a$ to $w$ extends the path from $a$ to $v$. If edge $(a,b)$ is cut by the pivot step of Pivot but edge $(v,w)$ is not cut, then this path is expensive. We call it expensive because if $v$ is unlucky, then $(v,w)$ is cut by the pruning step of Pivot with Pruning and the cost of $(v,w)$ is partially charged to this path.
  • Figure 2: Illustration for the proof of Theorem \ref{lem:martingale}. Path $(a,b,\dots,u_{L-1},u_L)$ is a dangerous EQ-path at iteration $t$. At iteration $t+1$, it may become a query path and/or an expensive EQ-path. It may also get extended to EQ-paths $Pw$, where $w\in W_t\setminus\{u_{L-1},u_L\}$. These extended paths $Pw$ may be dangerous or expensive at iteration $t+1$, but they may also be neither dangerous nor expensive at iteration $t+1$.
  • Figure 3: Illustration for the proof of Theorem \ref{lem:martingale}. Path $(a,b,\dots,u_{L-1},u_L)$ is a dangerous EQ-path at iteration $t$. Set $W_t^{(1)}$ contains nodes in $W_t$ that are not neighbors of $u_{L-1}$. Set $W_t^{(2)}$ contains nodes in $W_t$ that are neighbors of $u_{L-1}$.
  • Figure 4: Path $(u_1,\dots,u_{L-1},u_L)$ is a query path if each $u_i$ ($i>1$) queries $u_{i-1}$. This condition is equivalent to $\sigma(u_1)\leq \pi(u_1) \leq \cdots \leq \sigma(u_L) \leq \pi(u_L)$. Note that $\sigma(u)\leq \pi(u)$ for every node $u\in V$. Set $\mathcal{Q}_t(u_1,u_2)$ contains this path for $t\geq \pi(u_{L-1})$.
  • Figure 5: Path $(u_1,\dots,u_{L-1},u_L)$ is an extended query path (EQ-path) if each vertex on the path, except for the first and last one, queries the previous vertex and $\pi(u_{L-2})\leq \sigma(u_L)$. This condition is equivalent to $\sigma(u_1)\leq \pi(u_1) \leq \cdots\leq \pi(u_{L-2}) \leq \min(\sigma(u_{L-1}), \sigma(u_{L}))$. The last inequality says that neither $u_{L-1}$ nor $u_{L}$ is settled before $u_{L-2}$ is processed.
  • ...and 3 more figures
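
The inequality chains in the captions of Figures 4 and 5 can be checked mechanically. The sketch below assumes that $\sigma$ and $\pi$ are given as dictionaries with $\sigma(u)\leq\pi(u)$ for every node, in which case each chain reduces to comparing consecutive nodes. The function names and the requirement that an EQ-path have at least three nodes are assumptions made for this illustration.

```python
def is_query_path(path, sigma, pi):
    """Check the Figure 4 chain: pi(u_{i-1}) <= sigma(u_i) for every i > 1.
    Together with sigma(u) <= pi(u), this gives the full chain
    sigma(u_1) <= pi(u_1) <= ... <= sigma(u_L) <= pi(u_L)."""
    return all(pi[a] <= sigma[b] for a, b in zip(path, path[1:]))


def is_eq_path(path, sigma, pi):
    """Check the Figure 5 chain: the prefix u_1..u_{L-1} satisfies the
    query-path chain, and pi(u_{L-2}) <= sigma(u_L) as well."""
    if len(path) < 3:  # assumption: an EQ-path has at least three nodes
        return False
    return (is_query_path(path[:-1], sigma, pi)
            and pi[path[-3]] <= sigma[path[-1]])


# Toy values with sigma(u) <= pi(u) for every node.
sigma = {"a": 1, "b": 2, "c": 4}
pi = {"a": 1, "b": 3, "c": 5}
ok_query = is_query_path(["a", "b", "c"], sigma, pi)
ok_eq = is_eq_path(["a", "b", "c"], sigma, pi)

# If c is settled earlier (sigma(c) = 2 < pi(b) = 3), the path stops being
# a query path but still satisfies the EQ-path chain.
sigma_early = {"a": 1, "b": 2, "c": 2}
early_query = is_query_path(["a", "b", "c"], sigma_early, pi)
early_eq = is_eq_path(["a", "b", "c"], sigma_early, pi)
```

The second example shows the difference between the two notions: the last node of an EQ-path only needs to be unsettled when $u_{L-2}$ is processed, not when $u_{L-1}$ is.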

Theorems & Definitions (29)

  • Theorem 1.1: Fully-dynamic correlation clustering
  • Theorem 1.2: Correlation clustering in MPC
  • Theorem 1.3: Correlation clustering in LCA
  • Theorem 4.1
  • Lemma 4.2
  • Definition 4.3: Query Paths
  • Definition 4.4: Extended Query Paths
  • Definition 4.5: Expensive Extended Query Paths
  • Lemma 4.6
  • proof
  • ...and 19 more