A Faster Algorithm for Constrained Correlation Clustering

Nick Fischer; Evangelos Kipouridis; Jonas Klausen; Mikkel Thorup

TL;DR

This work targets Constrained Correlation Clustering, where hard pairwise constraints must be satisfied while the number of violated preferences is minimized. It introduces a combinatorial framework that first transforms the input graph so that the hard constraints are enforced and then applies PIVOT-based clustering, yielding a deterministic 16-approximation in $\widetilde{O}(n^3)$ time. The authors also derandomize CC-PIVOT while keeping its 3-approximation, and show that on some instances no ordering of the pivots achieves a $(3-\varepsilon)$-approximation. The approach extends to Node-Weighted Correlation Clustering, which can be approximated within factor 3, with both deterministic and randomized variants. Compared with the earlier 3-approximation of van Zuylen et al., which needs $\Omega(n^{3\omega})$ time, the result trades a weaker approximation factor for a much faster running time. Key techniques are the constraint-enforcing graph transformation and an analysis based on the charging LP.

Abstract

In the Correlation Clustering problem we are given $n$ nodes, and a preference for each pair of nodes indicating whether we prefer the two endpoints to be in the same cluster or not. The output is a clustering inducing the minimum number of violated preferences. In certain cases, however, the preference between some pairs may be too important to be violated. The constrained version of this problem specifies pairs of nodes that must be in the same cluster as well as pairs that must not be in the same cluster (hard constraints). The output clustering has to satisfy all hard constraints while minimizing the number of violated preferences. Constrained Correlation Clustering is APX-Hard and has been approximated within a factor 3 by van Zuylen et al. [SODA '07] using $\Omega(n^{3\omega})$ time. In this work, using a more combinatorial approach, we show how to approximate this problem significantly faster at the cost of a slightly weaker approximation factor. In particular, our algorithm runs in $\widetilde{O}(n^3)$ time and approximates Constrained Correlation Clustering within a factor 16. To achieve our result we need properties guaranteed by a particular influential algorithm for (unconstrained) Correlation Clustering, the CC-PIVOT algorithm. This algorithm chooses a pivot node $u$, creates a cluster containing $u$ and all its preferred nodes, and recursively solves the rest of the problem. As a byproduct of our work, we provide a derandomization of the CC-PIVOT algorithm that still achieves the 3-approximation; furthermore, we show that there exist instances where no ordering of the pivots can give a $(3-\varepsilon)$-approximation, for any constant $\varepsilon$. Finally, we introduce a node-weighted version of Correlation Clustering, which can be approximated within factor 3 using our insights on Constrained Correlation Clustering.
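The pivot step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the generic PIVOT scheme, not the paper's derandomized or constrained algorithm. The function name `cc_pivot`, the `plus_edges` input format (pairs that prefer to be together), and the `order` and `seed` parameters are assumptions made for this example. The loop over a fixed pivot order is equivalent to the recursive formulation: each pivot that is still unclustered grabs its remaining preferred neighbours.

```python
import random


def cc_pivot(nodes, plus_edges, order=None, seed=None):
    """Cluster `nodes` with the PIVOT scheme.

    plus_edges: pairs (u, v) that prefer to share a cluster (assumed format).
    order: optional explicit pivot order; if omitted, a random one is drawn.
    """
    # Adjacency sets over the "prefer together" pairs.
    adj = {v: set() for v in nodes}
    for u, v in plus_edges:
        adj[u].add(v)
        adj[v].add(u)

    if order is None:
        order = list(nodes)
        random.Random(seed).shuffle(order)

    remaining = set(nodes)
    clusters = []
    for pivot in order:
        if pivot not in remaining:
            continue  # already absorbed into an earlier pivot's cluster
        # The pivot takes itself and every still-unclustered preferred node.
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

On the path 0-1-2 plus an isolated node 3, choosing node 1 first yields the clusters {0, 1, 2} and {3}, while choosing node 0 first yields {0, 1}, {2} and {3}. This shows how the pivot order changes the output, which is what the lower bound and the derandomization results are about.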

Paper Structure (14 sections, 31 theorems, 3 equations, 3 figures)

Introduction
Previous Results
Our Contribution
Overview of Our Techniques
Open Problems
Preliminaries
Combinatorial Algorithms for Constrained Correlation Clustering
PIVOT Algorithms for Correlation Clustering
Lower Bound
Optimal Deterministic PIVOT: 3-Approximation
Analysis of CHARGE
Correctness of CLUSTER
Analysis of Constrained Correlation Clustering
Node-Weighted Correlation Clustering

Key Result

Theorem 1

There is a deterministic algorithm for Constrained Correlation Clustering computing a $16$-approximation in time $\widetilde{O}(n^3)$.
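To make concrete what the approximation factor measures, the following sketch evaluates a candidate clustering: it counts violated preferences and checks the hard constraints. It only evaluates solutions and is not the paper's algorithm. The names `violated_preferences`, `satisfies_hard_constraints`, `friendly` (must-link pairs) and `hostile` (cannot-link pairs) are chosen for this illustration, and every pair not listed in `plus_edges` is assumed to prefer being separated.

```python
from itertools import combinations


def _labels(clusters):
    # Map each node to the index of the cluster containing it.
    return {v: i for i, cluster in enumerate(clusters) for v in cluster}


def violated_preferences(nodes, plus_edges, clusters):
    """Count pairs whose preference disagrees with the clustering."""
    label = _labels(clusters)
    plus = {frozenset(e) for e in plus_edges}
    cost = 0
    for u, v in combinations(nodes, 2):
        together = label[u] == label[v]
        prefers_together = frozenset((u, v)) in plus
        if together != prefers_together:
            cost += 1
    return cost


def satisfies_hard_constraints(clusters, friendly, hostile):
    """True if all must-link pairs are joined and all cannot-link pairs split."""
    label = _labels(clusters)
    return (all(label[u] == label[v] for u, v in friendly)
            and all(label[u] != label[v] for u, v in hostile))
```

A 16-approximation in the sense of Theorem 1 returns a clustering for which `satisfies_hard_constraints` holds and whose `violated_preferences` value is at most 16 times that of the best clustering satisfying the hard constraints.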

Figures (3)

Figure 1: The primal and dual LP relaxations for Correlation Clustering, which we refer to as the charging LP. $T(G)$ is the set of all bad triplets in $G$.
Figure 2: Illustrates an application of TRANSFORM(G,F,H) (Algorithm \ref{alg:main}). In the transformed graph, for any two supernodes $U_1,U_2$, either all pairs with an endpoint in $U_1$ and an endpoint in $U_2$ share an edge, or none of them do. Furthermore, all pairs within a supernode are connected and no hostile supernodes are connected.
Figure 3: The LP relaxation for Node-Weighted Correlation Clustering.
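For orientation, the LP pair referred to in the caption of Figure 1 can be written in its standard textbook form, sketched below. The exact notation in the paper's figure may differ; the variable names $x_{uv}$ and $y_{uvw}$ are assumptions of this sketch. A bad triplet is taken here to mean three nodes of which exactly two pairs prefer to be together, so every clustering violates at least one of the three pairs.

```latex
\begin{align*}
\textbf{Primal:}\quad \min\ & \sum_{uv} x_{uv} \\
\text{s.t.}\ & x_{uv} + x_{vw} + x_{wu} \ge 1 && \forall\, uvw \in T(G), \\
& x_{uv} \ge 0 && \forall\, uv. \\[1ex]
\textbf{Dual:}\quad \max\ & \sum_{uvw \in T(G)} y_{uvw} \\
\text{s.t.}\ & \sum_{w \,:\, uvw \in T(G)} y_{uvw} \le 1 && \forall\, uv, \\
& y_{uvw} \ge 0 && \forall\, uvw \in T(G).
\end{align*}
```

By weak duality, any feasible dual solution lower-bounds the optimal number of violated preferences, which is why charging violated pairs to bad triplets gives an approximation guarantee.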

Theorems & Definitions (34)

Theorem 1: Constrained Correlation Clustering
Theorem 2: Deterministic PIVOT
Theorem 3: PIVOT Lower Bound
Theorem 4: Node-Weighted Correlation Clustering, Deterministic
Theorem 5: Node-Weighted Correlation Clustering, Randomized
Definition 6: Correlation Clustering
Definition 7: Constrained Correlation Clustering
Definition 8: Node-Weighted Correlation Clustering
Lemma 9
Lemma 10
...and 24 more
