A Faster Algorithm for Constrained Correlation Clustering
Nick Fischer, Evangelos Kipouridis, Jonas Klausen, Mikkel Thorup
TL;DR
This work targets Constrained Correlation Clustering, where hard pairwise constraints (pairs that must share a cluster and pairs that must not) have to be satisfied while minimizing the number of violated preferences. It introduces a fast combinatorial framework that first transforms the input graph so that the hard constraints are enforced and then applies PIVOT-based clustering, achieving a 16-approximation in $\widetilde{O}(n^3)$ time. This trades a weaker approximation factor for a much faster running time than the earlier 3-approximation of van Zuylen et al., which uses $\Omega(n^{3\omega})$ time. The authors also derandomize CC-PIVOT while keeping its 3-approximation guarantee, and show that on some instances no ordering of the pivots achieves a $(3-\varepsilon)$-approximation, for any constant $\varepsilon > 0$. The approach extends to a node-weighted version of Correlation Clustering, which can be approximated within factor 3, and the paper provides both deterministic and randomized variants. Key techniques include a graph transformation that enforces hard constraints and a charging-LP analysis that yields the approximation bounds.
Abstract
In the Correlation Clustering problem we are given $n$ nodes, and a preference for each pair of nodes indicating whether we prefer the two endpoints to be in the same cluster or not. The output is a clustering inducing the minimum number of violated preferences. In certain cases, however, the preference between some pairs may be too important to be violated. The constrained version of this problem specifies pairs of nodes that must be in the same cluster as well as pairs that must not be in the same cluster (hard constraints). The output clustering has to satisfy all hard constraints while minimizing the number of violated preferences. Constrained Correlation Clustering is APX-Hard and has been approximated within a factor 3 by van Zuylen et al. [SODA '07] using $\Omega(n^{3\omega})$ time. In this work, using a more combinatorial approach, we show how to approximate this problem significantly faster at the cost of a slightly weaker approximation factor. In particular, our algorithm runs in $\widetilde{O}(n^3)$ time and approximates Constrained Correlation Clustering within a factor 16. To achieve our result we need properties guaranteed by a particular influential algorithm for (unconstrained) Correlation Clustering, the CC-PIVOT algorithm. This algorithm chooses a pivot node $u$, creates a cluster containing $u$ and all its preferred nodes, and recursively solves the rest of the problem. As a byproduct of our work, we provide a derandomization of the CC-PIVOT algorithm that still achieves the 3-approximation; furthermore, we show that there exist instances where no ordering of the pivots can give a $(3-\varepsilon)$-approximation, for any constant $\varepsilon > 0$. Finally, we introduce a node-weighted version of Correlation Clustering, which can be approximated within factor 3 using our insights on Constrained Correlation Clustering.
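
The pivot step and the objective described above can be illustrated with a minimal Python sketch for the unconstrained problem. It is an illustration under stated assumptions, not the authors' implementation: the function names, the representation of "same cluster" preferences as a set of `frozenset` pairs, and the optional `order` parameter are all hypothetical choices made here. When `order` is omitted, pivots are taken in a uniformly random order, which is assumed to match the usual randomized formulation; the sketch covers neither the hard-constraint preprocessing nor the derandomization.

```python
import random
from itertools import combinations


def cc_pivot(nodes, plus_edges, order=None, seed=None):
    """Pivot-based clustering sketch (unconstrained Correlation Clustering).

    nodes:      list of hashable node ids.
    plus_edges: set of frozenset({u, v}) pairs that prefer to share a cluster;
                every other pair is assumed to prefer being separated.
    order:      optional pivot order (hypothetical parameter for illustration);
                if None, a random order is used.
    """
    if order is None:
        order = list(nodes)
        random.Random(seed).shuffle(order)

    unclustered = set(nodes)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue  # already absorbed by an earlier pivot's cluster
        # The cluster is the pivot plus all still-unclustered preferred nodes.
        cluster = {pivot} | {
            v for v in unclustered if frozenset((pivot, v)) in plus_edges
        }
        clusters.append(cluster)
        unclustered -= cluster  # "recurse" on the remaining nodes
    return clusters


def violated_preferences(nodes, plus_edges, clusters):
    """Count pairs whose preference disagrees with the given clustering."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0
    for u, v in combinations(list(nodes), 2):
        together = label[u] == label[v]
        prefers_together = frozenset((u, v)) in plus_edges
        if together != prefers_together:
            cost += 1
    return cost


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    plus = {frozenset((0, 1)), frozenset((1, 2))}
    clusters = cc_pivot(nodes, plus, order=[1, 0, 2, 3])
    print(clusters, violated_preferences(nodes, plus, clusters))
```

In this small path-like example, choosing node 1 as the first pivot groups nodes 0, 1 and 2 and violates the single "separate" preference between 0 and 2, while choosing node 0 first separates 1 from 2 instead; the choice of pivot order changes which preferences are violated, which is the quantity the pivot-ordering results above are concerned with.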
