Solving the Correlation Cluster LP in Sublinear Time
Nairen Cao, Vincent Cohen-Addad, Shi Li, Euiwoong Lee, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, Hanwen Zhang
TL;DR
The paper tackles Correlation Clustering by leveraging the cluster LP, which, despite its exponential size, can be approximated efficiently. It introduces a sublinear-time method to compute a near-optimal LP solution and extends this to a rounding procedure that achieves a $(1.485+\varepsilon)$-approximation for Correlation Clustering, matching state-of-the-art polynomial-time results with significantly reduced running time. Central to the approach are (i) a multiplicative-weights-based solver for a covering reformulation of the cluster LP, (ii) a preclustering step that yields structured atoms and admissible edges to guide clustering, (iii) a partial clustering strategy that iteratively extracts small-ratio clusters to cover a constant fraction of mass, and (iv) scalable MPC and sublinear implementations of both LP solving and rounding. Together, these techniques bridge the gap between high-accuracy approximation algorithms and fast, scalable clustering, enabling practical applications in large-scale data analysis where only sublinear or near-linear runtimes are feasible.
Abstract
Correlation Clustering is a fundamental and widely studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges. CCL+24 introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, CCL+24 showed how to find a feasible solution for the cluster LP in time $O(n^{\text{poly}(1/\varepsilon)})$ with objective value at most $(1+\varepsilon)$ times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a $(1.485+\varepsilon)$-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most $(1+\varepsilon)$ times the optimum in time $\widetilde O(2^{\text{poly}(1/\varepsilon)} n)$, where $n$ is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast $(1.485+\varepsilon)$-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
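To make the objective stated above concrete, the following is a minimal sketch that counts the disagreements of a given clustering: edges whose endpoints lie in different clusters plus non-adjacent pairs placed in the same cluster. The function name `correlation_clustering_cost` and its parameters are illustrative assumptions, not taken from the paper, and the brute-force loop over all pairs takes quadratic time; it only evaluates the objective and is unrelated to the sublinear-time algorithm.

```python
from itertools import combinations


def correlation_clustering_cost(n, edges, labels):
    """Count disagreements of a clustering on an unweighted graph.

    n      -- number of vertices, labelled 0..n-1 (hypothetical convention)
    edges  -- iterable of (u, v) pairs, the edges of the input graph
    labels -- labels[v] is the cluster id assigned to vertex v
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        is_edge = frozenset((u, v)) in edge_set
        # A pair disagrees if it is an edge cut by the clustering,
        # or a non-edge whose endpoints share a cluster.
        if is_edge != same_cluster:
            cost += 1
    return cost


# Toy instance (an assumption for illustration): a triangle 0-1-2
# with a pendant vertex 3 attached to vertex 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cost_triangle_plus_singleton = correlation_clustering_cost(4, toy_edges, [0, 0, 0, 1])
cost_single_cluster = correlation_clustering_cost(4, toy_edges, [0, 0, 0, 0])
cost_all_singletons = correlation_clustering_cost(4, toy_edges, [0, 1, 2, 3])
```

On this toy instance, keeping the triangle together and isolating vertex 3 cuts only the edge between 2 and 3, whereas a single cluster pays for the two missing edges at vertex 3 and all singletons pay for every edge.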
