Solving the Correlation Cluster LP in Sublinear Time

Nairen Cao; Vincent Cohen-Addad; Shi Li; Euiwoong Lee; David Rasmussen Lolck; Alantha Newman; Mikkel Thorup; Lukas Vogl; Shuyi Yan; Hanwen Zhang

TL;DR

The paper tackles Correlation Clustering through the cluster LP. The LP has exponentially many variables, yet it can still be solved approximately and efficiently. The paper gives an algorithm that computes a $(1+\varepsilon)$-approximate cluster LP solution in time $\widetilde O(2^{\text{poly}(1/\varepsilon)} n)$, which is sublinear in the number of vertex pairs. It also gives a rounding procedure with the same running time that yields a $(1.485+\varepsilon)$-approximation for Correlation Clustering. This matches the best known polynomial-time approximation ratio at a significantly reduced running time. The approach has four main components:

(i) a multiplicative-weights-based solver for a covering reformulation of the cluster LP;
(ii) a preclustering step that yields structured atoms and admissible edges to guide clustering;
(iii) a partial clustering strategy that iteratively extracts small-ratio clusters until they cover a constant fraction of the mass;
(iv) scalable sublinear-time and MPC (Massively Parallel Computation) implementations of both LP solving and rounding.

Together, these techniques bring high-accuracy approximation algorithms into the regime of fast, scalable clustering. This matters for large-scale data analysis, where only sublinear or near-linear running times are affordable.
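Item (i) relies on multiplicative weights. The following Python sketch is a generic illustration only and is not the paper's solver, because this page does not state the paper's covering reformulation of the cluster LP. It applies a textbook multiplicative-weights loop in the spirit of Garg and Könemann to a small, explicitly given covering LP $\min c^\top x$ subject to $Ax \ge \mathbf{1}$, $x \ge 0$, with nonnegative $A$ and $c$. The function name `mwu_fractional_cover`, the parameter `eps`, and the stopping threshold `target` are illustrative assumptions.

```python
import math
from typing import List


def mwu_fractional_cover(A: List[List[float]], c: List[float],
                         eps: float = 0.1) -> List[float]:
    """Approximately solve min c.x s.t. A x >= 1, x >= 0 (A, c nonnegative).

    Hypothetical helper. Each row i keeps a weight y[i] that shrinks
    multiplicatively as row i gets covered, so later steps favour columns
    that help rows which are still poorly covered.
    """
    m, n = len(A), len(c)
    # Every row must be coverable by some column, otherwise the LP is infeasible.
    assert all(any(a > 0 for a in row) for row in A), "uncoverable row"
    # Assumed coverage level each row must reach before it is retired;
    # the exact constant differs between presentations of the analysis.
    target = math.log(m + 1) / eps ** 2
    y = [1.0] * m        # multiplicative weights on rows
    cover = [0.0] * m    # running value of (A x)_i
    x = [0.0] * n
    active = set(range(m))
    while active:
        # Choose the column with the lowest cost per unit of weighted coverage
        # of the rows that are still active.
        best_j, best_ratio = -1, math.inf
        for j in range(n):
            gain = sum(y[i] * A[i][j] for i in active)
            if gain > 0 and c[j] / gain < best_ratio:
                best_j, best_ratio = j, c[j] / gain
        j = best_j
        # Step size: no active row gains more than one unit of coverage.
        delta = 1.0 / max(A[i][j] for i in active if A[i][j] > 0)
        x[j] += delta
        for i in range(m):
            if A[i][j] > 0:
                cover[i] += A[i][j] * delta
                y[i] *= (1 - eps) ** (A[i][j] * delta)
        # Retire rows that have reached the target coverage.
        active = {i for i in active if cover[i] < target}
    # Rescale so that the least-covered row is covered exactly once (feasible).
    scale = min(cover)
    return [v / scale for v in x]
```

For example, `mwu_fractional_cover([[1, 0, 1], [1, 1, 0], [0, 1, 1]], [1, 1, 1])` encodes fractional set cover on a triangle. That instance has three elements and three sets, each set covering two elements, and an LP optimum of 1.5. The sketch returns a feasible point whose cost is close to that value. The cluster LP itself cannot be written out as an explicit matrix like `A`, because it has one column per vertex subset.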

Abstract

Correlation Clustering is a fundamental and widely studied problem in unsupervised learning and data mining. The input is a graph, and the goal is to construct a clustering that minimizes the number of inter-cluster edges plus the number of missing intra-cluster edges. [CCL+24] introduced the cluster LP for Correlation Clustering and argued that it captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, [CCL+24] showed how to find a feasible solution for the cluster LP in time $O(n^{\text{poly}(1/ε)})$ with objective value at most $(1+ε)$ times the value of an optimal solution for the respective Correlation Clustering instance. They also showed how to round a solution to the cluster LP, yielding a $(1.485+ε)$-approximation algorithm for the Correlation Clustering problem. The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most $(1+ε)$ times the optimum in time $\widetilde O(2^{\text{poly}(1/ε)} n)$, where $n$ is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast $(1.485+ε)$-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
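To make the objective concrete, the following Python sketch computes the Correlation Clustering cost of a clustering and evaluates a candidate cluster LP point. This page does not spell the LP out, so the sketch assumes the usual form of the cluster LP from [CCL+24]:

- one variable $z_S \ge 0$ for each vertex set $S$;
- the constraint $\sum_{S \ni u} z_S = 1$ for every vertex $u$;
- the objective $\sum_{uv \in E^+} x_{uv} + \sum_{uv \in E^-} (1 - x_{uv})$, where $x_{uv} = 1 - \sum_{S \supseteq \{u,v\}} z_S$.

Here $E^+$ is the set of graph edges and $E^-$ is the set of non-adjacent pairs. The function names `cc_cost` and `cluster_lp_value` are hypothetical. The LP point is given by explicitly listing its support.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def _plus_set(plus_edges: Iterable[Pair]) -> Set[FrozenSet[Hashable]]:
    # Store each graph edge (+ pair) as an unordered pair.
    return {frozenset(e) for e in plus_edges}


def cc_cost(vertices: List[Hashable], plus_edges: Iterable[Pair],
            clustering: List[Set[Hashable]]) -> int:
    """Hypothetical helper: inter-cluster edges plus missing intra-cluster edges."""
    label = {v: k for k, cluster in enumerate(clustering) for v in cluster}
    plus = _plus_set(plus_edges)
    cost = 0
    for u, v in combinations(vertices, 2):
        same = label[u] == label[v]
        if frozenset((u, v)) in plus:
            if not same:
                cost += 1  # edge cut between two clusters
        elif same:
            cost += 1      # non-adjacent pair placed in the same cluster
    return cost


def cluster_lp_value(vertices: List[Hashable], plus_edges: Iterable[Pair],
                     z: Dict[FrozenSet[Hashable], float],
                     tol: float = 1e-9) -> float:
    """Hypothetical helper: objective of a cluster LP point z = {vertex set: weight}."""
    if any(w < -tol for w in z.values()):
        raise ValueError("cluster LP variables must be nonnegative")
    # Feasibility: every vertex must be covered by total weight exactly 1.
    for u in vertices:
        covered = sum(w for S, w in z.items() if u in S)
        if abs(covered - 1.0) > tol:
            raise ValueError(f"vertex {u!r} has coverage {covered}, expected 1")
    plus = _plus_set(plus_edges)
    value = 0.0
    for u, v in combinations(vertices, 2):
        together = sum(w for S, w in z.items() if u in S and v in S)
        x_uv = 1.0 - together  # fractional amount by which u and v are separated
        value += x_uv if frozenset((u, v)) in plus else 1.0 - x_uv
    return value
```

Setting $z_S = 1$ for each cluster $S$ of an integral clustering gives a feasible LP point whose value equals that clustering's cost. Hence the LP optimum never exceeds the Correlation Clustering optimum. The feasibility check only enumerates the sets listed in `z`, which works for tiny examples but not for the exponential-size LP in general.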
