Pivot based correlation clustering in the presence of good clusters

David Rasmussen Lolck; Mikkel Thorup; Shuyi Yan

Abstract

The classic pivot based clustering algorithm of Ailon, Charikar and Newman [JACM'08] is a factor 3 approximation, but all concrete examples showing that it is no better than 3 are based on some very good clusters, e.g., a complete graph minus a matching. We show that removing all good clusters before each pivot step improves the approximation ratio to $2.9991$. To complement this, we also show how our proposed algorithm performs on synthetic datasets, where it performs remarkably well and improves on both the algorithm for locating good clusters and the classic pivot algorithm.
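For orientation, the classic pivot step that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard randomized pivot algorithm only; it does not include the removal of good clusters that the paper adds before each pivot step, and the function names `pivot_clustering` and `disagreements` are hypothetical, chosen here for the example.

```python
import random

def pivot_clustering(n, positive_edges, rng=None):
    """Classic randomized pivot: repeatedly pick a random unclustered vertex
    and cluster it with all of its still-unclustered positive neighbours."""
    rng = rng or random.Random()
    adj = [set() for _ in range(n)]
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    # Scanning a uniformly random permutation and skipping clustered vertices
    # is equivalent to drawing each pivot uniformly from the unclustered ones.
    order = list(range(n))
    rng.shuffle(order)
    cluster_of = [-1] * n
    for pivot in order:
        if cluster_of[pivot] != -1:
            continue
        cluster_of[pivot] = pivot
        for w in adj[pivot]:
            if cluster_of[w] == -1:
                cluster_of[w] = pivot
    return cluster_of

def disagreements(n, positive_edges, cluster_of):
    """Cost of a clustering: positive edges between clusters plus
    non-adjacent (negative) pairs placed inside the same cluster."""
    edges = {(min(u, v), max(u, v)) for u, v in positive_edges if u != v}
    cut = sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])
    inside = len(edges) - cut
    sizes = {}
    for c in cluster_of:
        sizes[c] = sizes.get(c, 0) + 1
    pairs_inside = sum(s * (s - 1) // 2 for s in sizes.values())
    return cut + (pairs_inside - inside)
```

On a path with three vertices every pivot choice yields exactly one disagreement, which makes it a convenient sanity check for the cost function.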

Paper Structure (22 sections, 29 theorems, 89 equations, 2 figures, 4 algorithms)

Introduction
Prior work
Technical Overview
Structure of the paper
Preliminaries
The pivot in the absence of good clusters
Locating very good clusters
Pivots with good clusters
Experiments
Missing Proofs in \ref{sec:locate-good-cluster}
Proof of \ref{lem:hit-good-cluster}
Proof of \ref{lem:check-false}
Proof of \ref{lem:check-true}
Proof of \ref{lem:very-good}
Proof of \ref{lem:time-atom-finding}
...and 7 more sections

Key Result

Theorem 1

The algorithm alg:atom-pivot is a $2.9991$-approximation and runs in time $O(m\log n)$.

Figures (2)

Figure 1: Decrease in the cost of the optimal solution $\mathrm{opt}$, in an amortisation $g$ of this cost that accounts for changes to the optimal clustering, and in the cost of the pivot algorithm $\mathrm{cost}$, all up to symmetries in the labelling of triangles.
Figure 2: Performance of the algorithms as a function of the noise parameter $\varepsilon$. We generate graphs with $n=10^3$ vertices and a planted partition into $k=10$ clusters. Edges are added between vertices in the same cluster and then independently flipped with probability $\varepsilon$. The y-axis (cost) shows the total number of disagreements in the resulting clustering. For visualization, the plotted curves are smoothed by averaging in log-space over a sliding window of $11$ points: $l_i = \exp\!\left(\frac{1}{11}\sum_{r=i-5}^{i+5} \ln c_r\right)$, where $c_r$ is the observed cost at experiment $r$. Each point corresponds to one of $200$ different values of $\varepsilon$.
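The setup in the Figure 2 caption can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' experimental code: clusters are assumed to be of equal size, every vertex pair (adjacent or not) is assumed to be flipped independently with probability $\varepsilon$, and smoothed values are only produced where the full window of $11$ points exists. The names `planted_partition` and `log_space_smooth` are hypothetical.

```python
import math
import random

def planted_partition(n, k, eps, rng):
    """Return (positive_edges, labels) for a noisy planted partition.

    Assumptions: clusters have (near) equal size, and each of the
    n*(n-1)/2 pairs has its sign flipped independently with probability eps.
    """
    labels = [v % k for v in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            same_cluster = labels[u] == labels[v]
            flipped = rng.random() < eps
            # A pair ends up as a positive edge if exactly one of
            # "same cluster" and "flipped" holds.
            if same_cluster != flipped:
                edges.append((u, v))
    return edges, labels

def log_space_smooth(costs, half_window=5):
    """Geometric-mean smoothing: l_i = exp(mean of ln c_r for r in [i-5, i+5]).

    Requires every cost to be positive; returns values only for indices
    where the whole window lies inside the input.
    """
    width = 2 * half_window + 1
    logs = [math.log(c) for c in costs]
    return [
        math.exp(sum(logs[i - half_window:i + half_window + 1]) / width)
        for i in range(half_window, len(costs) - half_window)
    ]
```

With the caption's parameters one would call `planted_partition(1000, 10, eps, random.Random(seed))` for each of the $200$ values of $\varepsilon$ and pass the resulting list of costs to `log_space_smooth`.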

Theorems & Definitions (54)

Theorem 1
Proof of \ref{thm:alg-approx-ratio}
Definition 2
Definition 3
Lemma 4: CMSY15
Lemma 5: ACN08
Lemma 6
...and 44 more
