Correlation Clustering with Active Learning of Pairwise Similarities

Linus Aronsson; Morteza Haghir Chehreghani

TL;DR

This paper develops a generic active learning framework for correlation clustering in which the pairwise similarities are not given in advance and must be queried cost-efficiently. It also proposes and analyzes several novel query strategies suited to this setting.

Abstract

Correlation clustering is a well-known unsupervised learning setting that deals with positive and negative pairwise similarities. In this paper, we study the case where the pairwise similarities are not given in advance and must be queried in a cost-efficient way. To this end, we develop a generic active learning framework for this task that offers several advantages, e.g., flexibility in the type of feedback that a user/annotator can provide, adaptation to any correlation clustering algorithm and query strategy, and robustness to noise. In addition, we propose and analyze a number of novel query strategies suited to this setting. We demonstrate the effectiveness of our framework and the proposed query strategies via several experimental studies.
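To make the setting in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes similarities are stored as a dictionary over node pairs with values in $[-1, 1]$, uses the standard weighted-disagreement cost for correlation clustering, and runs a generic query loop into which any clustering routine and query strategy can be plugged. All names (`cc_cost`, `active_loop`, `positive_components`, `first_unqueried`) are hypothetical; the clustering placeholder and the selection rule are deliberately simple stand-ins, not the paper's algorithm or its maxmin/maxexp strategies.

```python
from itertools import combinations


def cc_cost(labels, sim):
    """Weighted disagreements: positive pairs that are split plus negative pairs that are merged."""
    cost = 0.0
    for (u, v), s in sim.items():
        same = labels[u] == labels[v]
        if (s > 0 and not same) or (s < 0 and same):
            cost += abs(s)
    return cost


def positive_components(n, sim):
    """Placeholder clustering (assumption): connected components of the positive edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), s in sim.items():
        if s > 0:
            parent[find(u)] = find(v)
    return [find(i) for i in range(n)]


def first_unqueried(sim, queried, labels, batch_size=2):
    """Placeholder query strategy (assumption): take the first unqueried pairs."""
    return [e for e in sim if e not in queried][:batch_size]


def active_loop(n, oracle, select_batch, cluster, rounds, init=0.0):
    """Alternate between querying a batch of pairs and re-clustering."""
    sim = {e: init for e in combinations(range(n), 2)}  # unknown pairs start at `init`
    queried = set()
    labels = cluster(n, sim)
    for _ in range(rounds):
        batch = select_batch(sim, queried, labels)
        if not batch:
            break  # every pair has been queried
        for e in batch:
            sim[e] = oracle(e)  # oracle may be noisy in practice
            queried.add(e)
        labels = cluster(n, sim)
    return labels, sim, queried
```

Because the strategy and the clustering routine are passed in as callables, swapping in a different selection rule or clustering algorithm does not change the loop itself, which mirrors the modularity the abstract describes.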

Paper Structure (36 sections, 6 theorems, 31 equations, 14 figures, 2 tables, 3 algorithms)

Introduction
Active Correlation Clustering
Problem formulation
Active correlation clustering procedure
Active correlation clustering with zero noise
Correlation clustering algorithm
Query Strategies
Uncertainty and frequency
Inconsistency and the maxmin query strategy
Maxexp query strategy
Further analysis of maxmin and maxexp
Efficient implementation of maxmin and maxexp
Exploration with maxmin and maxexp
Experiments
Experimental setup
...and 21 more sections

Key Result

Theorem 1

Given $\sigma$, let $\mathcal{T}_{\sigma} \subseteq \boldsymbol{T}$ be the set of triangles $t = (u, v, w) \in \boldsymbol{T}$ with exactly two positive edge weights and one negative edge weight. Then, the maxmin query strategy corresponds to querying the weight of the edge $\hat{e}$ selected by

Figures (14)

Figure 1: Results for different datasets with 20% noise ($\gamma = 0.2$) and random initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
Figure 2: Results for different datasets with 40% noise ($\gamma = 0.4$) and random initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
Figure 3: Results for different datasets with 20% noise ($\gamma = 0.2$) and $k$-means initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
Figure 4: Results for different datasets with 40% noise ($\gamma = 0.4$) and $k$-means initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
Figure 5: Performance of different query strategies on the synthetic dataset with varying values of the noise level $\gamma$ and batch size $B$. When varying the noise level, we fix $B = \lceil|\mathbf{E}|/1000\rceil$. When varying the batch size, we fix $\gamma = 0.2$.
...and 9 more figures

Theorems & Definitions (12)

Theorem 1
proof
Proposition 1
proof
Proposition 2
proof
Theorem 1
proof
Proposition 1
proof
...and 2 more
