Correlation Clustering with Active Learning of Pairwise Similarities

Linus Aronsson, Morteza Haghir Chehreghani

TL;DR

This paper develops a generic active learning framework for correlation clustering in which the pairwise similarities are not given in advance and must be queried cost-efficiently. It also proposes and analyzes a number of novel query strategies suited to this setting.

Abstract

Correlation clustering is a well-known unsupervised learning setting that deals with positive and negative pairwise similarities. In this paper, we study the case where the pairwise similarities are not given in advance and must be queried in a cost-efficient way. To this end, we develop a generic active learning framework with several advantages, e.g., flexibility in the type of feedback that a user/annotator can provide, adaptation to any correlation clustering algorithm and query strategy, and robustness to noise. In addition, we propose and analyze a number of novel query strategies suited to this setting. We demonstrate the effectiveness of our framework and the proposed query strategies via several experimental studies.
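
To make the setting concrete, the sketch below scores a clustering against signed pairwise similarities using the standard "disagreement" view of correlation clustering: a pair counts against the clustering if its similarity is positive but the two objects are separated, or negative but they are grouped together. The function name `clustering_cost`, the nested-list representation of the similarities, and the choice to weight each violation by the absolute similarity are assumptions made for illustration; the paper's exact cost function is not shown in this excerpt.

```python
from itertools import combinations


def clustering_cost(sigma, labels):
    """Total |similarity| over pairs that the clustering disagrees with.

    sigma  : symmetric nested list, sigma[u][v] is the signed similarity
             of objects u and v (hypothetical representation).
    labels : labels[u] is the cluster id assigned to object u.
    """
    cost = 0.0
    for u, v in combinations(range(len(labels)), 2):
        w = sigma[u][v]
        same_cluster = labels[u] == labels[v]
        # Disagreement: similar objects split apart, or dissimilar ones merged.
        if (w > 0 and not same_cluster) or (w < 0 and same_cluster):
            cost += abs(w)
    return cost


# Three objects: 0-1 and 1-2 are similar, 0-2 are dissimilar.
sigma = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, 1.0],
    [-1.0, 1.0, 0.0],
]
cost_all_together = clustering_cost(sigma, [0, 0, 0])
cost_all_apart = clustering_cost(sigma, [0, 1, 2])
```

In the active learning setting described above, the entries of `sigma` would not be available up front; each one would have to be requested from an annotator, which is why the choice of which pair to query matters.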

Paper Structure (36 sections, 6 theorems, 31 equations, 14 figures, 2 tables, 3 algorithms)


Key Result

Theorem 1

Given $\sigma$, let $\mathcal{T}_{\sigma} \subseteq \boldsymbol{T}$ be the set of triangles $t = (u, v, w) \in \boldsymbol{T}$ with exactly two positive edge weights and one negative edge weight. Then, the maxmin query strategy corresponds to querying the weight of the edge $\hat{e}$ selected by
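
The set $\mathcal{T}_{\sigma}$ in the theorem statement can be enumerated directly from the current similarities. Triangles with exactly two positive edges and one negative edge are the ones no clustering can fully satisfy: the two positive edges pull all three objects into one cluster, which contradicts the negative edge. The following is a minimal sketch, assuming the similarities are stored as a symmetric nested list; the function name `triangles_two_pos_one_neg` is hypothetical, and the sketch only builds the set of triangles, not the edge selection rule itself.

```python
from itertools import combinations


def triangles_two_pos_one_neg(sigma):
    """Return all triangles (u, v, w) with exactly two positive
    edge weights and one negative edge weight under sigma."""
    n = len(sigma)
    found = []
    for u, v, w in combinations(range(n), 3):
        weights = (sigma[u][v], sigma[u][w], sigma[v][w])
        n_pos = sum(1 for x in weights if x > 0)
        n_neg = sum(1 for x in weights if x < 0)
        if n_pos == 2 and n_neg == 1:
            found.append((u, v, w))
    return found


# Four objects with signed similarities (illustrative values only).
sigma = [
    [0, 1, 1, -1],
    [1, 0, -1, -1],
    [1, -1, 0, 1],
    [-1, -1, 1, 0],
]
t_sigma = triangles_two_pos_one_neg(sigma)
```

The brute-force enumeration over all triples is cubic in the number of objects, which is fine for illustration but would need a more careful implementation on large datasets.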

Figures (14)

  • Figure 1: Results for different datasets with 20% noise ($\gamma = 0.2$) and random initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
  • Figure 2: Results for different datasets with 40% noise ($\gamma = 0.4$) and random initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
  • Figure 3: Results for different datasets with 20% noise ($\gamma = 0.2$) and $k$-means initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
  • Figure 4: Results for different datasets with 40% noise ($\gamma = 0.4$) and $k$-means initialization of the pairwise similarities. The evaluation metric is the adjusted Rand index (ARI).
  • Figure 5: Performance of different query strategies on the synthetic dataset with varying values of the noise level $\gamma$ and batch size $B$. When varying the noise level, we fix $B = \lceil|\mathbf{E}|/1000\rceil$. When varying the batch size, we fix $\gamma = 0.2$.
  • ...and 9 more figures

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Theorem 1
  • proof
  • Proposition 1
  • proof
  • ...and 2 more