Cold-Start Active Correlation Clustering

Linus Aronsson, Han Wu, Morteza Haghir Chehreghani

TL;DR

This work tackles active correlation clustering under cold-start conditions where no initial pairwise similarities are available. It introduces a coverage-aware query strategy that partitions potential queries into adaptive regions derived from the current clustering and allocates queries across regions via a region-informativeness mass, controlled by a flexible matrix A. The approach is embedded in the Active CC framework and analyzed alongside entropy-based uncertainty methods, with mean-field variational updates guiding uncertainty estimates. Experiments on synthetic and real datasets demonstrate that the coverage-based method reduces selection bias and accelerates discovery of the true clustering, outperforming several baselines, especially in the cold-start regime. The method improves batch diversity and robustness, offering a scalable solution for querying in correlation clustering.

Abstract

We study active correlation clustering where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, where no true initial pairwise similarities are available for active learning. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.

Paper Structure

This paper contains 15 sections, 3 equations, 3 figures, 1 algorithm.

Figures (3)

  • Figure 1: Ablation studies on the synthetic dataset. See the experiments section for a detailed description.
  • Figure 2: Comparison of different methods on the synthetic dataset.
  • Figure 3: Results for different methods across datasets.