Cold-Start Active Correlation Clustering
Linus Aronsson, Han Wu, Morteza Haghir Chehreghani
TL;DR
This work tackles active correlation clustering under cold-start conditions, where no initial pairwise similarities are available. It introduces a coverage-aware query strategy that partitions the candidate queries into adaptive regions derived from the current clustering and allocates queries across regions according to a region-informativeness mass, controlled by a flexible matrix A. The approach is embedded in the Active CC framework and analyzed alongside entropy-based uncertainty methods, with mean-field variational updates guiding the uncertainty estimates. Experiments on synthetic and real datasets show that the coverage-based method reduces selection bias and accelerates discovery of the true clustering, outperforming several baselines, especially in the cold-start regime. The method also improves batch diversity and robustness, offering a scalable approach to query selection in correlation clustering.
Abstract
We study active correlation clustering, where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, in which no true pairwise similarities are available at the outset. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.
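To make the coverage idea concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that a region is the set of object pairs whose endpoints fall in a given unordered pair of current clusters, that a region's mass is the sum of per-pair informativeness scores, and that the batch is spread across regions with a simple proportional (D'Hondt-style) rule. The names `binary_entropy` and `select_batch` are hypothetical, and the role of the matrix A is deliberately left out because it is not specified here.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy of a Bernoulli(p) belief that a pair is similar (assumed score)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def select_batch(labels, scores, batch_size):
    """Pick up to batch_size pairs (i, j), i < j, spread across regions.

    labels: current cluster label of each object (length n).
    scores: n x n array of nonnegative per-pair informativeness values.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    iu, ju = np.triu_indices(len(labels), k=1)

    # Assumption: one region per unordered pair of current cluster labels.
    regions = {}
    for i, j in zip(iu.tolist(), ju.tolist()):
        key = tuple(sorted((int(labels[i]), int(labels[j]))))
        regions.setdefault(key, []).append((i, j))

    # Within each region, rank pairs from most to least informative.
    for pairs in regions.values():
        pairs.sort(key=lambda p: -scores[p])

    # Assumption: region mass is the total informativeness of its pairs.
    mass = {k: sum(scores[p] for p in pairs) for k, pairs in regions.items()}
    taken = {k: 0 for k in regions}

    batch = []
    while len(batch) < batch_size:
        open_keys = [k for k in regions if taken[k] < len(regions[k])]
        if not open_keys:
            break  # every pair has already been selected
        # Discount a region's mass by how many queries it already received.
        best = max(open_keys, key=lambda k: mass[k] / (taken[k] + 1))
        batch.append(regions[best][taken[best]])
        taken[best] += 1
    return batch
```

In a cold-start setting every pair starts out equally uncertain, so a purely score-ranked batch is decided by tie-breaking alone. Discounting each region's mass as it receives queries is one simple way to push later picks toward regions that have not yet been covered.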
