Deep Online Probability Aggregation Clustering
Yuxuan Yan, Na Lu, Ruofan Yan
TL;DR
This work tackles the instability and computational burden of deep clustering by abandoning cluster centers and introducing centerless Probability Aggregation Clustering (PAC). PAC defines a centerless objective $J_{pac}$ that drives clustering through probability vectors and pairwise distances. PAC is extended to online probability aggregation (OPA) for mini-batch clustering, which enables stable online updates via KL-divergence-based self-labeling. Building on PAC, the authors propose DPAC, which combines a weighted contrastive loss with the online clustering objective. DPAC uses a two-view SimCLR-like setup and a uniform pretraining phase to achieve strong, scalable clustering performance. Empirical results across nine real-world and image benchmarks show that PAC is robust and scalable, and that DPAC, especially its online variant, consistently outperforms state-of-the-art online and offline deep clustering methods while avoiding center-based collapse and heavy clustering-specific regularization.
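To make the idea of a centerless objective concrete, here is a minimal Python/NumPy sketch. It assumes a simple form, $J = \sum_k \sum_{i,j} p_{ik} p_{jk} d_{ij}$, which is a soft within-cluster sum of pairwise distances. This form is an illustrative stand-in, not necessarily the paper's exact $J_{pac}$, and the function names `pairwise_sq_dist` and `centerless_objective` are hypothetical.

```python
import numpy as np


def pairwise_sq_dist(x: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows of x, shape (n, n)."""
    sq = np.sum(x ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.maximum(d, 0.0)  # clip tiny negatives caused by rounding


def centerless_objective(p: np.ndarray, d: np.ndarray) -> float:
    """Illustrative centerless objective (assumed form, not the paper's exact J_pac).

    p: (n, K) matrix whose rows are cluster-probability vectors.
    d: (n, n) pairwise distance matrix.
    Returns sum_k sum_{i,j} p_ik * p_jk * d_ij, i.e. trace(P^T D P):
    samples that are far apart are penalized for sharing a cluster,
    and no cluster center is ever computed.
    """
    return float(np.trace(p.T @ d @ p))


if __name__ == "__main__":
    # two well-separated pairs of points
    x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    d = pairwise_sq_dist(x)
    good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
    mixed = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
    print(centerless_objective(good, d))   # 4.0
    print(centerless_objective(mixed, d))  # 400.0
```

Because this kind of objective depends only on the probability matrix and the pairwise distances, it can be evaluated on any subset of samples. That property makes a mini-batch (online) variant natural.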
Abstract
Combining machine clustering with deep models has demonstrated remarkable advantages in deep clustering. This combination reorganizes the data processing pipeline into two alternating phases: feature clustering and model training. However, such an alternating schedule may lead to instability and a heavy computational burden. We propose a centerless clustering algorithm called Probability Aggregation Clustering (PAC) that is designed to adapt proactively to deep learning techniques, enabling easy deployment in online deep clustering. PAC circumvents cluster centers and aligns the probability space with the distribution space by formulating clustering as an optimization problem with a novel objective function. Based on the computation mechanism of PAC, we propose a general online probability aggregation module that performs stable and flexible feature clustering over mini-batch data, and we further construct a deep visual clustering framework, deep PAC (DPAC). Extensive experiments demonstrate that PAC achieves superior clustering robustness and performance, and that DPAC remarkably outperforms state-of-the-art deep clustering methods.
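The online mini-batch step described above can be sketched as follows, under explicit assumptions. Within a batch, a target for each sample is obtained from a fuzzy-c-means-style closed-form update in which the distance to a center is replaced by the probability-weighted sum of pairwise distances $s_{ik} = \sum_j q_{jk} d_{ij}$. The current predictions are then pulled toward these targets with a KL divergence. The update form, the fuzziness exponent `m`, and the names `aggregate_targets` and `kl_self_label_loss` are hypothetical illustrations, not the authors' released code.

```python
import numpy as np


def aggregate_targets(z: np.ndarray, q: np.ndarray, m: float = 1.5,
                      eps: float = 1e-12) -> np.ndarray:
    """Hypothetical centerless aggregation over one mini-batch.

    z: (b, dim) features of the mini-batch.
    q: (b, K) current predicted cluster probabilities.
    m: fuzziness exponent (> 1), assumed as in fuzzy c-means.
    Returns (b, K) target probabilities; no cluster centers are used.
    """
    # squared Euclidean distances between all samples in the batch
    diff = z[:, None, :] - z[None, :, :]
    d = np.sum(diff ** 2, axis=-1)
    # s_ik = sum_j d_ij * q_jk: how far sample i is from the mass of "cluster k"
    s = d @ q
    # FCM-style inverse-distance weighting, normalized per sample
    w = (s + eps) ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True)


def kl_self_label_loss(target: np.ndarray, q: np.ndarray,
                       eps: float = 1e-12) -> float:
    """Mean KL(target || q) over the batch; target is treated as a fixed label."""
    kl = np.sum(target * (np.log(target + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(kl))


if __name__ == "__main__":
    z = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    q = np.array([[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6]])
    target = aggregate_targets(z, q)
    print(np.round(target, 3))  # each row sharpens toward its nearer group
    print(kl_self_label_loss(target, q))
```

In a deep setting, `q` would presumably come from a clustering head and `target` would be computed without gradient, so only the KL term updates the network. Since every quantity is built from the current batch, this sketch needs no center memory carried across batches.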
