SiamMM: A Mixture Model Perspective on Deep Unsupervised Learning
Xiaodong Wang, Jing Huang, Kevin J Liang
TL;DR
SiamMM reframes clustering-based self-supervised learning (SSL) as a mixture-model problem and introduces a two-tier EM framework that jointly learns cluster parameters and representations without relying on negative samples. It combines a non-negative soft-assignment loss, a dynamic cluster-merging strategy, and consistent centroid updates to efficiently discover semantically meaningful centroids that align with labels never seen during training. The approach achieves state-of-the-art performance on ImageNet SSL benchmarks, with strong linear-evaluation and transfer results, and its clustering visualizations give insight into what the centroids capture. This mixture-model perspective offers a principled, scalable alternative to contrastive methods and sheds light on cluster structure and label quality in large-scale vision data.
Abstract
Recent studies have demonstrated the effectiveness of clustering-based approaches for self-supervised and unsupervised learning. However, the application of clustering is often heuristic, and the optimal methodology remains unclear. In this work, we establish connections between these unsupervised clustering methods and classical mixture models from statistics. Through this framework, we demonstrate significant enhancements to these clustering methods, leading to the development of a novel model named SiamMM. Our method attains state-of-the-art performance across various self-supervised learning benchmarks. Inspection of the learned clusters reveals a strong resemblance to unseen ground truth labels, uncovering potential instances of mislabeling.
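To make the connection between clustering and mixture models concrete, the following is a minimal sketch of generic EM on unit-normalized embeddings. It is not the SiamMM procedure itself: the function names (`e_step`, `m_step`, `fit`), the temperature `tau`, and the random initialization are illustrative assumptions, and the two-tier framework, cluster merging, and representation learning are not shown. The sketch only illustrates the classical fact that soft assignments with a temperature approach hard, spherical k-means style assignments as `tau` goes to zero.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project rows onto the unit sphere (eps guards against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def e_step(z, centroids, tau=0.1):
    """Soft assignments: softmax over cosine similarity scaled by 1/tau.

    `tau` is a hypothetical temperature; smaller values give harder assignments.
    """
    logits = z @ centroids.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def m_step(z, resp):
    """Centroids as responsibility-weighted means, re-projected to the sphere."""
    return l2_normalize(resp.T @ z)


def fit(z, k, n_iter=20, tau=0.1, seed=0):
    """Alternate E and M steps on fixed embeddings `z` (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = l2_normalize(z)
    # Assumption: initialize centroids from k distinct data points.
    centroids = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        resp = e_step(z, centroids, tau)
        centroids = m_step(z, resp)
    return centroids, e_step(z, centroids, tau)
```

Calling `fit(embeddings, k=3)` on an array of shape `(n, d)` returns `k` unit-norm centroids and an `(n, k)` matrix of non-negative responsibilities whose rows sum to one. In this sketch the embeddings are held fixed; in a representation-learning setting they would be produced by an encoder that is trained alongside the cluster parameters.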
