The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas
TL;DR
The paper shows that common self-supervised joint-embedding losses enforce a uniform cluster prior, which harms learning on class-imbalanced data. It formalizes this prior as a K-means-like clustering bias, implicit in some methods and explicit in others, and illustrates its effect with toy and real-data experiments, including visualizations of learned prototypes. To address the problem, it introduces Prior Matching for Siamese Networks (PMSN), which extends MSN to arbitrary priors (notably power-law), and reports improved semantic transfer on long-tailed datasets such as iNaturalist when the prior is matched to the data distribution. Together, these analyses show how the choice of prior shapes the semantic content of the learned representations.
Abstract
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that the formulation of all these methods contains an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-balanced data, such as ImageNet, we demonstrate that it can hamper performance when pretraining on class-imbalanced data. By moving away from conventional uniformity priors and instead preferring power-law distributed feature clusters, we show that one can improve the quality of the learned representations on real-world class-imbalanced datasets. To demonstrate this, we develop an extension of the Masked Siamese Networks (MSN) method to support the use of arbitrary feature priors.
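As a rough illustration of what matching a prior can mean, the Python sketch below compares the average cluster assignment of a mini-batch against a target prior using a KL divergence. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the exponent `tau`, the direction of the KL term, and the toy batch are all hypothetical. With a uniform prior, the penalty equals log K minus the entropy of the mean assignment, so minimizing it pushes clusters toward equal sizes.

```python
import math


def uniform_prior(num_clusters):
    """Uniform target: every cluster is expected to be equally large."""
    return [1.0 / num_clusters] * num_clusters


def power_law_prior(num_clusters, tau=1.0):
    """Hypothetical power-law target: p(k) proportional to k^(-tau), k = 1..K."""
    weights = [k ** (-tau) for k in range(1, num_clusters + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_assignment(soft_assignments):
    """Average the per-sample cluster probabilities over the mini-batch."""
    n = len(soft_assignments)
    k = len(soft_assignments[0])
    return [sum(row[j] for row in soft_assignments) / n for j in range(k)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def prior_matching_penalty(soft_assignments, prior):
    """Assumed regularizer: KL between the batch-mean assignment and the prior."""
    return kl_divergence(mean_assignment(soft_assignments), prior)


# Toy batch (made up): most samples fall into the first cluster.
batch = [
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
]

penalty_uniform = prior_matching_penalty(batch, uniform_prior(3))
penalty_power_law = prior_matching_penalty(batch, power_law_prior(3, tau=1.0))
print("uniform prior penalty:  ", penalty_uniform)
print("power-law prior penalty:", penalty_power_law)
```

In this toy batch, the long-tailed mean assignment incurs a smaller penalty under the power-law target than under the uniform one, which is the intuition behind matching the prior to an imbalanced data distribution. The sketch ties each prior weight to a fixed cluster index for simplicity.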
