
Rethinking The Uniformity Metric in Self-Supervised Learning

Xianghong Fang, Jian Li, Qiang Sun, Benyou Wang

TL;DR

This work identifies four principled properties that a uniformity metric in self-supervised learning (SSL) should satisfy and demonstrates that the widely used metric $-\mathcal{L}_U$ fails to capture dimensional collapse and feature redundancy. It introduces a Wasserstein-distance-based uniformity metric, $-\mathcal{W}_2$, justified via a Gaussian approximation to the uniform distribution on the sphere, which yields a closed-form expression in the mean and covariance of the representations. The authors prove that $-\mathcal{W}_2$ satisfies all four properties and show, through synthetic and CIFAR-10/100 experiments, that adding it as an auxiliary loss mitigates dimensional collapse and improves downstream accuracy across multiple SSL methods. The results indicate practical benefits from more uniform SSL representations, at the cost of a slight change in alignment, and the authors provide code at the referenced URL for reproducibility.

Abstract

Uniformity plays an important role in evaluating learned representations, providing insights into self-supervised learning. In our quest for effective uniformity metrics, we pinpoint four principled properties that such metrics should possess. Namely, an effective uniformity metric should remain invariant to instance permutations and sample replications while accurately capturing feature redundancy and dimensional collapse. Surprisingly, we find that the uniformity metric proposed by Wang and Isola (2020) fails to satisfy the majority of these properties. Specifically, their metric is sensitive to sample replications and cannot correctly account for feature redundancy and dimensional collapse. To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance, which satisfies all the aforementioned properties. Integrating this new metric into existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks involving the CIFAR-10 and CIFAR-100 datasets. Code is available at https://github.com/statsle/WassersteinSSL.
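
As a rough illustration of the closed-form metric described above, the following sketch computes the 2-Wasserstein distance between a Gaussian fitted to the normalized representations and an isotropic Gaussian target. The target $\mathcal{N}(0, I_m/m)$, the function name `neg_w2_uniformity`, and the use of the population (biased) covariance are assumptions made for this example; the authors' repository may differ in these details. The formula used is the standard one for two Gaussians with commuting covariances: $\mathcal{W}_2^2 = \|\mu\|^2 + 1 + \mathrm{tr}(\Sigma) - \frac{2}{\sqrt{m}}\,\mathrm{tr}(\Sigma^{1/2})$.

```python
import numpy as np


def neg_w2_uniformity(z: np.ndarray) -> float:
    """Sketch of -W2 between N(mu, Sigma) of the rows of z and N(0, I/m).

    Assumption: the target Gaussian N(0, I/m) stands in for the uniform
    distribution on the unit sphere in m dimensions.
    """
    # Project every representation onto the unit sphere.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n, m = z.shape

    mu = z.mean(axis=0)
    centered = z - mu
    cov = centered.T @ centered / n  # population covariance (assumed convention)

    # Sigma is symmetric PSD, so tr(Sigma^{1/2}) is the sum of the square
    # roots of its eigenvalues; clip tiny negative values from round-off.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    trace_sqrt = np.sqrt(eigvals).sum()

    w2_sq = mu @ mu + 1.0 + eigvals.sum() - 2.0 / np.sqrt(m) * trace_sqrt
    return -float(np.sqrt(max(w2_sq, 0.0)))


rng = np.random.default_rng(0)

# Representations spread over all 16 dimensions.
spread = rng.standard_normal((4096, 16))

# Dimensional collapse: the last 8 dimensions carry no signal.
collapsed = spread.copy()
collapsed[:, 8:] = 0.0

# Constant collapse: every representation is the same point.
constant = np.ones((10, 4))

score_spread = neg_w2_uniformity(spread)
score_collapsed = neg_w2_uniformity(collapsed)
score_constant = neg_w2_uniformity(constant)
```

Under these assumptions, `score_spread` should be the largest of the three (closest to zero), `score_collapsed` lower, and `score_constant` lowest, since constant collapse gives $\|\mu\| = 1$ and $\Sigma = 0$, hence $\mathcal{W}_2 = \sqrt{2}$.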

Paper Structure (46 sections, 7 theorems, 39 equations, 17 figures, 3 tables)

Key Result

Theorem 1

The uniformity metric $-\mathcal{L}_U$ satisfies the instance permutation property (IPC) but violates the instance cloning (ICC), feature cloning (FCC), and feature baby (FBC) properties.
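
The sensitivity of $-\mathcal{L}_U$ to sample replication can be checked numerically. The sketch below assumes the common formulation $\mathcal{L}_U = \log \mathbb{E}\left[e^{-t\|z_i - z_j\|^2}\right]$ averaged over distinct pairs with $t = 2$; the function name `neg_lu` and the pair-averaging convention are assumptions for this example, not taken from the paper. Duplicating every sample adds zero-distance pairs, which raises the average kernel value and therefore lowers $-\mathcal{L}_U$, while the mean and population covariance of the data stay the same.

```python
import numpy as np


def neg_lu(z: np.ndarray, t: float = 2.0) -> float:
    """Sketch of -L_U: negative log of the mean Gaussian kernel over pairs i < j."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-sphere projection
    # Squared Euclidean distance between every pair of rows.
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(z), k=1)  # distinct pairs only (assumed convention)
    return -float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
z_cloned = np.concatenate([z, z], axis=0)  # every sample replicated once

original = neg_lu(z)
cloned = neg_lu(z_cloned)

# First and second moments are unchanged by replication.
same_mean = np.allclose(z.mean(axis=0), z_cloned.mean(axis=0))
same_cov = np.allclose(
    np.cov(z, rowvar=False, bias=True),
    np.cov(z_cloned, rowvar=False, bias=True),
)
```

Here `cloned` comes out strictly smaller than `original`, even though the underlying distribution is unchanged, whereas any metric that depends only on the mean and population covariance is unaffected by the replication.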

Figures (17)

  • Figure 1: The left figure presents constant collapse, and the right figure visualizes dimensional collapse.
  • Figure 2: The KL divergence and Wasserstein distance between $Y_i$ and $\widehat{Y}_i$ w.r.t. various dimensions.
  • Figure 3: Sensitivity to the degree of dimensional collapse: $-\mathcal{W}_{2}$ is more sensitive than $-\mathcal{L}_U$.
  • Figure 4: Effectiveness of the metrics as the dimension $m$ increases: $-\mathcal{L}_U$ fails to distinguish different degrees of dimensional collapse for large $m$, while $-\mathcal{W}_{2}$ always can.
  • Figure 5: FCC analysis.
  • ...and 12 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Theorem 3
  • Definition 1
  • Lemma 2: Kullback-Leibler divergence (Lindley, 1959)
  • Lemma 3: Bhattacharyya distance (Bhattacharyya, 1943)
  • Lemma 4
  • Proof of the theorem on the KL divergence
  • Proof of the lemma on the PDF of $Y_i$