Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

Grégoire Mialon, Randall Balestriero, Yann LeCun

TL;DR

This paper demonstrates that Variance-Covariance Regularization (VCReg) applied to the SSL projector enforces pairwise independence among encoder features by connecting projector-output decorrelation to kernel-based independence criteria. It provides theoretical arguments and empirical evidence that wider, potentially random, MLP projectors yield stronger pairwise independence and that this property correlates with better out-of-domain generalization. The authors extend the VCReg viewpoint to related methods such as Barlow Twins and W-MSE, and show that VCReg can solve linear ICA with random projectors but not nonlinear ICA. This offers a new lens on the role of the projector in SSL and points to potential applications beyond SSL. Overall, the work supplies a theoretical foundation for the use of MLP projectors in SSL, proposes HSIC as a candidate model-selection metric, and suggests broader impacts in representation learning and ICA.

Abstract

Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such a strategy, which we coin Variance-Covariance Regularization (VCReg). More precisely, we show that VCReg combined with an MLP projector enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied to the projector's output and kernel independence criteria applied to the projector's input. We empirically validate our findings: (i) we identify which characteristics of the projector favor pairwise independence, (ii) we demonstrate that pairwise independence is beneficial for out-of-domain generalization, and (iii) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. This provides the first theoretical motivation and explanation of MLP projectors in SSL.
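
As a rough illustration of what regularizing the covariance matrix of the projector's output can look like, the sketch below computes a variance term and a covariance term in the style of the VICReg loss. The function name `vcreg_loss`, the target `gamma`, the stabilizer `eps`, and the division by the output dimension are assumptions made for illustration and are not taken from this paper. It uses only the Python standard library.

```python
import math
import random


def vcreg_loss(z, gamma=1.0, eps=1e-4):
    """Variance and covariance penalties on a batch of projector outputs.

    z: list of n rows, each a list of d floats (n > 1).
    Returns (variance_term, covariance_term); both are 0 at the optimum.
    NOTE: gamma, eps and the 1/d scaling are illustrative assumptions.
    """
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    zc = [[row[j] - means[j] for j in range(d)] for row in z]
    # Unbiased covariance matrix of the d output dimensions.
    cov = [[sum(r[i] * r[j] for r in zc) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Hinge on each dimension's standard deviation: penalizes std below gamma.
    var_term = sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps))
                   for j in range(d)) / d
    # Squared off-diagonal covariances: penalizes correlated dimensions.
    cov_term = sum(cov[i][j] ** 2
                   for i in range(d) for j in range(d) if i != j) / d
    return var_term, cov_term


rng = random.Random(0)
collapsed = [[1.0, 1.0, 1.0] for _ in range(256)]            # constant output
white = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(256)]
copied = [[a, a, a] for a in (rng.gauss(0, 1) for _ in range(256))]

var_c, cov_c = vcreg_loss(collapsed)
var_w, cov_w = vcreg_loss(white)
var_k, cov_k = vcreg_loss(copied)
```

In this toy setup the collapsed batch is penalized by the variance term only, while the batch whose three dimensions are copies of one variable is penalized by the covariance term. Neither term says anything about nonlinear dependence, which is the gap the paper's analysis of the projector addresses.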

Paper Structure (53 sections, 4 theorems, 19 equations, 13 figures, 4 tables)

Key Result

Theorem 2.1

$\mathrm{HSIC}(X_1,X_2)=0$ if and only if $X_1$ and $X_2$ are independent.
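
To make the criterion concrete, here is a minimal sketch of the biased empirical HSIC estimator, $\mathrm{tr}(KHLH)/(n-1)^2$, for two scalar variables. The Gaussian kernel, the bandwidth `sigma=1.0`, the sample size, and the function names are assumptions chosen for illustration; the paper's own kernels and normalization may differ. The equivalence in the theorem holds at the population level for suitable (universal) kernels, so a finite-sample estimate is small but not exactly zero under independence.

```python
import math
import random


def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a list of scalars."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in x]
            for a in x]


def center(k):
    """Double-center a symmetric Gram matrix: H K H, H = I - (1/n) 1 1^T."""
    n = len(k)
    row = [sum(r) / n for r in k]
    total = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + total for j in range(n)]
            for i in range(n)]


def hsic(x1, x2, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2."""
    n = len(x1)
    kc = center(gaussian_gram(x1, sigma))
    l = gaussian_gram(x2, sigma)
    # trace(K H L H) = sum_ij (H K H)_ij * L_ij for symmetric matrices.
    return sum(kc[i][j] * l[i][j]
               for i in range(n) for j in range(n)) / (n - 1) ** 2


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(200)]
x_squared = [v * v for v in x]                        # dependent on x, yet uncorrelated
noise = [rng.uniform(-1, 1) for _ in range(200)]      # independent of x

hsic_dep = hsic(x, x_squared)
hsic_ind = hsic(x, noise)
```

The pair `(x, x_squared)` has near-zero linear correlation but is fully dependent, so `hsic_dep` comes out clearly larger than `hsic_ind`. This is the kind of dependence that a covariance penalty alone cannot see.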

Figures (13)

  • Figure 1: Pairwise independence (HSIC) of the features in the learned representation increases during training while mutual independence (dHSIC) stagnates: VCReg of the projector implicitly optimizes the former but not the latter. Averaged over three runs.
  • Figure 2: Normalized HSIC (computed on ImageNet validation) for representations learned with projectors of different widths and setups: learned, random, and random and resampled at each optimization step (3 runs each). Wider projectors and resampling both yield more pairwise independence in the learned representation.
  • Figure 3: Normalized HSIC (computed on ImageNet validation) of representations correlates with downstream accuracy both in domain and out of domain. Each HSIC level corresponds to one representation. For each dataset, the accuracies were rescaled with respect to their maximum for better readability.
  • Figure 4: Linear ICA model expressed from VCReg of the projector's output.
  • Figure 5: Resampling and increased width of the projector $g$ both improve the quality of the source reconstruction measured by the maximum correlation between true and reconstructed sources ($\uparrow$).
  • ...and 8 more figures

Theorems & Definitions (8)

  • Theorem 2.1: Thm. 4 of Gretton et al. (2005), "Measuring Statistical Dependence with Hilbert-Schmidt Norms"
  • Lemma 3.1: Nonlinear elementwise projectors minimize HSIC of their input
  • proof
  • Remark 3.2: Necessity of variance regularization
  • Lemma 3.3: Random linear projectors minimize HSIC of their input
  • Theorem 3.4: MLP projectors with random weights enforce pairwise independence.
  • proof
  • proof