Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations

Grégoire Mialon; Randall Balestriero; Yann LeCun

TL;DR

This paper demonstrates that Variance-Covariance Regularization (VCReg) applied to the SSL projector enforces pairwise independence among encoder features by connecting projector-output decorrelation to kernel-based independence criteria. It provides theoretical arguments and empirical evidence that wider, potentially random, MLP projectors yield stronger pairwise independence and that this property correlates with better out-of-domain generalization. The authors extend the VCReg viewpoint to related methods like BarlowTwins and W-MSE, and show that VCReg can solve linear ICA with random projectors but not nonlinear ICA, offering a new lens on the role of the projector in SSL and signaling potential new applications beyond SSL. Overall, the work supplies a theoretical foundation for the use of MLP projectors in SSL, proposes HSIC as a candidate model-selection metric, and suggests broader impacts in representation learning and ICA.

Abstract

Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such a strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg combined with an MLP projector enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied on the projector's output to kernel independence criteria applied on the projector's input. We empirically validate our findings: (i) we identify which projector characteristics favor pairwise independence, (ii) we demonstrate that pairwise independence is beneficial for out-of-domain generalization, and (iii) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. This provides the first theoretical motivation and explanation of MLP projectors in SSL.
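To make the regularizer concrete, here is a minimal NumPy sketch of variance and covariance penalties computed on a batch of projector outputs. It follows the form commonly used for VICReg (a hinge on each feature's standard deviation plus the squared off-diagonal covariance entries); the function name `vcreg_terms`, the target `gamma`, and the stabilizer `eps` are illustrative assumptions, not taken from this page.

```python
import numpy as np

def vcreg_terms(z, gamma=1.0, eps=1e-4):
    """Variance and covariance penalties on a batch z of shape (n, d).

    Assumed form (VICReg-style): a hinge pushing each feature's standard
    deviation above gamma, and the sum of squared off-diagonal covariance
    entries divided by d. gamma and eps are illustrative defaults.
    """
    z = np.asarray(z, dtype=float)
    n, d = z.shape
    zc = z - z.mean(axis=0)                    # center each feature
    cov = zc.T @ zc / (n - 1)                  # (d, d) sample covariance
    std = np.sqrt(np.diag(cov) + eps)          # per-feature standard deviation
    var_loss = np.mean(np.maximum(0.0, gamma - std))
    off_diag = cov - np.diag(np.diag(cov))     # zero out the diagonal
    cov_loss = np.sum(off_diag ** 2) / d
    return var_loss, cov_loss
```

Under these assumptions, a collapsed batch (all rows identical) is penalized by the variance term, while a batch with duplicated features is penalized by the covariance term.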

Paper Structure (53 sections, 4 theorems, 19 equations, 13 figures, 4 tables)

Introduction
Background
Measuring Statistical Independence Using Kernel Methods
HSIC and pairwise independence.
dHSIC and mutual independence.
Complexities.
Variance-Covariance Regularization in Self-Supervised Learning
VICReg.
VC Regularization of SSL Projector's Output Enforces Pairwise Independent Features at the Encoder's Output
Notations.
Extension to BarlowTwins and W-MSE, and generality of VCReg.
Learned projectors.
Related Work
Feature decorrelation.
Independence criterion for learning features.
...and 38 more sections

Key Result

Theorem 2.1

$\mathrm{HSIC}(X_1,X_2)=0$ if and only if $X_1$ and $X_2$ are independent.
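As an illustration of how this criterion can be estimated from samples, the sketch below computes the standard biased empirical HSIC, $\frac{1}{(n-1)^2}\mathrm{tr}(KHLH)$, with Gaussian kernels. The median-heuristic bandwidth and the function names `gaussian_gram` and `hsic_biased` are assumptions made for this example; the theorem concerns the population quantity, so the estimate is only approximately zero for independent samples.

```python
import numpy as np

def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix for samples x of shape (n,) or (n, d)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch).
        sigma = np.sqrt(0.5 * np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_biased(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

For example, with `x` drawn from a standard normal, `hsic_biased(x, x**2)` should come out noticeably larger than `hsic_biased(x, y)` for an independent `y`, even though `x` and `x**2` are uncorrelated in expectation.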

Figures (13)

Figure 1: Pairwise independence (HSIC) of the features in the learned representation increases during training while mutual independence (dHSIC) stagnates: VCReg of the projector implicitly optimizes the former but not the latter. Averaged over three runs.
Figure 2: Normalized HSIC (computed on ImageNet validation) for representations learned with different projector widths and setups: learned, random, and random with resampling at each optimization step (3 runs each). Wider projectors and resampling both yield more pairwise independence in the learned representation.
Figure 3: Normalized HSIC (computed on ImageNet validation) of representations correlates with downstream accuracy both in domain and out of domain. Each HSIC level corresponds to one representation. For each dataset, the accuracies were rescaled with respect to their maximum for better readability.
Figure 4: Linear ICA model expressed from VCReg of the projector's output.
Figure 5: Resampling and increased width of the projector $g$ both improve the quality of the source reconstruction measured by the maximum correlation between true and reconstructed sources ($\uparrow$).
...and 8 more figures

Theorems & Definitions (8)

Theorem 2.1: Thm. 4 from Gretton et al. (2005)
Lemma 3.1: Nonlinear elementwise projectors minimize HSIC of their input
proof
Remark 3.2: Necessity of variance regularization
Lemma 3.3: Random linear projectors minimize HSIC of their input
Theorem 3.4: MLP projectors with random weights enforce pairwise independence.
proof
proof
