Whitening Consistently Improves Self-Supervised Learning

András Kalapos, Bálint Gyires-Tóth

TL;DR

This work investigates whether ZCA whitening, applied as the final layer of the encoder, can improve self-supervised learning (SSL) representations regardless of the SSL method or encoder architecture. The authors implement whitening with differentiable Iterative Normalization and evaluate it with both linear and k-NN probes, reporting consistent accuracy gains and faster convergence; BYOL shows the strongest improvements, including one case where whitening prevents collapse. They also introduce three feature-space metrics (mean absolute feature correlation, mean feature standard deviation, and anisotropy) for analyzing representation quality and collapse patterns. The findings suggest that whitening is a practical, broadly applicable final encoder layer for SSL that yields meaningful gains at modest compute overhead, and the work is supported by public code and a framework for interpretable feature analysis.
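
To make the Iterative Normalization step more concrete, here is a minimal NumPy sketch of the forward pass. It assumes the commonly used Newton-Schulz form of the iteration on a trace-normalized covariance matrix; the function name `iternorm_whiten`, the defaults `n_iter=10` and `eps=1e-5`, and the toy data are illustrative assumptions and are not taken from the authors' code. Gradients are not handled here; in practice an autodiff framework would differentiate through the same operations.

```python
import numpy as np

def iternorm_whiten(h, n_iter=10, eps=1e-5):
    """Approximately ZCA-whiten features h of shape (n_samples, n_features).

    Sketch only: n_iter and eps are assumed values, not the paper's settings.
    """
    h_centered = h - h.mean(axis=0, keepdims=True)
    n, d = h_centered.shape
    # Regularized covariance of the batch.
    cov = h_centered.T @ h_centered / n + eps * np.eye(d)
    # Trace normalization keeps all eigenvalues in (0, 1],
    # which the Newton iteration needs in order to converge.
    trace = np.trace(cov)
    cov_n = cov / trace
    # Newton iteration: p converges to cov_n^{-1/2}.
    p = np.eye(d)
    for _ in range(n_iter):
        p = 0.5 * (3.0 * p - p @ p @ p @ cov_n)
    # Undo the trace normalization to get the whitening matrix cov^{-1/2}.
    w = p / np.sqrt(trace)
    return h_centered @ w

# Toy data: 8 strongly correlated features (hypothetical example).
rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.5 * np.ones((8, 8))
h = rng.normal(size=(4096, 8)) @ mixing
h_hat = iternorm_whiten(h)
cov_hat = h_hat.T @ h_hat / h_hat.shape[0]  # should be close to identity
```

With fewer iterations the result is only partially whitened, which is the usual trade-off of this approach: it avoids an eigendecomposition at the cost of an approximation controlled by `n_iter`.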

Abstract

Self-supervised learning (SSL) has been shown to be a powerful approach for learning visual representations. In this study, we propose incorporating ZCA whitening as the final layer of the encoder in self-supervised learning to enhance the quality of learned features by normalizing and decorrelating them. Although whitening has been utilized in SSL in previous works, its potential to universally improve any SSL model has not been explored. We demonstrate that adding whitening as the last layer of SSL pretrained encoders is independent of the self-supervised learning method and encoder architecture, and thus improves performance for a wide range of SSL methods across multiple encoder architectures and datasets. Our experiments show that whitening is capable of improving linear and k-NN probing accuracy by 1-5%. Additionally, we propose metrics that allow for a comprehensive analysis of the learned features, provide insights into the quality of the representations, and help identify collapse patterns.
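
As an illustration of what "normalizing and decorrelating" means, the sketch below computes exact ZCA whitening with an eigendecomposition of the feature covariance, $W = U \Lambda^{-1/2} U^\top$. This is a swapped-in closed-form version for clarity, not the iterative procedure used in the paper; the function name `zca_whiten`, the `eps` value, and the toy data are assumptions made for this example.

```python
import numpy as np

def zca_whiten(h, eps=1e-5):
    """ZCA-whiten features h of shape (n_samples, n_features).

    Closed-form sketch via eigendecomposition; eps is an assumed
    regularizer that guards against tiny eigenvalues.
    """
    h_centered = h - h.mean(axis=0, keepdims=True)
    cov = h_centered.T @ h_centered / (h.shape[0] - 1)
    # cov is symmetric, so eigh returns real eigenvalues and
    # orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Symmetric whitening matrix: rotate, rescale, rotate back.
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return h_centered @ w

# Toy data: 8 strongly correlated features (hypothetical example).
rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.5 * np.ones((8, 8))
h = rng.normal(size=(4096, 8)) @ mixing
h_hat = zca_whiten(h)
cov_hat = np.cov(h_hat, rowvar=False)  # should be close to identity
```

After whitening, every feature has unit variance and the off-diagonal covariances are approximately zero. Rotating back with the eigenvectors is what distinguishes ZCA from PCA whitening: the whitened features stay aligned with the original feature axes.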

Paper Structure (21 sections, 4 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of our method for decorrelating the features $h_i$ via ZCA whitening, denoted by $\mathcal{W}$. $\widehat{h_i}$ denotes the whitened features. We apply feature whitening on both branches during pretraining for all self-supervised learning methods that we investigate. Whitening is also applied during evaluation.
  • Figure 2: Linear probe results for all self-supervised learning methods and encoders with and without whitening. $\mathcal{W}$ indicates whitening as the last layer of the encoder. Arrows indicate where whitening provides a notable improvement in accuracy. (For ResNet18 the markers show mean accuracy over 5 runs.)
  • Figure 3: Probe accuracy gains and change in feature anisotropy from adding whitening to a ResNet18 encoder, measured with 5 different random seeds. Dots show the mean accuracy gain over 5 runs; error bars show the 95% confidence intervals. Positive probe values indicate an improvement in accuracy, and negative values indicate a decrease. Whitening lowers the anisotropy of features, indicating more decorrelated features.
  • Figure 4: Learning curves for pretraining a ResNet-18 encoder with and without whitening. The curves show the mean linear probe accuracy over 5 runs, while shaded bands correspond to its standard deviation.
  • Figure 5: Scatter plots of feature metrics against linear probe accuracy on the Tiny-ImageNet dataset for all self-supervised learning methods and encoders, with and without whitening.
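
The feature metrics referred to in Figures 3 and 5 can be sketched as follows. The exact definitions are not given in this summary, so the code below is an assumption-laden illustration: mean absolute feature correlation is taken as the mean absolute off-diagonal entry of the feature correlation matrix, mean feature standard deviation as the per-feature standard deviation averaged over features, and anisotropy as the average cosine similarity between distinct samples. The anisotropy definition in particular is an assumption borrowed from common usage and may differ from the paper's. All function names are hypothetical.

```python
import numpy as np

def mean_abs_feature_correlation(h):
    """Mean absolute off-diagonal entry of the feature correlation matrix."""
    corr = np.corrcoef(h, rowvar=False)
    d = corr.shape[0]
    off_diag = corr[~np.eye(d, dtype=bool)]
    return np.abs(off_diag).mean()

def mean_feature_std(h):
    """Standard deviation of each feature, averaged over features."""
    return h.std(axis=0).mean()

def anisotropy(h):
    """ASSUMED definition: mean cosine similarity over distinct sample pairs."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    cos = unit @ unit.T
    n = cos.shape[0]
    # Exclude the diagonal (each sample's similarity with itself).
    return (cos.sum() - np.trace(cos)) / (n * (n - 1))

# Toy comparison (hypothetical data): correlated, offset features
# versus independent, zero-mean features.
rng = np.random.default_rng(0)
z = rng.normal(size=(512, 8))
mixing = np.eye(8) + 0.5 * np.ones((8, 8))
correlated = z @ mixing + 2.0
decorrelated = z

corr_before = mean_abs_feature_correlation(correlated)
corr_after = mean_abs_feature_correlation(decorrelated)
aniso_before = anisotropy(correlated)
aniso_after = anisotropy(decorrelated)
std_after = mean_feature_std(decorrelated)
```

Under these assumed definitions, well-whitened features should show a mean absolute correlation near zero, a mean standard deviation near one, and low anisotropy, while a collapsed representation would show up as a very small standard deviation or as high correlation and anisotropy.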