Variance-Covariance Regularization Improves Representation Learning
Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun
TL;DR
VCReg addresses the tendency of supervised pretraining to overfit to the source task by regularizing representations to be high-variance and low-covariance. By extending a VICReg-inspired objective to supervised settings and applying it across intermediate layers, VCReg promotes diverse, transferable features with efficient backward-gradient updates. Empirical results across images and videos demonstrate state-of-the-art transfer performance and gains on long-tail and hierarchical tasks, with analyses showing reduced gradient starvation and neural collapse while maintaining information content and noise robustness. Overall, VCReg provides a practical, architecture-agnostic framework that strengthens feature transfer and broadens the applicability of supervised pretraining.
Abstract
Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. In this work, we adapt a self-supervised learning regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg). This adaptation encourages the network to learn high-variance, low-covariance representations, which promotes the learning of more diverse features. We outline best practices for an efficient implementation of our framework, including applying it to the intermediate representations. Through extensive empirical evaluation, we demonstrate that our method significantly enhances transfer learning for images and videos, achieving state-of-the-art performance across numerous tasks and datasets. VCReg also improves performance in scenarios such as long-tail learning and hierarchical classification. Additionally, we show that its effectiveness may stem from its success in addressing challenges such as gradient starvation and neural collapse. In summary, VCReg offers a universally applicable regularization framework that significantly advances transfer learning and highlights the connection between gradient starvation, neural collapse, and feature transferability.
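To make "high-variance, low-covariance" concrete, the sketch below computes the two penalty terms in the style of the original VICReg formulation: a hinge on each feature dimension's standard deviation, and the sum of squared off-diagonal entries of the batch covariance matrix. It is an illustration under stated assumptions rather than the authors' implementation: the function name `variance_covariance_penalty`, the target standard deviation `gamma = 1.0`, and the stabilizer `eps = 1e-4` are choices made here, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def variance_covariance_penalty(batch, gamma=1.0, eps=1e-4):
    """Return (variance_term, covariance_term) for a batch of feature vectors.

    batch: list of n feature vectors, each a list of d floats (n >= 2).
    gamma, eps: assumed defaults following the VICReg-style formulation.
    """
    n, d = len(batch), len(batch[0])

    # Center every feature dimension over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]

    # Unbiased d x d covariance matrix of the batch.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Variance term: penalize dimensions whose standard deviation is below gamma.
    var_term = sum(
        max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)
    ) / d

    # Covariance term: penalize squared off-diagonal entries (correlated dimensions).
    cov_term = sum(
        cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j
    ) / d

    return var_term, cov_term


if __name__ == "__main__":
    collapsed = [[1.0, 1.0]] * 3                  # every sample identical
    redundant = [[-2.0, -2.0], [2.0, 2.0]]        # two perfectly correlated dimensions
    diverse = [[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]]
    for name, feats in [("collapsed", collapsed), ("redundant", redundant), ("diverse", diverse)]:
        print(name, variance_covariance_penalty(feats))
```

A collapsed batch is penalized by the variance term, a batch with redundant dimensions by the covariance term, and a batch whose dimensions vary independently incurs neither penalty. In a supervised setting, one plausible use is to add both terms to the task loss with two weighting coefficients; those weights, and the choice of which intermediate layers to regularize, are hyperparameters not specified in this abstract.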
