Direct Coloring for Self-Supervised Enhanced Feature Decoupling

Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh

TL;DR

This work addresses dimensional collapse in self-supervised learning by introducing direct coloring to actively shape feature correlations, complementing whitening. The method uses a Bayesian-inspired target cross-correlation matrix $E$, derived from variational autoencoders trained on augmented views, and optimizes $\mathcal{L} = \mathcal{L}_{W} + \lambda \mathcal{L}_{C}$ to align the coloring cross-correlations with $E$ while whitening decorrelates subsequent embeddings. Theoretical MAP analysis links the objective to Bayesian estimation with a Gaussian prior, and empirically the approach yields faster convergence and improved accuracy across ImageNet, CIFAR-10/100, Tiny ImageNet, and transfer tasks like VOC0712 and COCO, with ablations clarifying the impact of coloring head placement, projector size, and choice of target $E$. The results demonstrate that direct coloring is a practical, broadly applicable enhancement to SSL, reducing the risk of complete collapse and enabling stronger, more transferable representations.
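
As a rough illustration of how the combined objective $\mathcal{L} = \mathcal{L}_{W} + \lambda \mathcal{L}_{C}$ could be evaluated, the following minimal sketch uses only the Python standard library. The concrete loss forms are assumptions, not taken from the paper: $\mathcal{L}_{C}$ is taken as the squared Frobenius distance between the coloring cross-correlation and $E$, and $\mathcal{L}_{W}$ as a Barlow Twins-style pull toward the identity. All function names, the `off_diag_weight` parameter, and the toy target matrix are hypothetical, and $E$ is supplied by hand here rather than derived from VAEs.

```python
import math


def standardize(batch):
    """Return per-feature columns with zero mean and unit variance across the batch."""
    n, d = len(batch), len(batch[0])
    cols = []
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 for constant features to avoid division by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / std for v in col])
    return cols


def cross_correlation(za, zb):
    """d x d cross-correlation matrix between two batches of embeddings."""
    ca, cb = standardize(za), standardize(zb)
    n = len(za)
    return [[sum(x * y for x, y in zip(a, b)) / n for b in cb] for a in ca]


def coloring_loss(c, target):
    """Assumed form of L_C: squared Frobenius distance between C and the target E."""
    d = len(c)
    return sum((c[i][j] - target[i][j]) ** 2 for i in range(d) for j in range(d))


def whitening_loss(c, off_diag_weight=0.005):
    """Assumed form of L_W: push C toward the identity (Barlow Twins-style)."""
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag


def total_loss(color_a, color_b, white_a, white_b, target, lam=1.0):
    """L = L_W + lambda * L_C, with separate embeddings for the two heads."""
    c_color = cross_correlation(color_a, color_b)
    c_white = cross_correlation(white_a, white_b)
    return whitening_loss(c_white) + lam * coloring_loss(c_color, target)


# Toy batch whose two features are exactly uncorrelated.
views = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
target_e = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical target, not from the paper
loss = total_loss(views, views, views, views, target_e, lam=2.0)
```

In this toy call the same batch stands in for all four projector outputs, so only the mismatch between the identity cross-correlation and the hypothetical target contributes to the loss. In a real training setup the coloring and whitening embeddings would come from different projector heads and the loss would be written in a differentiable framework.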

Abstract

The success of self-supervised learning (SSL) has been the focus of multiple recent theoretical and empirical studies, including the role of data augmentation (in feature decoupling) as well as complete and dimensional representation collapse. While complete collapse is well studied and addressed, dimensional collapse has gained attention only in recent years and has been addressed mostly with variants of redundancy reduction (aka whitening) techniques. In this paper, we further explore a complementary approach to whitening via feature decoupling for improved representation learning while avoiding representation collapse. In particular, we perform feature decoupling by early promotion of useful features via careful feature coloring. The coloring technique is developed based on a Bayesian prior of the augmented data, which is inherently encoded for feature decoupling. We show that our proposed framework is complementary to the state-of-the-art techniques, while outperforming both contrastive and recent non-contrastive methods. We also study the different effects of the coloring approach in order to formulate it as a general complementary technique alongside other baselines.

Paper Structure

This paper contains 31 sections, 14 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Left: Schematic diagram of the proposed framework. For a given sample, two augmented views are generated and fed to the symmetric networks. The two pairs of projectors are used to perform cascaded coloring and whitening, respectively. Right: Desired cross-correlation for direct coloring; $E$ is a square matrix with the same size as the latent space of each of the VAEs.
  • Figure 2: Simpler architecture with auto-correlation instead of cross-correlation.
  • Figure 3: Sensitivity to $\lambda$.