
Beyond Pairwise Correlations: Higher-Order Redundancies in Self-Supervised Representation Learning

David Zollikofer, Béni Egressy, Frederik Benzing, Matthias Otth, Roger Wattenhofer

TL;DR

The paper argues that current self-supervised learning (SSL) methods largely address pairwise redundancy and may overlook higher-order dependencies in embedding spaces. It introduces a formal redundancy framework with three measures, average absolute covariance (AAC), linear redundancy (LR), and NLR, establishes theoretical relations among them, and presents Self-Supervised Learning with Predictability Minimization (SSLPM), a predictor-based, redundancy-minimizing SSL method. Through extensive experiments on CIFAR-10/100 and ImageNet-100, it shows that reducing LR correlates with better downstream performance, whereas reducing higher-order redundancies yields mixed or negative effects, and that SSLPM-RR performs competitively with state-of-the-art baselines. The findings highlight the projector's role in pruning redundancy and provide a framework for analyzing and guiding SSL design, suggesting potential extensions to other modalities and redundancy metrics.

Abstract

Several self-supervised learning (SSL) approaches have shown that redundancy reduction in the feature embedding space is an effective tool for representation learning. However, these methods consider a narrow notion of redundancy, focusing on pairwise correlations between features. To address this limitation, we formalize the notion of embedding space redundancy and introduce redundancy measures that capture more complex, higher-order dependencies. We mathematically analyze the relationships between these metrics and empirically measure these redundancies in the embedding spaces of common SSL methods. Based on our findings, we propose Self-Supervised Learning with Predictability Minimization (SSLPM) as a method for reducing redundancy in the embedding space. SSLPM combines an encoder network and a predictor that play a competitive game: the encoder reduces dependencies in the embedding space, while the predictor exploits them. We demonstrate that SSLPM is competitive with state-of-the-art methods and find that the best-performing SSL methods exhibit low embedding space redundancy, suggesting that even methods without explicit redundancy reduction mechanisms perform redundancy reduction implicitly.
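To make the encoder-predictor game more concrete, here is a minimal NumPy sketch of one round under assumptions the abstract does not state: the predictor is linear, it predicts each embedding dimension from all the other dimensions, and it is fitted in closed form by ridge regression instead of being trained by SGD. (The figure list below mentions variants SSLPM-SGD and SSLPM-RR; whether RR refers to such a closed-form predictor is not confirmed here.) The names `fit_predictor` and `predictability_penalty` are hypothetical. The predictor minimizes its squared error, and the encoder side of the game would push the returned penalty down toward -1, i.e. make its dimensions unpredictable. How SSLPM weights this term against its other losses is not shown.

```python
import numpy as np

def standardize(z):
    """Center each embedding dimension and scale it to unit variance."""
    z = z - z.mean(axis=0)
    return z / (z.std(axis=0) + 1e-8)

def fit_predictor(z, ridge=1e-3):
    """Predictor's best response (assumed linear, closed form): for each
    dimension j, a ridge regression from the other dimensions to z[:, j].
    Returns a (d, d) weight matrix with a zero diagonal, so no dimension
    is used to predict itself."""
    _, d = z.shape
    w = np.zeros((d, d))
    for j in range(d):
        others = [k for k in range(d) if k != j]
        x = z[:, others]
        a = x.T @ x + ridge * np.eye(d - 1)
        w[others, j] = np.linalg.solve(a, x.T @ z[:, j])
    return w

def predictability_penalty(z, w):
    """Hypothetical encoder term: negative mean squared prediction error on
    standardized embeddings. Close to -1 when no dimension is linearly
    predictable, close to 0 when every dimension is a linear function of
    the others. The encoder would minimize this value."""
    pred = z @ w
    return -np.mean((z - pred) ** 2)

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 4))
independent = standardize(base)

# Redundant toy embedding: the last dimension is a linear mix of the first two.
redundant = base.copy()
redundant[:, 3] = base[:, 0] + base[:, 1]
redundant = standardize(redundant)

for name, z in [("independent", independent), ("redundant", redundant)]:
    w = fit_predictor(z)
    print(name, round(predictability_penalty(z, w), 3))
```

Running it should print a penalty near -1 for the independent embedding and a clearly higher value for the redundant one, since three of its four dimensions can be recovered from the others.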

Paper Structure

This paper contains 33 sections, 27 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Siamese training setup with an encoder in self-supervised learning. Embeddings are used in downstream applications, whereas representations are used for loss calculation during pretraining. After pretraining, during evaluation, only the encoder is kept and the randomized augmentations $\tau$ are not applied.
  • Figure 2: Schematic representation of the SSLPM model with the two actors, the encoder-projector network (blue) and the predictor network (yellow). Our contribution is indicated with the dashed box.
  • Figure 3: Impact of $\lambda$ in Equation (eq:clLoss) on CIFAR-10.
  • Figure 4: Relationship between Top-1 accuracy and three different redundancy measures on CIFAR-10, plotted with the lines of best fit. For each best-fit line, the Pearson correlation and the $p$-value for the null hypothesis that the data are uncorrelated are reported. LR is the only redundancy measure for which the correlation is significant (with $p<0.01$) for both models.
  • Figure 5: Ablation on the number of layers of the predictor in SSLPM-SGD compared with SSLPM-RR. All results are calculated on CIFAR-10.
  • ...and 11 more figures
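Figure 4 reports a Pearson correlation together with a $p$-value for the null hypothesis of no correlation. The caption does not say how the $p$-value was obtained. The sketch below swaps in a two-sided permutation test, which needs only NumPy. `scipy.stats.pearsonr` would give the standard parametric version. The function name and the data are hypothetical: the data are synthetic and are not results from the paper.

```python
import numpy as np

def pearson_with_permutation_p(x, y, n_perm=10_000, seed=0):
    """Pearson r and a two-sided permutation p-value for H0: x and y are
    uncorrelated. Shuffling y breaks any x-y association, giving a null
    distribution of r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array(
        [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)]
    )
    # The +1 terms count the observed statistic as one permutation,
    # so the p-value is never exactly zero.
    p = (np.sum(np.abs(null) >= abs(r)) + 1) / (n_perm + 1)
    return r, p

# Synthetic toy data (not from the paper): y decreases linearly with x plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=20)
y = 0.9 - 0.1 * x + rng.normal(0.0, 0.01, size=20)
r, p = pearson_with_permutation_p(x, y)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

With 10,000 permutations the smallest attainable $p$-value is about $10^{-4}$, which is ample resolution for a $p<0.01$ threshold like the one used in the caption.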

Theorems & Definitions (2)

  • Definition 3.1: Average Absolute Covariance
  • Definition 3.2: Predictability
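The gap between pairwise and higher-order redundancy can be shown with a two-dimensional toy embedding in which one dimension is the square of the other. The sketch below assumes that AAC is the mean absolute off-diagonal entry of the covariance matrix of standardized embedding dimensions (the exact normalization in Definition 3.1 may differ). As a stand-in for Definition 3.2, it uses the $R^2$ of a least-squares predictor on hand-picked features. The function names are hypothetical.

```python
import numpy as np

def average_absolute_covariance(z):
    """Mean |cov(z_i, z_j)| over i != j, computed on standardized dimensions
    (assumed normalization)."""
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    c = np.cov(z, rowvar=False)
    d = c.shape[0]
    off_diag = c[~np.eye(d, dtype=bool)]
    return np.abs(off_diag).mean()

def r_squared(features, target):
    """R^2 of an ordinary least-squares fit with an intercept: the fraction of
    the target's variance that the predictor explains."""
    x = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(x, target, rcond=None)
    resid = target - x @ coef
    return 1.0 - resid.var() / target.var()

rng = np.random.default_rng(0)
z1 = rng.standard_normal(10_000)
z2 = z1 ** 2  # fully determined by z1, yet uncorrelated with it since E[z1^3] = 0
z = np.column_stack([z1, z2])

aac = average_absolute_covariance(z)
linear_r2 = r_squared(z1[:, None], z2)                          # pairwise/linear view
quadratic_r2 = r_squared(np.column_stack([z1, z1 ** 2]), z2)    # nonlinear view
print(f"AAC={aac:.3f}  linear R^2={linear_r2:.3f}  quadratic R^2={quadratic_r2:.3f}")
```

AAC and the linear $R^2$ both come out close to zero, while the quadratic predictor explains essentially all of the variance of `z2`. A purely pairwise measure therefore reports almost no redundancy for an embedding whose second dimension carries no information beyond the first.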