Understanding Self-supervised Contrastive Learning through Supervised Objectives

Byeongchan Lee

TL;DR

This work reframes self-supervised representation learning as an approximation to supervised objectives by modeling class prototypes and their targets. It derives an InfoNCE-like self-supervised loss from a prototype-based supervised formulation, introduces the concept of prototype representation bias, and proposes a balanced contrastive loss that jointly tunes attracting and repelling forces via balancing parameters $\alpha$ and $\lambda$. The theory connects standard SSL components (e.g., SimCLR, Siamese architectures, cosine normalization) to principled objectives, and provides empirical evidence that lower prototype representation bias and better balancing tend to improve downstream accuracy. The results offer a principled lens for understanding and improving SSL design, with practical implications for data augmentation, class balance, and loss formulation.

Abstract

Self-supervised representation learning has achieved impressive empirical success, yet its theoretical understanding remains limited. In this work, we provide a theoretical perspective by formulating self-supervised representation learning as an approximation to supervised representation learning objectives. Based on this formulation, we derive a loss function closely related to popular contrastive losses such as InfoNCE, offering insight into their underlying principles. Our derivation naturally introduces the concepts of prototype representation bias and a balanced contrastive loss, which help explain and improve the behavior of self-supervised learning algorithms. We further show how components of our theoretical framework correspond to established practices in contrastive learning. Finally, we empirically validate the effect of balancing positive and negative pair interactions. All theoretical proofs are provided in the appendix, and our code is included in the supplementary material.
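For readers unfamiliar with the loss the abstract refers to, the following is a minimal sketch of the standard InfoNCE loss for a single anchor, written with the Python standard library only. It is the textbook form, not the loss derived in this paper; the function names (`cosine`, `info_nce`) and the default `temperature` value are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Standard InfoNCE for one anchor (illustrative, not the paper's loss).

    Returns the negative log-softmax score of the positive pair among
    the positive and all negative pairs.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


# An aligned positive gives a lower loss than a misaligned one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

The numerator term pulls the anchor toward its positive, while the denominator pushes it away from the negatives; these are the two interactions whose balance the abstract discusses.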

Paper Structure

This paper contains 46 sections, 7 theorems, 31 equations, 6 figures, 5 tables.

Key Result

Theorem 4.1

Assume that the cosine similarity, $\ell_2$ normalization, and technical assumptions hold. Then,

Figures (6)

  • Figure 1: Supervised learning as an optimization. The loss $l_{\text{attract}}(\theta)$ encourages the image representation to attract the prototype representation $\mu_{\text{dog}}$ that shares the visual concept of that image. On the other hand, the loss $l_{\text{repel}}(\theta)$ prompts the image representation to repel the prototype representation $\mu_{\text{cat}}$ that is closest among those not sharing the visual concept of that image. The parameter $\lambda$ balances the two losses.
  • Figure 2: Self-supervised learning as an approximation of supervised learning. (1) In an ideal supervised regime, the ideal prototype representation $\mu_{y}$ is given by an oracle. (2) In a realistic supervised regime, the prototype representation is constructed as the expectation $\mathbb{E}_{T, X \vert y}f_{\theta}(T(X))$ of the representations of the images with the same label $y$. (3) In a self-supervised regime, a surrogate prototype representation is constructed as the expectation $\mathbb{E}_{T}f_{\theta}(T(x))$ of the representations of the available images sharing the same label as $t(x)$. (4) This can be effectively implemented using a Siamese network.
  • Figure 3: Accuracy vs. prototype representation bias. We investigate the relationship between accuracy and prototype representation bias by adding or removing transformations from SimCLR's data augmentation strategy (base). Lower prototype representation bias tends to result in higher accuracy.
  • Figure 4: Impact of balancing parameters $\alpha$ and $\lambda$. Better balancing can be accomplished through the adjustments of the balancing parameters.
  • Figure 5: Impact of balancing parameters $\alpha$ and $\lambda$ on CIFAR-10. Better balancing can be accomplished through the adjustments of the balancing parameters.
  • ...and 1 more figure
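The captions of Figures 1 and 2 can be illustrated with a small sketch in plain Python. It builds one prototype per label as the mean of the normalized representations with that label (an empirical stand-in for the expectation in the realistic supervised regime), then combines an attracting term toward the image's own prototype with a repelling term against the closest other prototype, weighted by $\lambda$. The specific functional forms (negative cosine similarity for attraction, maximum cosine similarity for repulsion) and all function names are assumptions made for illustration; the paper's exact definitions are not reproduced here.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def class_prototypes(representations, labels):
    """Mean of the L2-normalized representations for each label.

    Hypothetical helper: an empirical approximation of the per-class
    expectation described in the Figure 2 caption.
    """
    groups = {}
    for z, y in zip(representations, labels):
        groups.setdefault(y, []).append(normalize(z))
    return {y: [sum(col) / len(zs) for col in zip(*zs)]
            for y, zs in groups.items()}


def attract_repel_loss(z, y, prototypes, lam=1.0):
    """Assumed form: l_attract + lam * l_repel for one representation.

    Requires at least two classes, since the repelling term looks at
    prototypes of classes other than y.
    """
    z = normalize(z)
    sims = {c: dot(z, normalize(mu)) for c, mu in prototypes.items()}
    attract = -sims[y]  # pull toward the prototype of the own class
    repel = max(s for c, s in sims.items() if c != y)  # closest other prototype
    return attract + lam * repel


reps = [[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
labels = ["dog", "dog", "cat", "cat"]
protos = class_prototypes(reps, labels)
loss_correct = attract_repel_loss([1.0, 0.0], "dog", protos, lam=1.0)
loss_wrong = attract_repel_loss([1.0, 0.0], "cat", protos, lam=1.0)
```

In this toy setup a representation aligned with the "dog" prototype receives a lower loss when labeled "dog" than when labeled "cat", and setting `lam=0.0` leaves only the attracting term.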

Theorems & Definitions (14)

  • Theorem 4.1: upper bound of the attracting component
  • proof
  • Theorem 4.2: upper bound of the repelling component
  • proof
  • Theorem A.1: upper bound of the attracting component
  • proof
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • ...and 4 more