Understanding Difficult-to-learn Examples in Contrastive Learning: A Theoretical Framework for Spectral Contrastive Learning

Yi-Ge Zhang, Jingyi Cui, Qiran Li, Yisen Wang

TL;DR

This work identifies difficult-to-learn samples as a key factor degrading unsupervised contrastive learning and develops a spectral similarity-graph framework to quantify their impact. It proves that these samples increase linear-probing error bounds and demonstrates three remedies—removing such samples, margin tuning, and temperature scaling—that modify the similarity graph to restore or improve generalization. The authors provide a practical, model-free mechanism to identify difficult samples and validate the theory across datasets (CIFAR, STL, TinyImagenet, ImageNet-1K) and contrastive paradigms (SimCLR, MoCo). The combined method yields the strongest gains, highlighting the potential to boost self-supervised representations by addressing sample difficulty. These insights have broad implications for robust self-supervised learning, especially under long-tail and distribution-shift scenarios.

Abstract

Unsupervised contrastive learning has shown significant performance improvements in recent years, often approaching or even rivaling supervised learning in various tasks. However, its learning mechanism is fundamentally different from that of supervised learning. Previous works have shown that difficult-to-learn examples (well-recognized in supervised learning as examples around the decision boundary), which are essential in supervised learning, contribute minimally in unsupervised settings. In this paper, perhaps surprisingly, we find that directly removing difficult-to-learn examples, although it reduces the sample size, can boost the downstream classification performance of contrastive learning. To uncover the reasons behind this, we develop a theoretical framework that models the similarity between different pairs of samples. Guided by this framework, we conduct a theoretical analysis revealing that the presence of difficult-to-learn examples negatively affects the generalization of contrastive learning. Furthermore, we demonstrate that removing these examples, as well as techniques such as margin tuning and temperature scaling, can improve the generalization bounds, thereby improving performance. Empirically, we propose a simple and efficient mechanism for selecting difficult-to-learn examples and validate the effectiveness of the aforementioned methods, which substantiates the reliability of our proposed theoretical framework.
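
The title places the analysis in the setting of spectral contrastive learning. As a point of reference, the sketch below computes a batch estimate of the spectral contrastive loss in its commonly cited form, $-2\,\mathbb{E}[f(x)^\top f(x^+)] + \mathbb{E}[(f(x)^\top f(x'))^2]$. This is a minimal illustration and not the authors' implementation: the function name `spectral_contrastive_loss`, the two-view batch layout, and the use of off-diagonal pairs as negatives are all assumptions made for the example.

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    """Batch estimate of the spectral contrastive loss (illustrative sketch).

    z1, z2: arrays of shape (n, d) holding embeddings of two augmented views.
    Assumption: row i of z1 and row i of z2 form a positive pair, and every
    pair (i, j) with i != j is treated as a negative pair.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    n = z1.shape[0]
    # Positive term: mean inner product between the two views of each sample.
    pos = np.sum(z1 * z2, axis=1).mean()
    # Negative term: mean squared inner product over all cross-sample pairs.
    gram = z1 @ z2.T
    off_diag = ~np.eye(n, dtype=bool)
    neg = np.mean(gram[off_diag] ** 2)
    return -2.0 * pos + neg

# Two orthonormal embeddings with identical views: the positive term is 1
# and the negative term is 0.
z = np.eye(2)
loss = spectral_contrastive_loss(z, z)
```

In this form the loss rewards aligned positive pairs and penalizes any nonzero inner product between different samples, which is why the pairwise similarity structure of the data matters for the analysis.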

Paper Structure

This paper contains 27 sections, 10 theorems, 71 equations, 8 figures, 9 tables, 1 algorithm.

Key Result

Theorem 3.1

Denote $\mathcal{E}_{\mathrm{w.o.}}$ as the linear probing error of a contrastive learning model trained on a dataset without difficult-to-learn examples. Then

Figures (8)

  • Figure 1: Excluding difficult-to-learn examples improves contrastive learning.
  • Figure 2: Excluding (mixed) difficult-to-learn examples improves contrastive learning.
  • Figure 3: Modeling of difficult-to-learn examples. The similarity between same-class samples is $\alpha$ (a), the similarity between different-class difficult-to-learn samples is $\gamma$ (c), and the similarity between other samples is $\beta$ (b). The adjacency matrix of a 4-sample subset is shown in (d).
  • Figure 4: Parameter sensitivity of the difficult-to-learn example interval ends $posHigh$ and $posLow$. Parameter analysis on CIFAR-100: the trend of the ratio of sample pairs from different classes in ($Sim_{posLow}$, $Sim_{posHigh}$) during the training process.
  • Figure 5: The results of incorporating the Combined method with different architectures on CIFAR-10.
  • ...and 3 more figures
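
The similarity model in the Figure 3 caption can be sketched in a few lines of NumPy. The example builds the adjacency matrix for a hypothetical 4-sample subset and then removes the difficult-to-learn samples from the graph. Several details are assumptions made for illustration rather than taken from the paper: the function names, the self-similarity of 1 on the diagonal, the numeric values (chosen so that $\beta < \gamma < \alpha$), and the choice of which samples are marked as difficult.

```python
import numpy as np

def build_similarity_graph(labels, difficult, alpha, beta, gamma):
    """Adjacency matrix following the Figure 3 caption (illustrative sketch).

    Same-class pairs get alpha, different-class pairs in which both samples
    are difficult-to-learn get gamma, and all other pairs get beta.
    Assumption: each sample has similarity 1 with itself.
    """
    labels = np.asarray(labels)
    difficult = np.asarray(difficult, dtype=bool)
    same_class = labels[:, None] == labels[None, :]
    both_difficult = difficult[:, None] & difficult[None, :]
    A = np.full((len(labels), len(labels)), beta, dtype=float)
    A[~same_class & both_difficult] = gamma
    A[same_class] = alpha
    np.fill_diagonal(A, 1.0)
    return A

def remove_difficult(A, difficult):
    """Drop the rows and columns of difficult-to-learn samples."""
    keep = ~np.asarray(difficult, dtype=bool)
    return A[np.ix_(keep, keep)]

# Hypothetical subset: two classes, one difficult-to-learn sample per class.
labels = [0, 0, 1, 1]
difficult = [False, True, True, False]
A = build_similarity_graph(labels, difficult, alpha=0.8, beta=0.1, gamma=0.4)
A_removed = remove_difficult(A, difficult)
```

After removal, the only remaining cross-class entries are $\beta$, so the elevated cross-class similarity $\gamma$ no longer appears in the graph.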

Theorems & Definitions (20)

  • Theorem 3.1: Error Bound without difficult-to-learn Examples
  • Theorem 3.2: Error Bound with difficult-to-learn Examples
  • Corollary 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Proof of Theorem 3.1
  • Proof of Theorem 3.2
  • Proof of Corollary 4.1
  • ...and 10 more