An Augmentation Overlap Theory of Contrastive Learning

Qi Zhang, Yifei Wang, Yisen Wang

TL;DR

This work introduces an augmentation overlap theory to explain why contrastive learning yields useful representations. It argues that aggressive data augmentations make the augmented views of different intra-class samples overlap, so that simply aligning positive pairs clusters intra-class samples together while inter-class separation can be preserved. The paper first tightens the generalization bounds under the conventional conditional independence (CI) assumption, then relaxes CI to a more practical augmentation-overlap assumption, using an augmentation-graph framework with epsilon-alignment and intra-class connectivity. The authors derive both CI-based and augmentation-based bounds, analyze augmentation strength via random-graph and spectral graph theory, and propose an unsupervised representation-evaluation metric, ARC (with generalized forms), that correlates strongly with downstream performance. Empirically, they show how augmentation strength governs the connectivity of the augmentation graph and validate ARC on CIFAR-10 and ImageNet, offering a practical, label-free tool for model selection and insights for designing augmentations.

Abstract

Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying working mechanism remains unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive asymptotically closed bounds for the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the supports of different intra-class samples become more overlapped under aggressive data augmentations, so that simply aligning the positive samples (augmented views of the same sample) can make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for evaluating the representations of contrastive learning, which aligns well with the downstream performance almost without relying on additional modules. Code is available at https://github.com/PKU-ML/GARC.

Paper Structure

This paper contains 35 sections, 12 theorems, 38 equations, 14 figures, 2 tables.

Key Result

Lemma 2

The approximation error of the Monte Carlo estimate $\bar{\mathcal{L}}_{\rm MC}(f)$ shrinks at the rate $\mathcal{O}(M^{-1/2})$.
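
The $\mathcal{O}(M^{-1/2})$ rate is the standard Monte Carlo rate and can be checked numerically. The sketch below is a generic illustration, not the paper's loss $\bar{\mathcal{L}}_{\rm MC}(f)$: it assumes a toy integrand, $\mathbb{E}[e^{z}]$ with $z$ uniform on $[-1, 1]$, and treats $M$ as the number of Monte Carlo samples. The function names `mc_estimate` and `rms_error` are hypothetical.

```python
import math
import random
import statistics


def mc_estimate(num_samples, rng):
    """Monte Carlo estimate of E[exp(z)] for z ~ Uniform(-1, 1)."""
    total = sum(math.exp(rng.uniform(-1.0, 1.0)) for _ in range(num_samples))
    return total / num_samples


def rms_error(num_samples, trials=200, seed=0):
    """Root-mean-square error of the estimate over repeated trials."""
    rng = random.Random(seed)
    exact = math.sinh(1.0)  # closed form: (e - 1/e) / 2
    squared = [(mc_estimate(num_samples, rng) - exact) ** 2 for _ in range(trials)]
    return math.sqrt(statistics.fmean(squared))


# Quadrupling M should roughly halve the error if the rate is M^{-1/2}.
errors = {m: rms_error(m) for m in (100, 400, 1600)}
for m, err in errors.items():
    print(f"M={m:5d}  RMS error={err:.4f}  error*sqrt(M)={err * math.sqrt(m):.3f}")
```

If the rate holds, the last printed column stays roughly constant as $M$ grows, which is the behavior the lemma describes for its own estimator.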

Figures (14)

  • Figure 1: t-SNE visualization of representations before and after contrastive learning with SimCLR on the CIFAR-10 dataset. Each point denotes a sample and its color denotes its class.
  • Figure 2: (a) The framework of contrastive learning. (b) The augmented views of four images from ImageNet. The first two rows are cars while the bottom two rows are pens. The anchor samples are shown in the 1st column, while the 2nd through 5th columns present their corresponding augmented views.
  • Figure 3: Comparison of upper bounds on the downstream loss (measured by mean CE loss) on CIFAR-10. The encoder is a ResNet-18 (He et al., 2016) trained with SimCLR (Chen et al., 2020). We calculate the upper bounds using its representations at the (a) initialization and (b) final stages.
  • Figure 4: Illustrative examples of augmentation graphs, where each dot denotes a sample $x\in\mathcal{D}_u$ and its color denotes its class. The lighter disks denote the support of the positive samples $p(x^+|x)$. A solid line marks each pair of samples connected by an edge.
  • Figure 5: t-SNE visualization of features learned with different augmentation strengths $r$ in the random augmentation graph experiment. Each dot denotes a sample and its color denotes its class.
  • ...and 9 more figures
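
The augmentation-graph picture in Figures 4 and 5 can be made concrete with a small toy model. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes each sample's augmented views fill a disk of radius $r$ around the sample, so two samples share an edge when their disks overlap (distance at most $2r$). The data points and the function names (`build_augmentation_graph`, `is_class_connected`, `count_cross_class_edges`, `summarize`) are hypothetical.

```python
import math


def build_augmentation_graph(points, r):
    """Connect samples i and j when their radius-r disks overlap (distance <= 2r)."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * r:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_class_connected(adj, labels, cls):
    """Depth-first search restricted to the nodes of one class."""
    nodes = [i for i, y in enumerate(labels) if y == cls]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if labels[v] == cls and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def count_cross_class_edges(adj, labels):
    """Number of edges whose endpoints belong to different classes."""
    return sum(1 for i in adj for j in adj[i] if i < j and labels[i] != labels[j])


# Toy data (assumed): two classes laid out on a line, well separated.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]


def summarize(r):
    """Return (all classes intra-class connected?, number of cross-class edges)."""
    adj = build_augmentation_graph(points, r)
    connected = all(is_class_connected(adj, labels, c) for c in set(labels))
    return connected, count_cross_class_edges(adj, labels)


for r in (0.4, 0.6, 4.2):
    print(r, summarize(r))
```

In this toy model, a small radius leaves each class disconnected, a moderate radius connects every class without linking different classes, and a very large radius starts to create cross-class edges.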

Theorems & Definitions (13)

  • Lemma 2
  • Theorem 3: Downstream Guarantees under Conditional Independence
  • Theorem 5: Downstream Guarantees without Conditional Independence
  • Proposition 6
  • Definition 7: Augmentation Graph
  • Theorem 10: Guarantees with Connected Augmentation Graph
  • Corollary 11
  • Corollary 12
  • Corollary 12
  • Lemma 13: The main results of HaoChen et al. (2021)
  • ...and 3 more