Contrastive Learning with Nasty Noise

Ziruo Zhao

TL;DR

The paper analyzes the theoretical limits of contrastive learning under nasty noise using the classical PAC framework and VC-dimension theory. It derives tight lower bounds on sample complexity for both arbitrary distance functions and $\ell_p$ distances, and shows that adversarial sample modification at rate $\eta$ fundamentally constrains the achievable accuracy, including a baseline bound $\epsilon<2\eta$. In the classical PAC setting it provides matching upper bounds, with refined distance-specific bounds for even and odd $p$ and for constant dimensionality. In the nasty-noise setting it gives $\Delta$-scaled bounds $n(\epsilon,\delta,\Delta)$ and data-dependent analyses based on $L_{con}$ and $\hat{L}_{con}$, including a simplification for the binary case. Overall, the results quantify robustness limits and offer data-dependent tools for bounding generalization under adversarial perturbations in contrastive representations.

Abstract

Contrastive learning has emerged as a powerful paradigm for self-supervised representation learning. This work analyzes the theoretical limits of contrastive learning under nasty noise, where an adversary modifies or replaces training samples. Using PAC learning and VC-dimension analysis, lower and upper bounds on sample complexity in adversarial settings are established. Additionally, data-dependent sample complexity bounds based on the $\ell_2$-distance function are derived.

Paper Structure

This paper contains 22 sections, 22 theorems, 39 equations.

Key Result

Lemma 1.5

For any two classes $\mathcal{H}$ and $\mathcal{F}$ over $\mathcal{X}$,

Theorems & Definitions (44)

  • Definition 1.1: Contrastive learning, classical PAC case
  • Definition 1.2: Contrastive learning, nasty noise case
  • Definition 1.3: Shattering
  • Definition 1.4: VC-dimension
  • Lemma 1.5
  • Definition 1.6
  • Claim 1.7
  • Definition 1.8: $\alpha$-sample
  • Theorem 1.9
  • Definition 1.10: Natarajan dimension
  • ...and 34 more
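
Definitions 1.3 and 1.4 name shattering and VC-dimension, the quantities the sample-complexity bounds are stated in. The sketch below illustrates the standard textbook definitions by brute force on a small finite domain; it is not taken from the paper, and the names `is_shattered`, `vc_dimension`, `thresholds`, and `intervals` are hypothetical choices made for illustration.

```python
from itertools import combinations


def is_shattered(points, hypotheses):
    """Return True if the hypotheses realise every 0/1 labelling of `points`."""
    realised = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realised) == 2 ** len(points)


def vc_dimension(domain, hypotheses):
    """Size of the largest subset of `domain` shattered by `hypotheses`.

    Brute force over subsets, so only suitable for tiny examples.
    """
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            best = k
        else:
            # If no subset of size k is shattered, no larger subset can be.
            break
    return best


# Illustrative (assumed) hypothesis classes over a four-point domain.
domain = [0, 1, 2, 3]

# Thresholds: h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(5)]

# Closed intervals [a, b], plus the empty interval (always 0).
intervals = [
    lambda x, a=a, b=b: int(a <= x <= b)
    for a in range(4)
    for b in range(a, 4)
]
intervals.append(lambda x: 0)

print(vc_dimension(domain, thresholds))
print(vc_dimension(domain, intervals))
```

Thresholds cannot label a smaller point 1 and a larger point 0, so no pair is shattered; intervals realise all four labellings of a pair but cannot label three ordered points 1, 0, 1.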