Contrastive Learning with Nasty Noise
Ziruo Zhao
TL;DR
The paper analyzes the theoretical limits of contrastive learning under nasty noise using the classical PAC framework and VC-dimension theory. It derives tight lower bounds on sample complexity for both arbitrary distance functions and $\ell_p$ distances, and shows that adversarial sample modification at rate $\eta$ imposes fundamental constraints on achievable accuracy, including a baseline bound $\epsilon<2\eta$. It also provides matching upper bounds in the classical PAC setting, with refined distance-specific bounds for even and odd $p$ and for constant dimensionality. For the nasty-noise setting, it gives $\Delta$-scaled bounds $n(\epsilon,\delta,\Delta)$ and data-dependent analyses based on $L_{con}$ and $\hat{L}_{con}$, including a simplification for the binary case. Overall, the results quantify robustness limits and offer data-dependent tools for bounding the generalization of contrastive representations under adversarial perturbations.
Abstract
Contrastive learning has emerged as a powerful paradigm for self-supervised representation learning. This work analyzes the theoretical limits of contrastive learning under nasty noise, where an adversary modifies or replaces training samples. Using PAC learning and VC-dimension analysis, lower and upper bounds on sample complexity in adversarial settings are established. Additionally, data-dependent sample complexity bounds based on the $\ell_2$-distance function are derived.
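To make the noise model concrete, the following is a minimal sketch of how a nasty-noise adversary might corrupt a contrastive training sample. It is illustrative only and not taken from the paper. It assumes that each example is a triplet $(x, y, z)$ labeled by whether $x$ is closer to $y$ than to $z$ under an $\ell_p$ distance, that the number of corrupted examples is drawn as Binomial$(n, \eta)$ as in the standard nasty-noise model, and that the adversary simply flips labels at randomly chosen positions. A real adversary would inspect the whole sample and choose both the positions and the replacements strategically. All function names are hypothetical.

```python
import random


def lp_distance(u, v, p=2):
    """l_p distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def label_triplet(x, y, z, p=2):
    """Return +1 if x is closer to y than to z under l_p, else -1 (assumed labeling)."""
    return 1 if lp_distance(x, y, p) < lp_distance(x, z, p) else -1


def draw_clean_sample(n, dim, rng, p=2):
    """Draw n i.i.d. triplets from a toy uniform distribution and label them."""
    sample = []
    for _ in range(n):
        x, y, z = ([rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3))
        sample.append(((x, y, z), label_triplet(x, y, z, p)))
    return sample


def nasty_corrupt(sample, eta, rng):
    """Corrupt a Binomial(n, eta) number of examples.

    Assumption: the adversary here is a simple stand-in that flips labels
    at random positions; the nasty-noise model allows arbitrary replacements
    chosen after seeing the entire sample.
    """
    n = len(sample)
    budget = sum(rng.random() < eta for _ in range(n))  # Binomial(n, eta) draw
    corrupted = list(sample)
    for i in rng.sample(range(n), budget):
        triplet, label = corrupted[i]
        corrupted[i] = (triplet, -label)
    return corrupted, budget


if __name__ == "__main__":
    rng = random.Random(0)
    clean = draw_clean_sample(n=200, dim=3, rng=rng)
    noisy, budget = nasty_corrupt(clean, eta=0.1, rng=rng)
    print(f"adversary modified {budget} of {len(noisy)} examples")
```

The learner only ever sees `noisy`, and it cannot tell which entries were altered. That is why the corruption rate $\eta$ enters the achievable accuracy directly.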
