An Augmentation Overlap Theory of Contrastive Learning
Qi Zhang, Yifei Wang, Yisen Wang
TL;DR
This work introduces augmentation overlap theory to explain why contrastive learning yields useful representations. It argues that aggressive data augmentations create overlap among the views of intra-class samples, so that simply aligning positives clusters intra-class samples together while inter-class separation can still be preserved. It first tightens generalization bounds under the conventional conditional independence assumption and then relaxes this to a more practical augmentation-overlap assumption, using an augmentation-graph framework with epsilon-alignment and intra-class connectivity conditions. The authors derive both CI-based and augmentation-based bounds, analyze augmentation strength via random-graph and spectral graph theory, and propose an unsupervised representation-evaluation metric, ARC (and its generalized forms), that correlates strongly with downstream performance. Empirically, they show how augmentation strength governs augmentation-graph connectivity and validate ARC on CIFAR-10 and ImageNet, offering a practical, label-free tool for model selection and insights for designing augmentations.
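To make the link between augmentation strength and graph connectivity concrete, here is a toy sketch, not the authors' construction. It assumes a hypothetical augmentation model in which each sample's views fill a ball of radius `radius` around it, so two samples are adjacent when their balls overlap (distance at most `2 * radius`). It then counts connected components with a standard spectral-graph fact: the number of components equals the multiplicity of eigenvalue 0 of the unnormalized Laplacian `L = D - A`. The helper names `augmentation_graph` and `num_components` are illustrative only.

```python
import numpy as np


def augmentation_graph(points: np.ndarray, radius: float) -> np.ndarray:
    """Adjacency matrix of a toy augmentation graph.

    Hypothetical model: the views of each sample fill a ball of the given
    radius, so two samples are linked when their balls overlap.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    adj = (dists <= 2 * radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def num_components(adj: np.ndarray, tol: float = 1e-9) -> int:
    """Count connected components as the multiplicity of eigenvalue 0
    of the unnormalized graph Laplacian L = D - A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals = np.linalg.eigvalsh(laplacian)  # symmetric matrix -> real eigenvalues
    return int(np.sum(eigvals < tol))


if __name__ == "__main__":
    # Three 1-D samples at 0, 1 and 3: weak augmentation leaves 3 isolated,
    # stronger augmentation connects the whole graph.
    pts = np.array([[0.0], [1.0], [3.0]])
    for r in (0.25, 0.5, 1.0):
        print(r, num_components(augmentation_graph(pts, r)))
```

Increasing `radius` only ever adds edges, so the component count is non-increasing in augmentation strength; in this toy model that is the sense in which stronger augmentations make the graph more connected.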
Abstract
Recently, self-supervised contrastive learning has achieved great success on various tasks. However, its underlying mechanism remains unclear. In this paper, we first provide the tightest bounds based on the widely adopted assumption of conditional independence. Further, we relax the conditional independence assumption to a more practical assumption of augmentation overlap and derive asymptotically closed bounds on the downstream performance. Our proposed augmentation overlap theory hinges on the insight that the supports of different intra-class samples become more overlapped under aggressive data augmentations, so simply aligning the positive samples (augmented views of the same sample) can make contrastive learning cluster intra-class samples together. Moreover, from the newly derived augmentation overlap perspective, we develop an unsupervised metric for evaluating contrastive representations, which aligns well with downstream performance while requiring almost no additional modules. Code is available at https://github.com/PKU-ML/GARC.
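The clustering argument above can be illustrated with a minimal toy sketch, which is an assumption-laden illustration rather than the paper's method. Suppose each 1-D sample's augmented views fill the interval `[x - radius, x + radius]` (a hypothetical augmentation model). If the encoder is perfectly aligned, every view of a sample maps to that sample's representation; a view shared by two overlapping samples then forces both to the same representation, and this propagates along chains of overlaps. Representations are therefore constant on each connected component, which the hypothetical helper `overlap_clusters` computes with union-find.

```python
from itertools import combinations


def overlap_clusters(samples: list[float], radius: float) -> list[int]:
    """Toy model: the views of sample x fill [x - radius, x + radius].

    Samples whose view intervals overlap share a view, so a perfectly
    aligned encoder must map them to the same point; chains of overlaps
    therefore make representations constant on each connected component.
    Returns a component id per sample, numbered in order of first appearance.
    """
    parent = list(range(len(samples)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of samples whose augmentation intervals overlap.
    for i, j in combinations(range(len(samples)), 2):
        if abs(samples[i] - samples[j]) <= 2 * radius:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ...
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(samples))]


if __name__ == "__main__":
    # Two toy "classes": {0, 1, 2} and {10, 11, 12}.
    data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    print(overlap_clusters(data, 0.25))  # weak augmentation: no overlap
    print(overlap_clusters(data, 0.5))   # intra-class overlap only
    print(overlap_clusters(data, 5.0))   # overly strong augmentation
```

With a weak augmentation (`radius=0.25`) every sample stays its own cluster; at `radius=0.5` the intra-class views overlap and the components coincide with the two classes; at `radius=5.0` the views of the two classes also overlap, so in this toy model an overly aggressive augmentation would merge them.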
