Contrastive Learning with Synthetic Positives
Dewen Zeng, Yawen Wu, Xinrong Hu, Xiaowei Xu, Yiyu Shi
TL;DR
This work addresses the limited diversity of positives in contrastive self-supervised learning by supplementing easy nearest-neighbor positives with synthetic positives generated by an unconditional diffusion model. It introduces CLSP, which applies the feature interpolation $h \leftarrow w \cdot h + (1-w) \cdot h_{\text{anchor}}$ during diffusion sampling to produce hard positives $x_i^3$. It trains with the combined loss $L = \sum_{i,k} L_{i,k} + \lambda \sum_i \lVert z_i^2 - z_i^3 \rVert^2$, drawing synthetic positives from a pre-generated candidate set of size $k \le 8$. Empirically, CLSP variants outperform strong baselines on CIFAR10/100, STL10, and ImageNet100 in linear and transfer evaluations. Gains reach about 2.9 percentage points on CIFAR10 and up to about 6.2 points on CIFAR100, and CLSP improves on 6 of 8 downstream transfer datasets. The results position diffusion-guided synthetic positives as a robust baseline for diffusion-assisted SSL and encourage scaling to larger datasets and broader tasks.
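The two formulas above reduce to simple arithmetic, sketched below on plain Python lists. This is a hypothetical illustration under stated assumptions, not the authors' code. `h` and `h_anchor` stand for flattened intermediate diffusion-model features at a single sampling step (no diffusion model is run here). The per-pair contrastive terms $L_{i,k}$ are assumed to be precomputed. The squared difference between $z_i^2$ and $z_i^3$ is taken as a squared Euclidean distance between embedding vectors. All function names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def interpolate_features(h: Sequence[float], h_anchor: Sequence[float], w: float) -> Vector:
    """Blend features toward the anchor's: h <- w*h + (1 - w)*h_anchor, elementwise.

    Assumption: h and h_anchor are flattened features from one sampling step.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("interpolation weight w must lie in [0, 1]")
    if len(h) != len(h_anchor):
        raise ValueError("h and h_anchor must have the same length")
    return [w * a + (1.0 - w) * b for a, b in zip(h, h_anchor)]


def consistency_penalty(
    z2: Sequence[Sequence[float]], z3: Sequence[Sequence[float]], lam: float
) -> float:
    """lam * sum_i ||z_i^2 - z_i^3||^2 (squared L2 distance per sample; an assumed reading)."""
    return lam * sum(
        sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in zip(z2, z3)
    )


def total_loss(
    pair_losses: Sequence[Sequence[float]],
    z2: Sequence[Sequence[float]],
    z3: Sequence[Sequence[float]],
    lam: float,
) -> float:
    """sum_{i,k} L_{i,k} plus the penalty; pair_losses[i][k] holds a precomputed L_{i,k}."""
    return sum(sum(row) for row in pair_losses) + consistency_penalty(z2, z3, lam)
```

In this sketch, $w = 1$ leaves `h` unchanged and $w = 0$ copies the anchor's features outright, so intermediate values trade off how far the sample drifts from the anchor.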
Abstract
Contrastive learning with nearest neighbors has proved to be one of the most efficient self-supervised learning (SSL) techniques, as it exploits the similarity among multiple instances of the same class. However, its efficacy is limited because the nearest-neighbor algorithm mainly identifies "easy" positive pairs, whose representations already lie close together in the embedding space. In this paper, we introduce a novel approach called Contrastive Learning with Synthetic Positives (CLSP) that uses synthetic images, generated by an unconditional diffusion model, as additional positives to help the model learn from diverse positives. Through feature interpolation in the diffusion model's sampling process, we generate images with distinct backgrounds but semantic content similar to that of the anchor image. These images serve as "hard" positives for the anchor image. When included as supplementary positives in the contrastive loss, they improve linear evaluation accuracy by over 2% and 1% relative to the previous NNCLR and All4One methods, respectively, across multiple benchmark datasets such as CIFAR10, achieving state-of-the-art performance. On transfer learning benchmarks, CLSP outperforms existing SSL frameworks on 6 out of 8 downstream datasets. We believe CLSP establishes a valuable baseline for future SSL studies that incorporate synthetic data into the training process.
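The core idea above, an easy nearest-neighbor positive plus a hard synthetic positive in the contrastive loss, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes cosine similarity, a temperature `tau`, an InfoNCE-style loss with in-batch negatives, and an NNCLR-style support set of stored embeddings for the nearest-neighbor lookup. It also assumes that the synthetic positive `z3[i]` is contrasted against the anchor view `z1[i]`. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(query: Vector, support: Sequence[Vector]) -> Vector:
    """Return the support-set embedding most similar to the query (NNCLR-style lookup)."""
    return max(support, key=lambda s: cosine(query, s))


def info_nce(queries: Sequence[Vector], keys: Sequence[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: queries[i] is positive with keys[i]; other keys act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / tau for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def nn_plus_synthetic_loss(
    z1: Sequence[Vector],
    z2: Sequence[Vector],
    z3: Sequence[Vector],
    support: Sequence[Vector],
    tau: float = 0.1,
) -> float:
    """Hypothetical combined loss: easy NN-positive term plus hard synthetic-positive term."""
    # Easy positive: nearest neighbour of each view-1 embedding, contrasted against view 2.
    nn: List[Vector] = [nearest_neighbor(z, support) for z in z1]
    loss_nn = info_nce(nn, z2, tau)
    # Hard positive (assumption): synthetic-image embedding contrasted against view 1.
    loss_syn = info_nce(z3, z1, tau)
    return loss_nn + loss_syn


if __name__ == "__main__":
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Perfectly aligned toy embeddings give a loss close to zero.
    print(nn_plus_synthetic_loss(basis, basis, basis, support=basis))
```

Summing the two InfoNCE terms with equal weight is a simplification chosen for this sketch; a real implementation would typically operate on batched tensors and may weight or combine the terms differently.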
