Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
Jaeill Kim, Duhun Hwang, Eunjung Lee, Jangwon Suh, Jimyeong Kim, Wonjong Rhee
TL;DR
This work addresses the efficiency and effectiveness of multi-view unsupervised representation learning by introducing Efficient Combinatorial Positive Pairing (ECPP). ECPP uses $K$ views to form ${}_{K}\mathrm{C}_{2}$ positive pairs, and keeps computation practical through small additional views, views created by a crop-only augmentation, and a modified negative sampling. Applied to SimCLR and BYOL, ECPP delivers state-of-the-art linear evaluation performance on CIFAR-10 and ImageNet-100, and even surpasses supervised learning on ImageNet-100. The underlying full-graph pairing can also increase learning speed by ${}_{K}\mathrm{C}_{2}$ times for a small learning rate and early in training. The approach generalizes to non-contrastive methods and provides practical guidelines for augmentation, view sizing, and sampling. Overall, ECPP offers a broadly applicable, computation-efficient way to harness the benefits of multi-view contrastive and non-contrastive learning in vision.
Abstract
In the past few years, contrastive learning has played a central role in the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have been developed as well. While most existing works use only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve the learning speed and performance of any contrastive or non-contrastive method. We first analyze CMC's full-graph paradigm and empirically show that the learning speed with $K$ views can be increased by ${}_{K}\mathrm{C}_{2}$ times for a small learning rate and early in training. Then, we upgrade CMC's full-graph by mixing in views created by a crop-only augmentation, adopting small-size views as in SwAV multi-crop, and modifying the negative sampling. The resulting multi-view strategy is called ECPP (Efficient Combinatorial Positive Pairing). We investigate the effectiveness of ECPP by applying it to SimCLR and assessing the linear evaluation performance on CIFAR-10 and ImageNet-100. On each benchmark, we achieve state-of-the-art performance. In the case of ImageNet-100, ECPP-boosted SimCLR outperforms supervised learning.
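To make the ${}_{K}\mathrm{C}_{2}$ pair count concrete, the following minimal sketch enumerates the unordered positive pairs that a full-graph pairing over $K$ views would produce. It only illustrates the counting and is not the authors' implementation; the helper name `positive_pairs` is hypothetical, and views are represented by their integer indices rather than by augmented images.

```python
from itertools import combinations
from math import comb


def positive_pairs(num_views):
    """Return all unordered index pairs (i, j) with i < j among num_views views.

    Hypothetical helper for illustration: each pair stands for two views of
    the same image that would be treated as a positive pair.
    """
    return list(combinations(range(num_views), 2))


if __name__ == "__main__":
    for k in (2, 4, 6, 8):
        pairs = positive_pairs(k)
        # The number of pairs equals the binomial coefficient "K choose 2".
        assert len(pairs) == comb(k, 2)
        print(f"K={k}: {len(pairs)} positive pairs")
```

With two views there is a single positive pair, whereas eight views already give 28, which is why the abstract pairs the combinatorial scheme with small-size views to keep the cost manageable.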
