Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing

Jaeill Kim, Duhun Hwang, Eunjung Lee, Jangwon Suh, Jimyeong Kim, Wonjong Rhee

TL;DR

This work tackles the efficiency and effectiveness of multi-view unsupervised representation learning by introducing Efficient Combinatorial Positive Pairing (ECPP). ECPP uses $K$ views to form all ${}_{K}\mathrm{C}_{2}$ positive pairs, and keeps computation practical by adding small-size views, mixing in views created by a crop-only augmentation, and modifying negative sampling. Applied to SimCLR, ECPP achieves state-of-the-art linear evaluation performance on CIFAR-10 and ImageNet-100, even surpassing supervised learning on ImageNet-100, and learns faster per iteration, particularly with a small learning rate and early in training. The approach also generalizes to non-contrastive methods such as BYOL and offers practical guidance on augmentation, view size, and negative sampling to maximize gains. Overall, ECPP offers a broadly applicable and computationally efficient way to harness the benefits of multi-view contrastive and non-contrastive learning in vision.
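With $K$ views per image, combinatorial pairing treats every unordered pair of views as a positive pair, giving ${}_{K}\mathrm{C}_{2} = K(K-1)/2$ pairs. The minimal sketch below simply enumerates those pairs; the function name `positive_pairs` and the 0-based view indexing are illustrative assumptions, not the authors' code.

```python
from itertools import combinations
from math import comb


def positive_pairs(num_views: int) -> list[tuple[int, int]]:
    """Return every unordered pair (i, j) with i < j among view indices 0..K-1."""
    return list(combinations(range(num_views), 2))


for k in (2, 4, 8):
    pairs = positive_pairs(k)
    assert len(pairs) == comb(k, 2)  # K-choose-2 = K(K-1)/2
    print(k, len(pairs))  # prints "2 1", then "4 6", then "8 28"
```

Going from 2 to 8 views thus raises the number of positive pairs per image from 1 to 28.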

Abstract

In the past few years, contrastive learning has played a central role in the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have also been developed. While most works use only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve the learning speed and performance of any contrastive or non-contrastive method. We first analyze CMC's full-graph paradigm and empirically show that, for a small learning rate and early training, the learning speed with $K$ views can be increased by a factor of ${}_{K}\mathrm{C}_{2}$. Then, we upgrade CMC's full graph by mixing views created by a crop-only augmentation, adopting small-size views as in SwAV multi-crop, and modifying the negative sampling. The resulting multi-view strategy is called ECPP (Efficient Combinatorial Positive Pairing). We investigate the effectiveness of ECPP by applying it to SimCLR and assessing the linear evaluation performance on CIFAR-10 and ImageNet-100. On each benchmark, we achieve state-of-the-art performance. On ImageNet-100, ECPP-boosted SimCLR outperforms supervised learning.
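To make the full-graph idea concrete, the sketch below sums a SimCLR-style NT-Xent loss $\mathcal{L}^{V_i,V_j}$ over all ${}_{K}\mathrm{C}_{2}$ view pairs, in the spirit of CMC's full graph. It is a NumPy illustration under stated assumptions: the helper names (`nt_xent`, `full_graph_loss`), the temperature `tau=0.5`, and summing (rather than averaging) the pair losses are choices made here for illustration. It deliberately omits ECPP's crop-only views, small-size views, and modified negative sampling, whose exact details are not given in this summary.

```python
from itertools import combinations

import numpy as np


def nt_xent(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss for one view pair; z_a and z_b have shape (N, D)."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit vectors -> cosine similarity
    sim = z @ z.T / tau                                 # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # a sample is never its own negative
    # Row r's positive is the same image in the other view.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True)) + row_max
    log_prob = sim - log_norm
    return float(-log_prob[np.arange(2 * n), pos].mean())


def full_graph_loss(views: list[np.ndarray], tau: float = 0.5) -> float:
    """Sum the pairwise loss over all K-choose-2 view pairs (CMC-style full graph)."""
    return float(sum(nt_xent(views[i], views[j], tau)
                     for i, j in combinations(range(len(views)), 2)))


rng = np.random.default_rng(0)
views = [rng.normal(size=(8, 16)) for _ in range(4)]  # K=4 views, batch N=8, embedding dim 16
print(full_graph_loss(views))  # sum of 6 pairwise NT-Xent terms
```

Because every pair contributes its own loss term, one iteration with $K$ views back-propagates ${}_{K}\mathrm{C}_{2}$ terms instead of one, which is the source of the per-iteration speedup discussed above.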

Paper Structure (22 sections, 4 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: The number of positive pairs (equivalent to the number of loss terms $\mathcal{L}^{V_i,V_j}$) processed by multi-view representation learning frameworks.
  • Figure 2: Linear evaluation performance of ResNet-18 with 2, 4, and 8 views. (a) SimCLR with more views learns faster. In fact, for any given iteration number, SimCLR with $K$ views learns ${}_{K}\mathrm{C}_{2}$ times faster. (b) When the $x$-axis is rescaled to the number of processed positive pairs (i.e., the number of $\mathcal{L}_{\text{SimCLR}}^{V_i,V_j}$ terms used for back-propagation), the learning speed is about the same as long as the number of processed positive pairs is the same.
  • Figure 3: The effect of the maximum epoch value. Because the cosine learning rate decay [loshchilov2016sgdr] depends on the maximum epoch setting, we evaluated SimCLR$^{\times K}$ performance for a range of maximum epoch configurations. Each point in the plot corresponds to an independent evaluation. Results are for ImageNet-100.
  • Figure 4: Linear evaluation performance of ResNet-18 with 2, 4, and 8 views for BYOL$^{\times K}$. The figures are generated in the same way as in Figure 2.
  • Figure 5: Linear evaluation performance of SimCLR$^{\times K}$ for ResNet-50 on CIFAR-10 with 2, 4, 6, and 8 views. Panels (a) and (b) are generated in a similar way as in Figure 2.
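The figure captions rely on two simple quantities: the rescaled $x$-axis of Figure 2(b), which counts processed positive pairs, and the cosine learning-rate schedule that motivates Figure 3. The sketch below computes both under stated assumptions: it counts one $\mathcal{L}^{V_i,V_j}$ term per view pair per iteration and ignores batch size (which is constant across $K$), and it uses the standard SGDR-style cosine formula without warmup or restarts. The helper names and the `min_lr=0.0` default are hypothetical, not taken from the paper.

```python
import math


def processed_positive_pairs(iterations: int, num_views: int) -> int:
    """Pairwise loss terms back-propagated after `iterations` steps, one per view pair per step."""
    return iterations * math.comb(num_views, 2)


def cosine_lr(epoch: float, max_epochs: float, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine learning-rate decay without restarts or warmup (SGDR-style)."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / max_epochs))


# Figure 2(b)-style x-axis: 8 views after 100 steps match 2 views after 2800 steps.
assert processed_positive_pairs(100, 8) == processed_positive_pairs(2800, 2) == 2800
# Figure 3 motivation: at epoch 50, the learning rate depends on the maximum-epoch setting.
print(cosine_lr(50, 100, 1.0), cosine_lr(50, 400, 1.0))  # about 0.5 vs. about 0.96
```

Since the learning rate at a given epoch depends on the chosen maximum epoch, comparing runs with different $K$ at a single schedule length could be misleading, which is why Figure 3 sweeps that setting.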