ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function

Jin-Young Kim, Soonwoo Kwon, Hyojun Go, Yunsung Lee, Seungtaek Choi, Hyun-Gyoon Kim

TL;DR

ScoreCL addresses a limitation of conventional contrastive learning (CL) methods, which treat all augmented views uniformly, by introducing a score-matching-based estimator of augmentation strength. It learns a score function $s_\theta(\cdot)$ via denoising score matching, then uses that function to adaptively weight positive pairs in the CL objective, yielding improvements across SimCLR, SimSiam, W-MSE, and VICReg on CIFAR-10, CIFAR-100, and ImageNet-100. The approach is modular and leaves the existing loss intact, and it shows that differences between augmented views can be exploited to improve view-invariance and representation quality. Empirical results, ablations, and downstream-task evaluations support the effectiveness and generality of score-guided weighting, and suggest directions for augmentation design and theoretical analysis.
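
As a rough illustration of the first step, the sketch below fits a score function by denoising score matching on a toy problem. Everything in it is an assumption made for illustration, not the paper's setup: 1-D standard-normal data, Gaussian perturbation with a hypothetical noise level `sigma`, and a linear score model $s_a(x) = a x$ fitted in closed form. For Gaussian noise the regression target is $-(\tilde{x} - x)/\sigma^2$, and in this toy case the optimal slope is $-1/(1+\sigma^2)$, the score slope of the perturbed distribution.

```python
import numpy as np

def fit_linear_score_dsm(x, sigma, rng):
    """Fit s_a(x) = a * x by denoising score matching (toy, closed form)."""
    noise = rng.standard_normal(x.shape)
    x_tilde = x + sigma * noise              # perturbed samples
    target = -(x_tilde - x) / sigma**2       # grad of log q_sigma(x_tilde | x)
    # Least-squares minimiser of mean((a * x_tilde - target)**2)
    return float(np.sum(x_tilde * target) / np.sum(x_tilde**2))

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)             # toy data ~ N(0, 1)
sigma = 0.5                                  # hypothetical noise level
a_hat = fit_linear_score_dsm(x, sigma, rng)
a_true = -1.0 / (1.0 + sigma**2)             # score slope of N(0, 1 + sigma^2)
```

Here `a_hat` should land close to `a_true`. A real score network for images would replace the linear model with a neural network trained by gradient descent on the same objective.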

Abstract

Self-supervised contrastive learning (CL) has achieved state-of-the-art performance in representation learning by minimizing the distance between positive pairs while maximizing that of negative ones. Recently, it has been verified that the model learns better representations with diversely augmented positive pairs because they enable the model to be more view-invariant. However, only a few studies on CL have considered the difference between augmented views, and those have not gone beyond hand-crafted findings. In this paper, we first observe that the score-matching function can measure how much data has changed from the original through augmentation. With this property, every pair in CL can be weighted adaptively by the difference of score values, which boosts the performance of the existing CL method. We show the generality of our method, referred to as ScoreCL, by consistently improving various CL methods (SimCLR, SimSiam, W-MSE, and VICReg) by up to 3%p in k-NN evaluation on CIFAR-10, CIFAR-100, and ImageNet-100. Moreover, we have conducted exhaustive experiments and ablations, including results on diverse downstream tasks, comparison with possible baselines, and improvement when used with other proposed augmentation methods. We hope our exploration will inspire more research on exploiting score matching for CL.
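
The pair-weighting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `pair_loss` is a SimSiam-style negative cosine similarity, `score_gap_weights` is a hypothetical weighting that uses the absolute difference of score norms rescaled to mean 1, and the random score arrays stand in for outputs of a pre-trained, frozen $s_\theta$. The paper's actual weighting function may differ.

```python
import numpy as np

def pair_loss(z1, z2):
    """Per-pair negative cosine similarity between two views' embeddings."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    return -np.sum(z1 * z2, axis=1)

def score_gap_weights(s1, s2, eps=1e-8):
    """Hypothetical weights: abs(norm(s1) - norm(s2)) per pair, mean ~ 1."""
    gap = np.abs(np.linalg.norm(s1, axis=1) - np.linalg.norm(s2, axis=1))
    return gap / (gap.mean() + eps)

def weighted_cl_loss(z1, z2, weights):
    """Mean of per-pair losses, each scaled by its (constant) weight."""
    return float(np.mean(weights * pair_loss(z1, z2)))

rng = np.random.default_rng(0)
# Stand-ins for embeddings and score outputs of two augmented views
z1, z2 = rng.standard_normal((8, 16)), rng.standard_normal((8, 16))
s1, s2 = rng.standard_normal((8, 32)), rng.standard_normal((8, 32))

w = score_gap_weights(s1, s2)
adaptive = weighted_cl_loss(z1, z2, w)
baseline = weighted_cl_loss(z1, z2, np.ones(8))   # uniform weights
```

With all weights equal to 1, `weighted_cl_loss` reduces to the plain mean pair loss, so uniform weighting recovers the underlying CL objective. In a training loop the weights would be treated as constants, with no gradient flowing through the score network.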

Paper Structure

This paper contains 17 sections, 6 equations, 7 figures, 10 tables, 1 algorithm.

Figures (7)

  • Figure 1: Score value versus augmentation magnitude for each transform sampled from RandAugment (Cubuk et al., 2020).
  • Figure 2: Histogram of score values, with sample images from each bin. Unlike the distribution of score values for the original images, the distribution for augmented images has a peak. Qualitatively, images transformed with high augmentation intensity (especially color-related transforms) have low score values, as shown in the left two columns.
  • Figure 3: The difference in score values is related to the difference in augmentation scale. The "Shear" and "Translate" transforms also take negative magnitudes, so the original images (i.e., zero magnitude) are aligned at the middle of each axis. The difference in score values shrinks as the magnitudes of the two transforms get closer. For example, when both views are transformed with "Color" (second row, fifth column), the score difference is low if both magnitudes increase by the same amount, and it grows as the gap between the magnitudes widens.
  • Figure 4: Contour map of score values when two augmentations are applied to one image. Each axis corresponds to the augmentation scale. The relation between score values and augmentation scale is non-linear when multiple transforms are applied to one image.
  • Figure 5: The architecture of ScoreCL with the score-matching function $s_\theta$. The red part is the original CL model and the blue part is the score guidance; setting $d(S,S')=1$ recovers the existing CL method. Note that $s_\theta$ is trained before CL so that no gradient flows through the score-matching network.
  • ...and 2 more figures