Implicit Contrastive Representation Learning with Guided Stop-gradient

Byeongchan Lee, Sehyun Lee

TL;DR

The paper tackles collapse in self-supervised Siamese learning by introducing implicit contrastive learning through asymmetric source/target encoders and a guided stop-gradient (GSG) mechanism. GSG chooses where to apply stop-gradient based on the geometric relations between projected views, which induces a contrastive effect without explicit negative terms. The method can be applied to SimSiam and BYOL. Empirically, GSG improves representation quality and transfer performance on ImageNet and CIFAR-10, stabilizes training, works well with small batch sizes, and avoids collapse even without a predictor. The work shows that carefully guided asymmetry can combine the benefits of contrastive and asymmetric SSL approaches, with practical gains for downstream tasks.

Abstract

In self-supervised representation learning, Siamese networks are a natural architecture for learning transformation-invariance by bringing representations of positive pairs closer together. However, they are prone to collapse into a degenerate solution. To address this issue, contrastive learning uses a contrastive loss that prevents collapse by moving representations of negative pairs away from each other. However, algorithms with negative sampling are known not to be robust to a reduction in the number of negative samples. There are also algorithms that do not use negative pairs. Many of these positive-only algorithms adopt an asymmetric network architecture consisting of source and target encoders as a key factor in coping with collapse. By exploiting the asymmetric architecture, we introduce a methodology to implicitly incorporate the idea of contrastive learning. As its implementation, we present a novel method, guided stop-gradient. We apply our method to the benchmark algorithms SimSiam and BYOL and show that it stabilizes training and boosts performance. We also show that the algorithms with our method work well with small batch sizes and do not collapse even when there is no predictor. The code is available at https://github.com/bych-lee/gsg.

Paper Structure

This paper contains 26 sections, 7 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Dots of the same color are representations of a positive pair. Without a contrastive loss, the method aims for a repelling effect by carefully choosing which representation serves as the source and which as the target. (b) In SimSiam and BYOL, a given image $x$ is randomly transformed into two views $x_1$ and $x_2$. The views are processed by encoders $f_1$ and $f_2$ to produce projections $z_1$ and $z_2$. A predictor is applied on one side, and stop-gradient is applied on the other. Then, the similarity between the outputs from both sides is maximized. By using the predictor and stop-gradient alternately, a symmetric loss is constructed.
  • Figure 2: An example for two images. The dots represent the four projections of the two images. The arrows represent the expected effect of the loss terms. We want dots of the same color to come close to each other. We select loss terms so that the two closest dots with different colors move apart.
  • Figure 3: Overview of our guided stop-gradient method. (1) The encoders process two images $x_1$ and $x_2$ that serve as references for each other. (2) Examine the distances $d_{11,21}$, $d_{11,22}$, $d_{12,21}$, and $d_{12,22}$ between the projections of negative pairs. (3) Determine which side to apply stop-gradient to and which to apply the predictor to.
  • Figure 4: Importance of guiding. Performance differs significantly depending on how stop-gradient is applied, and it is best when stop-gradient is used with GSG.
  • Figure 5: Preventing collapse. Unlike existing algorithms, algorithms to which GSG is applied do not collapse even when the predictor is removed.
  • ...and 2 more figures
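
The selection step described in the captions of Figures 2 and 3 can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes Euclidean distance between projections, and it assumes that the two closest projections from different images take the source (predictor) role while their same-image partners become stop-gradient targets, which is one reading of "select loss terms so that the two closest dots with different colors move apart". The names `euclidean` and `gsg_assignment` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two projections (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def gsg_assignment(z11, z12, z21, z22):
    """Pick (source, target) roles for the two views of each image.

    z1i are the projections of image 1, z2j those of image 2.
    Assumption: the closest cross-image pair becomes the source side
    (predictor applied), and each partner view becomes the target side
    (stop-gradient applied).
    """
    views = {1: {1: z11, 2: z12}, 2: {1: z21, 2: z22}}
    # The four negative-pair distances d_{1i,2j} from Figure 3.
    distances = {
        (i, j): euclidean(views[1][i], views[2][j])
        for i in (1, 2)
        for j in (1, 2)
    }
    # Ties are broken by dictionary order; this choice is arbitrary.
    i, j = min(distances, key=distances.get)
    # Each tuple reads: (source projection, stop-gradient target).
    return [(f"z1{i}", f"z1{3 - i}"), (f"z2{j}", f"z2{3 - j}")]


# z11 and z21 are the closest cross-image pair (distance 1).
terms = gsg_assignment((0, 0), (0, 2), (1, 0), (3, 0))
print(terms)  # [('z11', 'z12'), ('z21', 'z22')]
```

Under these assumptions, the resulting loss would contain one term pulling the prediction of `z11` toward a gradient-stopped `z12` and one pulling the prediction of `z21` toward a gradient-stopped `z22`, so only the two closest cross-image projections receive gradients.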