Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning

Sungmin Cha; Kyunghyun Cho; Taesup Moon

TL;DR

This work tackles forgetting in Continual Self-Supervised Learning (CSSL) by proposing Pseudo-Negative Regularization (PNR), which uses pseudo-negatives derived from both the current and past models to regularize SSL losses. For InfoNCE-based methods, PNR defines two symmetric losses that incorporate pseudo-negatives, improving both plasticity and stability; for non-contrastive methods, PNR regularizes with pseudo-negatives constructed from the past model’s outputs for different augmentations. In experiments on CIFAR-100, ImageNet-100, DomainNet, and ImageNet-1k, PNR consistently improves representation quality and downstream linear-probe performance, achieving state-of-the-art results in several CSSL scenarios. The approach is also robust across multiple SSL methods and tasks, and the paper includes ablations and an analysis of queue-size effects. Limitations include a current focus on CNNs and vision domains; extending PNR to transformers and NLP settings is left to future work.

Abstract

We introduce a novel Pseudo-Negative Regularization (PNR) framework for effective continual self-supervised learning (CSSL). PNR leverages pseudo-negatives obtained through model-based augmentation so that newly learned representations do not contradict what has been learned in the past. Specifically, for InfoNCE-based contrastive learning methods, we define symmetric pseudo-negatives obtained from the current and previous models and use them in both the main and regularization loss terms. Furthermore, we extend this idea to non-contrastive learning methods, which do not inherently rely on negatives. For these methods, a pseudo-negative is defined as the output of the previous model for a differently augmented version of the anchor sample and is applied asymmetrically to the regularization term. Extensive experimental results demonstrate that our PNR framework achieves state-of-the-art performance in representation learning during CSSL by effectively balancing the trade-off between plasticity and stability.
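
The InfoNCE case described above can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' exact loss: the function names (`info_nce`, `pnr_style_loss`), the temperature value, the unweighted sum of the two terms, and the use of plain Python lists as embeddings are all assumptions made for illustration. The main term treats the previous model's embeddings of other samples as extra (pseudo-)negatives, and the regularization term pulls the current embedding toward the previous model's embedding of the same sample while treating the current model's embeddings of other samples as pseudo-negatives.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE for a single anchor: -log softmax of the positive logit
    over the positive and all negative logits (cosine similarity)."""
    anchor = normalize(anchor)
    logits = [dot(anchor, normalize(positive)) / temperature]
    logits += [dot(anchor, normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


def pnr_style_loss(cur_a, cur_b, prev_a, cur_negs, prev_negs, temperature=0.2):
    """Hypothetical symmetric use of pseudo-negatives (illustrative only).

    cur_a, cur_b : current-model embeddings of two views of the anchor sample
    prev_a       : previous-model embedding of the anchor sample
    cur_negs     : current-model embeddings of other samples
    prev_negs    : previous-model embeddings of other samples
    """
    # Main term: ordinary negatives plus pseudo-negatives from the previous model.
    main = info_nce(cur_a, cur_b, cur_negs + prev_negs, temperature)
    # Regularization term: the positive comes from the previous model, and the
    # current model's embeddings of other samples act as pseudo-negatives.
    reg = info_nce(cur_a, prev_a, prev_negs + cur_negs, temperature)
    return main + reg
```

In this sketch, adding pseudo-negatives only enlarges the softmax denominator, so each term is never smaller than the corresponding InfoNCE loss without them; how the two terms are weighted, and whether a projector is applied before the regularization term, is left open here.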

Paper Structure (28 sections, 10 equations, 5 figures, 17 tables)

Introduction
Related Work
Problem Setting
Pseudo-Negative Regularization (PNR)
Motivation
InfoNCE-based Contrastive Learning Case
Non-Contrastive Learning Case
Experiments
Experimental Details
Experiments with SSL Methods
Experiments with Diverse CSSL Scenarios
Experimental Analysis
Limitation and Future Work
Concluding Remarks
Supplementary Materials for Section 3
...and 13 more sections

Figures (5)

Figure 1: The overview of using pseudo-negatives in CSSL with contrastive learning. Note that red dashed arrows denote the incorporation of the proposed pseudo-negatives, which are output features from distinct models, in each loss function.
Figure 2: Graphical representation of learning with our proposed loss. The blue dashed arrow indicates the direction of the gradient update during training with the proposed loss. It moves away from the negative and pseudo-negative embeddings, which correspond to current and past models, while converging towards the positive embeddings of the current and past models.
Figure 3: Experimental results of applying PNR to SSL methods. Note that "+CaSSLe" and "+PNR" indicate the results of applying CaSSLe and PNR to each SSL method, respectively.
Figure 4: The graph illustrates the values of $a_{k,t}$ of each algorithm in the Class-IL (5T) scenario using the ImageNet-100 dataset. The measured stability ($S \downarrow$) and plasticity ($P \uparrow$) of each method are as follows: (a) $(S, P)=(1.23, 3.47)$, (b) $(S, P)=(2.80, 2.52)$, (c) $(S, P)=(3.13, 2.38)$, (d) $(S, P)=(0.4, -0.07)$, (e) $(S, P)=(1.5, -0.47)$, (f) $(S, P)=(4.9, 1.6)$.
Figure 5: The graph illustrates the values of $a_{k,t}$ for each algorithm in the Class-IL (5T) scenario. The measured stability and plasticity for each method are as follows: (a) $(S, P)=(2.52, 2.8)$, (b) $(S, P)=(2.22, 1.95)$.