Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay

Hossein Rezaei, Mohammad Sabokrou

TL;DR

This paper introduces Adaptive Contrastive Replay (ACR), a method that uses dual optimization to train the encoder and the classifier simultaneously and thereby improve Out-of-Distribution generalization.

Abstract

Machine learning models often suffer from catastrophic forgetting of previously learned knowledge when learning new classes. Various methods have been proposed to mitigate this issue. Among them, rehearsal-based learning, which retains samples from previous classes, typically achieves good performance but tends to memorize specific instances and struggles with Out-of-Distribution (OOD) generalization. This often leads to high forgetting rates and poor generalization. Surprisingly, the OOD generalization capabilities of these methods have been largely unexplored. In this paper, we highlight this issue and propose a simple yet effective strategy inspired by contrastive learning and data-centric principles to address it. We introduce Adaptive Contrastive Replay (ACR), a method that employs dual optimization to simultaneously train both the encoder and the classifier. ACR adaptively populates the replay buffer with misclassified samples while ensuring a balanced representation of classes and tasks. By refining the decision boundary in this way, ACR achieves a balance between stability and plasticity. Our method significantly outperforms previous approaches in terms of OOD generalization, achieving an improvement of 13.41% on Split CIFAR-100, 9.91% on Split Mini-ImageNet, and 5.98% on Split Tiny-ImageNet.
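The buffer population idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that each sample's confidence in its true class has been recorded once per epoch, scores a sample by the variance of that history (one plausible reading of "confidence variation" in the Figure 2 caption), and gives every class an equal quota. The function name `select_buffer_samples` and its arguments are hypothetical, and task balance is not modeled separately here.

```python
from collections import defaultdict
from statistics import pvariance


def select_buffer_samples(confidences, labels, buffer_size):
    """Pick high-variation samples with an equal quota per class.

    confidences: dict mapping sample id -> list of true-class
                 probabilities, one entry per epoch (assumed input)
    labels:      dict mapping sample id -> class label
    buffer_size: total number of samples the buffer may hold
    """
    by_class = defaultdict(list)
    for sample_id, history in confidences.items():
        # Assumed difficulty score: variance of confidence across epochs.
        score = pvariance(history)
        by_class[labels[sample_id]].append((score, sample_id))

    if not by_class:
        return []

    # Equal share of the buffer for every class seen so far.
    quota = buffer_size // len(by_class)
    selected = []
    for label in sorted(by_class):
        ranked = sorted(by_class[label], key=lambda pair: pair[0], reverse=True)
        selected.extend(sample_id for _, sample_id in ranked[:quota])
    return selected


confidences = {
    "a": [0.9, 0.9, 0.9],  # stable prediction, class 0
    "b": [0.2, 0.8, 0.3],  # fluctuating prediction, class 0
    "c": [0.5, 0.5, 0.6],  # mildly varying, class 1
    "d": [0.1, 0.9, 0.1],  # fluctuating prediction, class 1
}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
chosen = select_buffer_samples(confidences, labels, buffer_size=2)
print(chosen)
```

With a buffer of two slots and two classes, each class contributes its single most unstable sample, so the stable sample `"a"` is left out in favor of `"b"`.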

Paper Structure

This paper contains 27 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Evaluating Out-of-Distribution (OOD) Generalization Capability: The performance of state-of-the-art rehearsal-based methods on the Split CIFAR-100, Split Mini-ImageNet, and Split Tiny-ImageNet datasets significantly drops on OOD samples, highlighting their lack of generalization. In this paper, we address this challenge by proposing a method that consistently outperforms existing approaches across all datasets.
  • Figure 2: Illustration of the buffer update policy in our method (ACR). After training each task, the buffer is updated with the most challenging samples, identified by high confidence variation, while maintaining class and task balance.
  • Figure 3: Analysis of various methods with a buffer size of 1000 on the Split CIFAR-100 dataset (results are averaged over 5 runs). This figure presents ACC and BWT for each method under both the i.i.d. and OOD scenarios at every training stage. Specifically, it shows the average performance of seen tasks at each stage $t$ and the performance of each task at the end of training.
  • Figure 4: Hyperparameter sensitivity analysis of $E$ in our boundary sample identification and buffer population approach. The evaluation is performed across all three datasets with a buffer size of 500, using both ACC and BWT metrics, under i.i.d. and OOD scenarios. Results are averaged over 5 runs. Each color represents a dataset: solid lines indicate our updating policy, while dashed lines correspond to the reservoir update policy. Both policies utilize our proxy-based contrastive learning method, with only the update policy varying.
  • Figure 5: GPU usage and CPU utilization for each method up to 35k time steps, with detailed values at 5k steps. The experiment was conducted on Split CIFAR-100 with a buffer size of 2000.
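
The ACC and BWT metrics named in the captions of Figures 3 and 4 are not defined in this excerpt. The sketch below assumes the definitions commonly used in the continual-learning literature: ACC is the mean accuracy over all tasks after the final task has been trained, and BWT is the mean change in accuracy on each earlier task between the moment it was learned and the end of training. The accuracy matrix `acc` and the function names are illustrative assumptions, and the paper's exact formulas may differ.

```python
def average_accuracy(acc):
    """ACC: mean accuracy over all tasks after training on the last task.

    acc[t][i] is the accuracy on task i measured after training on task t.
    """
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def backward_transfer(acc):
    """BWT: mean accuracy change on earlier tasks by the end of training.

    Negative values indicate forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # no earlier task to compare against
    diffs = [acc[-1][i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(diffs) / (num_tasks - 1)


# Made-up accuracies for three tasks, for illustration only.
acc = [
    [0.9, 0.0, 0.0],
    [0.7, 0.8, 0.0],
    [0.6, 0.7, 0.9],
]
acc_value = average_accuracy(acc)
bwt_value = backward_transfer(acc)
print(round(acc_value, 3), round(bwt_value, 3))
```

In this toy matrix the first two tasks lose 0.3 and 0.1 accuracy by the end of training, which gives a BWT of about -0.2. The same functions can be applied to an i.i.d. test split and to an OOD test split to obtain the two scenarios the figures compare.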