Balanced Gradient Sample Retrieval for Enhanced Knowledge Retention in Proxy-based Continual Learning

Hongye Xu, Jan Wasilewski, Bartosz Krawczyk

TL;DR

This work tackles catastrophic forgetting in continual learning by introducing a balanced sample retrieval strategy for memory buffers in a supervised contrastive framework. By combining gradient-aligned and gradient-conflicting samples, the method preserves past knowledge while stabilizing shared representations, mitigating proxy drift. The approach is supported by theoretical analysis of gradient interactions and empirical results showing state-of-the-art performance across six vision datasets, with robust improvements in retention and adaptation. The work demonstrates that balanced retrieval enhances buffer diversity and stability, offering practical gains for proxy-based continual learning in realistic data streams.

Abstract

Continual learning in deep neural networks often suffers from catastrophic forgetting, where representations for previous tasks are overwritten during subsequent training. We propose a novel sample retrieval strategy from the memory buffer that leverages both gradient-conflicting and gradient-aligned samples to effectively retain knowledge about past tasks within a supervised contrastive learning framework. Gradient-conflicting samples are selected for their potential to reduce interference by re-aligning gradients, thereby preserving past task knowledge. Meanwhile, gradient-aligned samples are incorporated to reinforce stable, shared representations across tasks. By balancing gradient correction from conflicting samples with alignment reinforcement from aligned ones, our approach increases the diversity among retrieved instances and achieves superior alignment in parameter space, significantly enhancing knowledge retention and mitigating proxy drift. Empirical results demonstrate that using both sample types outperforms methods relying solely on one sample type or random retrieval. Experiments on popular continual learning benchmarks in computer vision validate our method's state-of-the-art performance in mitigating forgetting while maintaining competitive accuracy on new tasks.
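The retrieval idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes per-sample gradients for the buffer are already available as flat vectors, scores each buffer sample by cosine similarity against the gradient of the incoming batch, and splits the retrieval budget evenly between the most conflicting (most negative similarity) and most aligned (most positive similarity) samples. The function name `balanced_retrieve`, the even split, and the use of cosine similarity as the score are all assumptions made for illustration.

```python
import numpy as np


def balanced_retrieve(buffer_grads, current_grad, k):
    """Pick k buffer indices: half most gradient-conflicting, half most aligned.

    buffer_grads: (n, d) array, one flattened gradient per buffer sample (assumed given).
    current_grad: (d,) array, flattened gradient of the incoming batch (assumed given).
    k: total number of samples to retrieve; must not exceed n.
    """
    buffer_grads = np.asarray(buffer_grads, dtype=float)
    current_grad = np.asarray(current_grad, dtype=float)
    n = buffer_grads.shape[0]
    if k > n:
        raise ValueError("k must not exceed the number of buffer samples")

    # Cosine similarity between each buffer gradient and the current gradient.
    eps = 1e-12  # guards against division by zero for zero-norm gradients
    norms = np.linalg.norm(buffer_grads, axis=1) * np.linalg.norm(current_grad)
    cos = (buffer_grads @ current_grad) / (norms + eps)

    order = np.argsort(cos)            # ascending: most conflicting first
    n_conflict = k // 2                # hypothetical even split of the budget
    n_aligned = k - n_conflict
    conflicting = order[:n_conflict]   # lowest (most negative) similarity
    aligned = order[::-1][:n_aligned]  # highest (most positive) similarity
    return conflicting, aligned, cos


# Toy example with 2-dimensional "gradients".
toy_buffer = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_current = [1.0, 0.0]
conflicting, aligned, cos = balanced_retrieve(toy_buffer, toy_current, k=2)
print(conflicting.tolist(), aligned.tolist())  # [1] [0]
```

In the toy example, sample 1 points directly against the current gradient and is retrieved as conflicting, while sample 0 points in the same direction and is retrieved as aligned. Retrieving only one of the two groups, or sampling at random, corresponds to the baselines the abstract compares against.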

Paper Structure

This paper contains 23 sections, 13 equations, 20 figures, 8 tables, 1 algorithm.

Figures (20)

  • Figure 1: The average class accuracy after +k task batches since the moment a task appeared.
  • Figure 2: The average accuracy for selected classes for different models after subsequent task batches.
  • Figure 3: The average retention rate after +k task batches since the moment a task appeared for: CLSER, CLSER+OURS, ER-ACE, ER-ACE+OURS, PCR, PCR+OURS.
  • Figure 4: The average proxy drift for selected tasks from CIFAR100.
  • Figure 5: The average inner distance of retrieved samples during the training of selected tasks from CIFAR100.
  • ...and 15 more figures