Slowing Down Forgetting in Continual Learning

Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel

TL;DR

The paper addresses catastrophic forgetting in continual learning (CL) by introducing ReCL, a framework that needs no external memory buffer. It exploits the implicit bias of gradient-based optimization toward margin-maximization points to reconstruct data from past tasks out of the current model. The reconstructed samples are merged with the current task's data to slow forgetting, and the framework is designed to be compatible with existing CL methods. Across class- and domain-incremental scenarios on MNIST, CIFAR10, and TinyImageNet, ReCL is reported to improve average accuracy (ACC) and to reduce forgetting as measured by backward transfer (BWT), including on non-homogeneous architectures. The approach is positioned as a practical, flexible alternative to memory buffers that scales to large datasets and streaming settings, and it is grounded in theory on dataset reconstruction from trained models.
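The two metrics named above can be made concrete with a small worked example. The sketch below assumes the definitions commonly used in the continual learning literature: `R[t][i]` is the test accuracy on task `i` after training on task `t`, ACC is the mean of the final row, and BWT is the mean change on each earlier task between the time it was learned and the end of training. The paper may use a variant of these definitions; the function name `acc_and_bwt` and the accuracy values are made up for illustration.

```python
def acc_and_bwt(R):
    """Compute ACC and BWT from a T x T accuracy matrix.

    R[t][i] is the accuracy on task i after training on task t
    (assumed convention; not taken from the paper).
    """
    T = len(R)
    if T < 2:
        raise ValueError("BWT needs at least two tasks")
    final = R[-1]
    # ACC: average accuracy over all tasks after the last task is learned.
    acc = sum(final) / T
    # BWT: average change on earlier tasks; negative values mean forgetting.
    bwt = sum(final[i] - R[i][i] for i in range(T - 1)) / (T - 1)
    return acc, bwt


# Illustrative numbers only: accuracy on task 0 drops from 0.9 to 0.5.
R = [
    [0.9, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.5, 0.6, 0.7],
]
acc, bwt = acc_and_bwt(R)
print(f"ACC={acc:.2f}, BWT={bwt:.2f}")  # ACC=0.60, BWT=-0.30
```

Under these definitions a method that slows forgetting moves BWT toward zero, which is why both metrics are marked with an upward arrow in the figure captions.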

Abstract

A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of neural networks trained with gradient-based methods, which causes them to converge to margin-maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods. We further demonstrate the performance gain from our framework across a large series of experiments, including two challenging CL scenarios (class incremental and domain incremental learning), different datasets (MNIST, CIFAR10, TinyImageNet), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.

Paper Structure

This paper contains 37 sections, 12 equations, 17 figures, 16 tables.

Figures (17)

  • Figure 1: Overview of our ReCL framework. Upon the arrival of a new task $\tau$, we reconstruct samples from all previous tasks from the network $\Phi_{\tau-1}$ by minimizing the loss in \ref{eq:loss_full}. The reconstructed samples are then combined with the current task's data. Finally, the model is trained on the combined dataset, yielding the new network $\Phi_{\tau}$, from which data will be reconstructed in $\tau+1$.
  • Figure 2: ACC[$\uparrow$] for CIL, SplitMNIST: All methods benefit from our ReCL. Used standalone, ReCL is already competitive with CL methods.
  • Figure 3: BWT[$\uparrow$] for CIL, SplitMNIST: ReCL strongly reduces forgetting for all methods and is competitive when used standalone.
  • Figure 4: Sensitivity to the number of reconstruction samples (scenario: CIL, SplitMNIST). The performance gain grows with $m$, but even $m=10$ outperforms Finetune.
  • Figure 5: ACC[$\uparrow$] for DIL, SplitMNIST: All methods benefit from our ReCL.
  • ...and 12 more figures
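
The loop described in the Figure 1 caption (reconstruct from $\Phi_{\tau-1}$, combine with the current task's data, train to obtain $\Phi_{\tau}$) can be sketched as a short driver function. This is a minimal structural sketch, not the authors' implementation: the names `recl_loop`, `reconstruct`, and `train` are hypothetical, the reconstruction loss itself is not shown in this excerpt and is therefore left to the caller, and treating `m` as the total number of reconstructed samples per step is an assumption.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
Sample = Tuple[list, int]  # (features, label); placeholder type for illustration


def recl_loop(
    tasks: Sequence[List[Sample]],
    model: Model,
    reconstruct: Callable[[Model, int], List[Sample]],
    train: Callable[[Model, List[Sample]], Model],
    m: int = 10,
) -> Model:
    """Train on tasks in order, mixing in samples reconstructed from the previous model.

    `reconstruct` and `train` are caller-supplied (hypothetical) callables.
    """
    for tau, task_data in enumerate(tasks):
        if tau == 0:
            # First task: there is no earlier model to reconstruct from.
            combined = list(task_data)
        else:
            # Reconstruct m samples of earlier tasks from the previous network.
            old_samples = reconstruct(model, m)
            combined = old_samples + list(task_data)
        # Train on the combined dataset to obtain the next network.
        model = train(model, combined)
    return model


# Toy stand-ins that only track dataset sizes, to show the control flow.
def toy_reconstruct(model, m):
    return [([0.0], -1)] * m  # dummy "reconstructed" samples


def toy_train(model, data):
    return model + [len(data)]  # the "model" is a log of training-set sizes


tasks = [[([1.0], 0)] * 5, [([2.0], 1)] * 5, [([3.0], 2)] * 5]
history = recl_loop(tasks, [], toy_reconstruct, toy_train, m=3)
print(history)  # [5, 8, 8]
```

Passing the reconstruction and training steps as callables mirrors the claim that the framework can sit on top of existing CL methods: in this sketch, swapping in a different `train` changes the base method without touching the loop.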