Slowing Down Forgetting in Continual Learning
Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel
TL;DR
The paper addresses catastrophic forgetting in continual learning (CL) with ReCL, a framework that needs no external memory buffer. ReCL exploits the implicit bias of gradient-based optimization toward margin-maximization points to reconstruct data from past tasks directly from the current model. It then merges the reconstructed samples with the current task's data to slow forgetting, and it is designed to run on top of existing CL methods. In class- and domain-incremental scenarios on MNIST, CIFAR-10, and TinyImageNet, the paper reports consistent gains in average accuracy (ACC) and less forgetting as measured by backward transfer (BWT), including on non-homogeneous architectures. The authors position ReCL as a flexible alternative to memory buffers that is intended to scale to large datasets and streaming settings, and they ground it in existing theory on reconstructing training data from trained models.
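For readers unfamiliar with the two metrics, the sketch below computes ACC and BWT from a matrix of per-task accuracies. It assumes the definitions commonly used in the CL literature (ACC is the mean final accuracy over all tasks; BWT is the mean change in accuracy on earlier tasks between the moment each was learned and the end of training); the summary above does not spell them out. The function name `acc_and_bwt` and the numbers in `R` are hypothetical and are not results from the paper.

```python
import numpy as np

def acc_and_bwt(R):
    """R[i, j] = test accuracy on task j after training on tasks 0..i."""
    R = np.asarray(R, dtype=float)
    # ACC: average accuracy over all tasks after the last task was learned.
    acc = R[-1].mean()
    # BWT: final accuracy on each earlier task minus its accuracy right
    # after that task was learned, averaged over the earlier tasks.
    bwt = (R[-1, :-1] - np.diag(R)[:-1]).mean()
    return acc, bwt

# Hypothetical accuracy matrix for three tasks (illustrative numbers only).
R = [[0.90, 0.00, 0.00],
     [0.70, 0.80, 0.00],
     [0.60, 0.70, 0.80]]
acc, bwt = acc_and_bwt(R)
```

A negative BWT indicates forgetting, so "reducing forgetting" corresponds to a BWT closer to zero.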
Abstract
A common challenge in continual learning (CL) is catastrophic forgetting, where performance on old tasks drops after new tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of neural networks trained with gradient-based optimization: they converge to margin-maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods. We further demonstrate the performance gain from our framework across a large series of experiments, including two challenging CL scenarios (class-incremental and domain-incremental learning), different datasets (MNIST, CIFAR-10, TinyImageNet), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.
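To make the reconstruction idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the stationarity condition from the margin-maximization literature, theta = sum_i lam_i * y_i * grad_theta f(theta; x_i) with lam_i >= 0, which is not written out in the abstract. It also uses a toy linear model f(theta; x) = theta @ x, for which the gradient with respect to theta is simply x. The function `reconstruct`, the weights `theta_prev`, and the placeholder current-task data are hypothetical.

```python
import numpy as np

def reconstruct(theta, labels, steps=500, lr=0.01, seed=0):
    """Fit candidate samples x_i and multipliers lam_i so that
    theta is close to sum_i lam_i * y_i * x_i (toy linear model)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(labels, dtype=float)          # labels in {-1, +1}
    x = rng.normal(size=(len(y), theta.size))    # candidates start as noise
    lam = np.full(len(y), 0.1)
    losses = []
    for _ in range(steps):
        r = theta - (lam * y) @ x                # stationarity residual
        losses.append(float(r @ r))              # squared-norm objective
        grad_x = -2.0 * (lam * y)[:, None] * r[None, :]
        grad_lam = -2.0 * y * (x @ r)
        x -= lr * grad_x
        # Multipliers are projected back onto the non-negative range.
        lam = np.maximum(lam - lr * grad_lam, 0.0)
    return x, lam, losses

# Hypothetical weights after training on the previous task.
theta_prev = np.array([1.0, -2.0, 0.5])
y_old = np.array([1.0, -1.0, 1.0, -1.0])
x_old, lam, losses = reconstruct(theta_prev, y_old)

# Placeholder current-task data, merged with the reconstructed samples.
x_new, y_new = np.zeros((6, 3)), np.ones(6)
x_train = np.concatenate([x_new, x_old])
y_train = np.concatenate([y_new, y_old])
```

For a linear model the objective is heavily underdetermined, so the recovered `x_old` will generally not match any real training sample; the sketch only illustrates the shape of the optimization and the merge step, with the model itself acting as the memory.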
