Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard Ghanem
TL;DR
Catastrophic forgetting in continual learning is exacerbated in memory-intensive rehearsal methods. The authors propose Information Maximization (IM), a lightweight, class-agnostic regularizer that operates on prediction distributions and can be wrapped around any rehearsal-based CL method to improve retention without substantial overhead. They systematically compare IM against EWC, SI, and EM, and show robust improvements on image benchmarks (Split-CIFAR100, Split-Tiny ImageNet). They also extend the evaluation to video continual learning and include ablations on compute budget, task count, and regularization targets. The results show consistent accuracy gains and reduced forgetting, highlighting IM's practical value for memory-constrained, real-world CL scenarios in both image and video domains.
Abstract
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises from the model's tendency to overwrite previously acquired knowledge with new information. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed the Information Maximization (IM) regularizer, for memory-based continual learning methods. It is based exclusively on the expected label distribution, which makes it class-agnostic. As a consequence, the IM regularizer can be directly integrated into various rehearsal-based continual learning methods, reducing forgetting and favoring faster convergence. Our empirical validation shows that, across datasets and regardless of the number of tasks, our proposed regularization strategy consistently improves baseline performance at the cost of minimal computational overhead. The lightweight nature of IM keeps it practical and scalable, making it applicable to real-world continual learning scenarios where efficiency is paramount. Finally, we demonstrate the data-agnostic nature of our regularizer by applying it to video data, which presents additional challenges due to its temporal structure and higher memory requirements. Despite the significant domain gap, our experiments show that the IM regularizer also improves the performance of video continual learning methods.
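The abstract does not give the exact form of the regularizer, so the following is only a minimal sketch of one plausible reading: a penalty computed from the expected label distribution, taken here to be the batch-averaged softmax output, whose entropy is maximized. The function names (`im_penalty`, `regularized_loss`), the `weight` value, and the choice of negative entropy are all assumptions made for illustration and may differ from the authors' formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_label_distribution(batch_logits):
    """Average the per-sample softmax outputs over the batch."""
    probs = [softmax(row) for row in batch_logits]
    n, k = len(probs), len(probs[0])
    return [sum(p[c] for p in probs) / n for c in range(k)]

def im_penalty(batch_logits, eps=1e-12):
    """Assumed penalty: negative entropy of the expected label distribution.

    It needs no ground-truth labels and treats all classes symmetrically.
    Minimizing it pushes the mean prediction toward a balanced distribution.
    """
    p_bar = expected_label_distribution(batch_logits)
    return sum(p * math.log(p + eps) for p in p_bar)

def regularized_loss(base_loss, batch_logits, weight=0.1):
    """Hypothetical wrapper: add the penalty to any rehearsal method's loss."""
    return base_loss + weight * im_penalty(batch_logits)
```

Because the penalty depends only on the model's predictions, a term of this kind could in principle be added to the loss of any rehearsal-based method without changing its memory buffer or sampling strategy, which is consistent with the plug-in use the abstract describes.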
