Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning

Lama Alssum, Hasan Abed Al Kader Hammoud, Motasem Alfarra, Juan C Leon Alcazar, Bernard Ghanem

TL;DR

Catastrophic forgetting persists in rehearsal-based continual learning (CL), even when a memory buffer of past samples is available. The authors propose Information Maximization (IM), a lightweight, class-agnostic regularizer that operates on prediction distributions and can be added to any rehearsal-based CL method to improve retention without substantial overhead. They compare IM against EWC, SI, and EM on image benchmarks (Split-CIFAR100, Split-Tiny ImageNet), extend the evaluation to video continual learning, and run ablations on compute budget, task count, and regularization targets. The results show consistent accuracy gains and reduced forgetting, which supports IM's practical value for memory-constrained CL scenarios in both image and video domains.

Abstract

Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. This issue arises due to the model's tendency to overwrite previously acquired knowledge with new information. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods, which is based exclusively on the expected label distribution, thus making it class-agnostic. As a consequence, IM regularizer can be directly integrated into various rehearsal-based continual learning methods, reducing forgetting and favoring faster convergence. Our empirical validation shows that, across datasets and regardless of the number of tasks, our proposed regularization strategy consistently improves baseline performance at the expense of a minimal computational overhead. The lightweight nature of IM ensures that it remains a practical and scalable solution, making it applicable to real-world continual learning scenarios where efficiency is paramount. Finally, we demonstrate the data-agnostic nature of our regularizer by applying it to video data, which presents additional challenges due to its temporal structure and higher memory requirements. Despite the significant domain gap, our experiments show that IM regularizer also improves the performance of video continual learning methods.
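The abstract states only that the IM regularizer depends on the expected label distribution; it does not give the formula. As a minimal sketch under that assumption, the Python below averages the softmax predictions over a batch and rewards high entropy of that average, which is one common form of information maximization. The function names, the `weight` parameter, and the choice to subtract the entropy from a base loss are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_prediction(batch_logits):
    """Expected label distribution: the batch average of softmax outputs."""
    probs = [softmax(row) for row in batch_logits]
    n = len(probs)
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n for k in range(num_classes)]


def marginal_entropy(batch_logits, eps=1e-12):
    """Entropy of the expected label distribution (higher = more balanced)."""
    p_bar = mean_prediction(batch_logits)
    return -sum(p * math.log(p + eps) for p in p_bar)


def regularized_loss(base_loss, batch_logits, weight=0.1):
    """Hypothetical combination: base rehearsal loss minus weighted entropy.

    Subtracting the entropy means that minimizing the total loss pushes the
    average prediction away from collapsing onto a few classes. No class
    labels are needed, which is what makes the term class-agnostic.
    """
    return base_loss - weight * marginal_entropy(batch_logits)
```

In this sketch, a batch whose predictions spread evenly over the classes receives a lower total loss than a batch whose predictions all collapse onto one class, given the same base loss. Because the term only needs the logits that the training step already produces, it adds one averaging pass and one entropy computation per batch.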

Paper Structure

This paper contains 27 sections, 7 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: Results of Integrating Different Regularizers on Split-CIFAR100 and Split-Tiny ImageNet. This figure plots the average accuracy and forgetting rate of three baseline methods (ER, DER, and DER++) across various memory buffer sizes, in combination with the analyzed regularizers (IM, EM, EWC, and SI), on two datasets (Split-CIFAR100 and Split-Tiny ImageNet). The results demonstrate that the proposed information maximization regularizer (IM) consistently outperforms the other regularizers, achieving higher accuracy and lower forgetting rates on both datasets regardless of the memory setting.
  • Figure 2: Results on Split-CIFAR100 and Split-Tiny ImageNet. This figure presents the average accuracy and forgetting rate of five rehearsal-based methods (ER, DER, DER++, Refresh Learning, and STAR) across different memory buffer sizes, both in their baseline form and when combined with Information Maximization (IM). The results show that integrating IM consistently enhances performance, leading to higher accuracy and reduced forgetting across all methods, datasets, and memory settings.
  • Figure 3: Application of Information Maximization to Video Continual Learning. This figure illustrates the average accuracy and forgetting rates of the iCARL video continual learning variant, introduced by vCLIMB, with and without our Information Maximization (IM) regularizer. The results demonstrate that incorporating the IM regularizer on top of iCARL leads to consistent improvements in average accuracy and reductions in the forgetting rate.
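All three figures report average accuracy and forgetting rate, but the captions do not define them. The sketch below uses the definitions that are common in the continual learning literature, which may differ in detail from the paper's: `acc[i][j]` is assumed to hold the accuracy on task `j` measured after training on task `i`, average accuracy is the mean of the final row, and forgetting is the mean drop from each earlier task's best past accuracy to its final accuracy. The function names and the example matrix are hypothetical.

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the last task."""
    num_tasks = len(acc)
    return sum(acc[num_tasks - 1][j] for j in range(num_tasks)) / num_tasks


def average_forgetting(acc):
    """Mean drop from best earlier accuracy to final accuracy.

    Only tasks before the last one are counted, since the last task
    has had no opportunity to be forgotten.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any point before the final evaluation.
        best = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


# Made-up accuracy matrix for three tasks (rows: after training task i).
example = [
    [0.9, 0.0, 0.0],
    [0.7, 0.8, 0.0],
    [0.6, 0.7, 0.9],
]
```

For the made-up matrix above, the final row averages to roughly 0.73, and the two earlier tasks lose 0.3 and 0.1 from their best earlier accuracy, for an average forgetting of about 0.2. A regularizer that helps retention should raise the first number and lower the second.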