Continual Learning Beyond Experience Rehearsal and Full Model Surrogates

Prashant Bhat, Laurens Niesten, Elahe Arani, Bahram Zonooz

TL;DR

This work tackles catastrophic forgetting in continual learning (CL) by proposing SPARC, a rehearsal-free, parameter-efficient framework that replaces memory buffers and full-model surrogates with per-task working memories and a shared semantic memory. Using depth-wise separable convolutions for task-specific components and partial cross-task sharing for semantic memory, SPARC achieves strong performance on Seq-TinyImageNet and competitive results on other CL benchmarks while requiring only about 6% of the parameters of full-model surrogates. A simple weight re-normalization in the classification layer mitigates task biases and promotes balanced performance across tasks. The approach scales to memory-constrained settings and opens avenues for extending parameter-efficient SPARC-style architectures to other backbones and domains, including vision transformers.
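To make the parameter saving from depth-wise separable convolutions concrete, the sketch below counts weights for a single layer. The layer shape (64 input channels, 64 output channels, 3 x 3 kernel) and the function names are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """One k x k filter per input channel, then a 1 x 1 point-wise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
standard = conv_params(64, 64, 3)                   # 9 * 64 * 64 = 36864
separable = depthwise_separable_params(64, 64, 3)   # 576 + 4096 = 4672
ratio = separable / standard                        # equals 1/64 + 1/9

print(standard, separable, round(ratio, 4))
```

For this one layer the separable form needs roughly 13% of the weights of the standard convolution. This is a per-layer figure only; the roughly 6% reported above is a whole-model comparison against full-model surrogates and is not derived from this calculation.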

Abstract

Continual learning (CL) has remained a significant challenge for deep neural networks because learning new tasks erases previously acquired knowledge, either partially or completely, a phenomenon known as catastrophic forgetting (CF). Existing solutions often rely on experience rehearsal or full-model surrogates to mitigate CF. While effective, these approaches introduce substantial memory and computational overhead, limiting their scalability and applicability in real-world scenarios. To address this, we propose SPARC, a scalable CL approach that eliminates the need for experience rehearsal and full-model surrogates. By combining task-specific working memories with a task-agnostic semantic memory for cross-task knowledge consolidation, SPARC achieves remarkable parameter efficiency, using only 6% of the parameters required by full-model surrogates. Despite its lightweight design, SPARC achieves superior performance on Seq-TinyImageNet and matches rehearsal-based methods on various CL benchmarks. Additionally, weight re-normalization in the classification layer mitigates task-specific biases, establishing SPARC as a practical and scalable solution for CL under stringent efficiency constraints.
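The abstract does not spell out the form of the weight re-normalization, so the following is only a sketch of one plausible variant: each task's block of classifier weight vectors is rescaled so that every block has the same mean L2 norm. The function name `renormalize_by_task`, the equal number of classes per task, and the choice of the global mean norm as the target are all assumptions made for illustration.

```python
import math


def renormalize_by_task(weights, classes_per_task):
    """Rescale each task's block of class weight vectors to a common mean L2 norm.

    weights: list of per-class weight vectors (lists of floats), ordered by task.
    classes_per_task: classes introduced per task (assumed equal across tasks).
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    target = sum(norms) / len(norms)  # global mean norm: an illustrative choice
    out = []
    for start in range(0, len(weights), classes_per_task):
        block = norms[start:start + classes_per_task]
        block_mean = sum(block) / len(block)
        scale = target / block_mean if block_mean > 0 else 1.0
        for row in weights[start:start + classes_per_task]:
            out.append([w * scale for w in row])
    return out


# Made-up weights: the first task's vectors are ten times larger than the second's.
w = [[3.0, 4.0], [6.0, 8.0], [0.3, 0.4], [0.6, 0.8]]
balanced = renormalize_by_task(w, classes_per_task=2)
```

The motivation is that a task whose weight vectors have larger norms tends to produce larger logits and so dominates predictions across tasks. Equalizing the norms per task removes that scale advantage without changing the direction of any weight vector.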

Paper Structure

This paper contains 35 sections, 7 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: The SPARC architecture, built on a ResNet-18 with 4 layers of 2 blocks each. Task-specific working memories (shown in white) capture task-relevant information, while the task-agnostic semantic memory (highlighted in red) consolidates knowledge across tasks. This design lets SPARC balance plasticity and stability, achieving scalable continual learning without full-model surrogates or experience rehearsal.
  • Figure 2: Comparison with parameter isolation approaches on Seq-CIFAR100 with 20 tasks. We report the final accuracy of each task after training on all tasks.
  • Figure 3: Comparison of relative performance and model size of different CL approaches on Seq-TinyImageNet with 10 tasks, relative to a JOINT model, in the Class-IL (left) and Task-IL (right) settings. Compared Task-IL methods include ALASSO [park2019continual], UCB [Ebrahimi2020Uncertainty], oEWC [kirkpatrick2017overcoming], SI [zenke2017continual], BMKP [sun2023decoupling], and PackNet [mallya2018packnet].
  • Figure 4: Effect of semantic information consolidation on SPARC's stability. Stability at task $t$ is quantified as the average performance across all preceding tasks.
  • Figure 5: Depiction of training and inference regimes in SPARC in the Class-IL setting.
  • ...and 4 more figures
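
The stability measure in the Figure 4 caption can be sketched as a small function. It assumes an accuracy table `acc` where `acc[t][i]` is the accuracy on task `i` measured after training on task `t` (0-indexed), and it reads "preceding tasks" as excluding task `t` itself. Both the layout and that reading are assumptions, and the numbers are made up.

```python
def stability(acc, t):
    """Mean accuracy on tasks 0..t-1, measured after training on task t."""
    if t == 0:
        raise ValueError("task 0 has no preceding tasks")
    return sum(acc[t][:t]) / t


# Hypothetical accuracies (%), row t = after training on task t.
acc = [
    [90.0],
    [80.0, 88.0],
    [70.0, 76.0, 85.0],
]

print(stability(acc, 1))  # 80.0
print(stability(acc, 2))  # 73.0
```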