Continual Learning Beyond Experience Rehearsal and Full Model Surrogates
Prashant Bhat, Laurens Niesten, Elahe Arani, Bahram Zonooz
TL;DR
This work tackles catastrophic forgetting in continual learning by proposing SPARC, a rehearsal-free, parameter-efficient framework that replaces memory buffers and full-model surrogates with per-task working memories and a shared semantic memory. Using depth-wise separable convolutions for task-specific components and partial cross-task sharing for semantic memory, SPARC achieves strong performance on Seq-TinyImageNet and competitive results on other CL benchmarks while requiring only about 6% of the parameters of full surrogates. A simple weight re-normalization in the classification layer mitigates task biases and promotes balanced performance across tasks. The approach offers practical scalability for memory-constrained settings and opens avenues for extending parameter-efficient SPARC-style architectures to other backbones and domains, including vision transformers.
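As a rough illustration of why depth-wise separable convolutions save parameters, the sketch below compares weight counts for a single layer. The helper names, channel widths, and kernel size are assumptions chosen for illustration and are not taken from SPARC. The per-layer ratio it computes is not the roughly 6% whole-model figure quoted above, which depends on the full architecture.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 256 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(256, 256, 3)
separable = depthwise_separable_params(256, 256, 3)
ratio = separable / standard

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.3f}")
```

For this hypothetical layer the separable variant needs roughly 11.5% of the weights of the standard convolution; the saving grows with the number of output channels and the kernel size.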
Abstract
Continual learning (CL) remains a significant challenge for deep neural networks because learning new tasks partially or completely erases previously acquired knowledge, a phenomenon known as catastrophic forgetting (CF). Existing solutions often rely on experience rehearsal or full-model surrogates to mitigate CF. While effective, these approaches introduce substantial memory and computational overhead, limiting their scalability and applicability in real-world scenarios. To address this, we propose SPARC, a scalable CL approach that eliminates the need for experience rehearsal and full-model surrogates. By combining task-specific working memories with a task-agnostic semantic memory for cross-task knowledge consolidation, SPARC is highly parameter-efficient, using only 6% of the parameters required by full-model surrogates. Despite its lightweight design, SPARC achieves superior performance on Seq-TinyImageNet and matches rehearsal-based methods on various CL benchmarks. Additionally, weight re-normalization in the classification layer mitigates task-specific biases, establishing SPARC as a practical and scalable solution for CL under stringent efficiency constraints.
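The abstract does not specify how the classification-layer weights are re-normalized. One common form, shown here purely as an assumed sketch and not as SPARC's actual procedure, rescales each class weight vector to unit L2 norm so that classes from recently learned tasks cannot dominate the logits through larger weight magnitudes. The function name `renormalize_rows` and the example weights are hypothetical.

```python
import math


def renormalize_rows(weights, eps=1e-12):
    """Scale each class weight vector (one row per class) to unit L2 norm.

    Assumption: unit-norm rescaling stands in for the paper's
    re-normalization, whose exact form is not given in the abstract.
    """
    normalized = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(norm, eps)  # guard against division by zero
        normalized.append([w / scale for w in row])
    return normalized


# Hypothetical classifier: the first class has a much larger weight norm
# than the second, as can happen for classes of the most recent task.
W = [[3.0, 4.0], [0.3, 0.4]]
W_hat = renormalize_rows(W)
norms = [math.sqrt(sum(w * w for w in row)) for row in W_hat]
```

After rescaling, both rows have the same norm, so differences between logits reflect the direction of the weight vectors rather than their magnitude.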
