FSC-Net: Fast-Slow Consolidation Networks for Continual Learning
Mohamed El Gorrim
TL;DR
FSC-Net tackles catastrophic forgetting in continual learning by separating rapid task learning from slow knowledge consolidation through a dual-network design. The fast network (NN1) adapts quickly to each new task, while the slow network (NN2) consolidates knowledge via replay and periodic distillation; pure replay during consolidation was found to be most effective. On Split-MNIST, NN2 retains substantially more than NN1 (91.71% ± 0.62% vs 87.43% ± 1.27%). On Split-CIFAR-10, the relative improvement is meaningful (+8.20pp), although absolute performance remains modest because of the simple MLP backbone. The findings indicate that, in this setting, consolidation efficacy stems from the training protocol and replay-based rehearsal rather than from architectural complexity, and they point to practical considerations for applying dual-timescale consolidation to broader backbones and task sequences.
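The consolidation step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the convex weighting between the replay and distillation terms, the temperature, and the names `consolidation_loss`, `distill_weight`, and `ReplayBuffer` are all assumptions made for illustration. Setting `distill_weight=0.0` corresponds to the pure-replay consolidation that the summary reports as most effective.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Supervised loss on one replayed (input, label) pair."""
    return -math.log(softmax(logits)[label])


def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def consolidation_loss(slow_logits, fast_logits, label, distill_weight=0.0):
    """Loss for the slow network (NN2) on one replayed example.

    ASSUMPTION: the two terms are mixed convexly; the paper's exact
    weighting is not given in this summary.
    distill_weight = 0.0 -> pure replay (no signal from the fast network).
    distill_weight > 0.0 -> the fast network (NN1) acts as a teacher.
    """
    replay_term = cross_entropy(slow_logits, label)
    distill_term = distillation_kl(fast_logits, slow_logits)
    return (1.0 - distill_weight) * replay_term + distill_weight * distill_term


class ReplayBuffer:
    """Hypothetical store of past (example, label) pairs for rehearsal."""

    def __init__(self, seed=0):
        self.items = []
        self.rng = random.Random(seed)

    def add(self, example, label):
        self.items.append((example, label))

    def sample(self, k):
        # Never request more items than the buffer holds.
        return self.rng.sample(self.items, min(k, len(self.items)))


# Toy logits: the fast network is skewed toward the most recent class (2),
# while the replayed example actually belongs to an old class (0).
slow_logits = [2.0, 0.5, -1.0]
fast_logits = [0.1, 0.2, 3.0]
pure_replay = consolidation_loss(slow_logits, fast_logits, label=0)
with_distill = consolidation_loss(slow_logits, fast_logits, label=0,
                                  distill_weight=0.5)
print(pure_replay, with_distill)
```

In this toy setup the distillation term pulls the slow network toward the fast network's recent-class preference, which is one way to picture the recency-bias hypothesis; the numbers are illustrative only and carry no empirical weight.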
Abstract
Continual learning remains challenging due to catastrophic forgetting, where neural networks lose previously acquired knowledge when learning new tasks. Inspired by memory consolidation in neuroscience, we propose FSC-Net (Fast-Slow Consolidation Networks), a dual-network architecture that separates rapid task learning from gradual knowledge consolidation. Our method employs a fast network (NN1) for immediate adaptation to new tasks and a slow network (NN2) that consolidates knowledge through distillation and replay. Within the family of MLP-based NN1 variants we evaluated, consolidation effectiveness is driven more by methodology than by architectural embellishments: a simple MLP outperforms more complex similarity-gated variants by 1.2pp. Through systematic hyperparameter analysis, we observed empirically that pure replay without distillation during consolidation achieves superior performance, consistent with the hypothesis that distillation from the fast network introduces recency bias. On Split-MNIST (30 seeds), FSC-Net achieves 91.71% +/- 0.62% retention accuracy, a +4.27pp gain over the fast network alone (87.43% +/- 1.27%, paired t=23.585, p < 1e-10). On Split-CIFAR-10 (5 seeds), our method achieves 33.31% +/- 0.38% retention, a +8.20pp gain over the fast network alone (25.11% +/- 1.61%, paired t=9.75, p < 1e-3), though absolute performance remains modest and below random expectation, highlighting the need for stronger backbones. Our results provide empirical evidence that the dual-timescale consolidation mechanism, rather than architectural complexity, is central to mitigating catastrophic forgetting in this setting.
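The paired t statistics quoted above compare the slow and fast networks seed by seed. As a rough sketch of how such a statistic is computed, the snippet below applies the standard paired t formula to per-seed retention accuracies. The four accuracy values are hypothetical placeholders, not the paper's per-seed results, and the function name `paired_t_statistic` is an assumption for illustration.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom, mean_difference) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d_i = a_i - b_i and
    stdev is the sample standard deviation (n - 1 in the denominator).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1, mean_diff


# HYPOTHETICAL per-seed retention accuracies (%), one pair per seed.
slow_nn2 = [91.0, 92.0, 91.5, 92.5]
fast_nn1 = [87.0, 88.5, 87.0, 88.0]

t, dof, gain_pp = paired_t_statistic(slow_nn2, fast_nn1)
print(f"gain = {gain_pp:.2f}pp, t = {t:.3f}, dof = {dof}")
```

Pairing by seed removes seed-to-seed variance that both networks share, which is why a gain of a few percentage points can yield a large t value when the per-seed differences are consistent.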
