Maintaining Plasticity in Continual Learning via Regenerative Regularization

Saurabh Kumar; Henrik Marklund; Benjamin Van Roy

TL;DR

Plasticity loss in continual learning impedes rapid adaptation to new tasks under non-stationary data. The authors propose L2 Init, a regenerative regularizer that penalizes the squared L2 distance between the current parameters $\theta$ and the initial parameters $\theta_0$. It is added to the training loss as $L_{\text{reg}}(\theta) = L_{\text{train}}(\theta) + \lambda \|\theta - \theta_0\|_2^2$, which is simple to implement and requires a single hyper-parameter, $\lambda$. Empirical results across five continual supervised learning benchmarks show that L2 Init consistently preserves plasticity, often matching or exceeding resetting and architectural baselines, while maintaining higher feature rank than standard L2 regularization. Ablation studies indicate that regularizing toward the fixed initial parameters is key, and the approach remains robust to wider networks and different initialization schemes. The work suggests a practical, low-complexity option for sustaining adaptability in non-stationary settings and motivates extensions to RL and to forgetting-plasticity trade-offs.
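
As a brief worked step, the effect of the penalty on a single update can be written out. This assumes plain gradient descent with step size $\alpha$; both the symbol $\alpha$ and the choice of update rule are assumptions made here for illustration, not details taken from the summary above.

$$
\nabla L_{\text{reg}}(\theta) = \nabla L_{\text{train}}(\theta) + 2\lambda(\theta - \theta_0)
$$

$$
\theta_{t+1} = \theta_t - \alpha \nabla L_{\text{train}}(\theta_t) - 2\alpha\lambda(\theta_t - \theta_0)
$$

For any coordinate where the training gradient is zero, this reduces to $\theta_{t+1} - \theta_0 = (1 - 2\alpha\lambda)(\theta_t - \theta_0)$, so under this assumed update the distance to $\theta_0$ shrinks geometrically whenever $0 < 2\alpha\lambda < 1$. Replacing $\theta_0$ with the zero vector gives the corresponding step for standard L2 regularization.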

Abstract

In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a simple approach for maintaining plasticity by incorporating in the loss function L2 regularization toward initial parameters. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters should drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On problems representative of different types of nonstationarity in continual supervised learning, we demonstrate that L2 Init most consistently mitigates plasticity loss compared to previously proposed approaches.
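
The intuition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`l2_init_penalty`, `l2_penalty`, `sgd_step`), the flat-list parameter representation, the plain SGD update, and the values of `lam` and `lr` are all assumptions chosen for illustration. The sketch sets the training gradient to zero, mimicking parameters that recent losses are insensitive to, and shows that they drift toward their initial values under L2 Init and toward the origin under standard L2.

```python
def l2_init_penalty(params, init_params, lam):
    """Return lam * ||params - init_params||^2 for flat lists of floats."""
    return lam * sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def l2_penalty(params, lam):
    """Standard L2: the same penalty with the origin as the anchor."""
    return l2_init_penalty(params, [0.0] * len(params), lam)


def sgd_step(params, train_grad, anchor, lam, lr):
    """One plain SGD step on L_train + lam * ||params - anchor||^2.

    The gradient of the penalty is 2 * lam * (params - anchor).
    """
    return [
        p - lr * (g + 2.0 * lam * (p - a))
        for p, g, a in zip(params, train_grad, anchor)
    ]


# Hypothetical values: theta_0 plays the role of the initial parameters,
# theta_start the parameters after training on earlier tasks.
theta_0 = [0.5, -0.3]
theta_start = [2.0, 1.0]
zero_grad = [0.0, 0.0]  # recent losses are insensitive to these parameters
origin = [0.0, 0.0]

theta_l2_init = list(theta_start)
theta_l2 = list(theta_start)
for _ in range(200):
    theta_l2_init = sgd_step(theta_l2_init, zero_grad, theta_0, lam=0.1, lr=0.1)
    theta_l2 = sgd_step(theta_l2, zero_grad, origin, lam=0.1, lr=0.1)

# theta_l2_init ends close to theta_0; theta_l2 ends close to the origin.
```

In an automatic-differentiation framework the same idea would usually amount to adding the penalty to the training loss and keeping a frozen copy of the initial parameters, but the details depend on the framework and are left out here.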

Paper Structure (29 sections, 10 equations, 18 figures, 4 tables)

This paper contains 29 sections, 10 equations, 18 figures, 4 tables.

Introduction
Related Work
Problem Settings
Causes of plasticity loss
Mitigating plasticity loss
Regenerative Regularization
Continual Supervised Learning
Evaluation Protocol
Problems
Experiments
Comparative Evaluation
Looking inside the network
Ablation Study of Regenerative Regularization
Robustness to Network Width
Conclusion
...and 14 more sections

Figures (18)

Figure 1: Comparison of average online task accuracy across all five problems when using the Adam optimizer. L2 Init consistently maintains plasticity. While L2 mitigates plasticity loss completely on Permuted MNIST and Continual ImageNet, this method performs poorly on Random Label MNIST, Random Label CIFAR, and 5+1 CIFAR. Concat ReLU generally performs very well, except on 5+1 CIFAR where it suffers a sharp drop in performance.
Figure 3: Average weight magnitude and feature rank over time when training all agents using Adam. L2 Init retains a relatively small average weight magnitude and high feature rank.
Figure 4: Comparison of L2 Init, L2 Init + Resample, and L1 Init on three problems when using Adam. L2 Init + Resample performs poorly on all environments, especially on Random Label MNIST and 5+1 CIFAR where it loses plasticity. L1 Init matches the performance of L2 Init on Random Label MNIST and performs slightly worse on Permuted MNIST and 5+1 CIFAR.
Figure 5: Comparison of average online task accuracy on a subset of problems when using a wider network with the Adam optimizer. L2 Init consistently mitigates plasticity loss.
Figure 6: Comparison of average online task accuracy across all five problems when using Vanilla SGD. L2 Init consistently maintains plasticity, whereas L2 does not on Permuted MNIST and Random Label MNIST.
...and 13 more figures
