Bayesian continual learning and forgetting in neural networks
Djohan Bonnet, Kellian Cottart, Tifenn Hirtzlin, Tarcisius Januel, Thomas Dalgaty, Elisa Vianello, Damien Querlioz
TL;DR
The paper introduces Metaplasticity from Synaptic Uncertainty (MESU), a Bayesian continual-learning framework that updates neural weights according to their uncertainty to balance learning and forgetting. By maintaining a truncated posterior over the last $N$ tasks and minimizing a variational free energy $\mathcal{F}_t$, MESU forgets in a principled way, preserves essential past knowledge, and scales the parameter updates $\Delta\bm{\mu}$ and $\Delta\bm{\sigma}$ by weight uncertainty. The paper establishes theoretical links to Hessian-based regularization and to Newton's method, and reports strong empirical performance on domain-incremental animal classification, Permuted MNIST, and CIFAR-10/100. MESU outperforms boundary-based methods and avoids both catastrophic forgetting and catastrophic remembering, while retaining robust epistemic uncertainty for out-of-distribution detection. The work provides a biologically inspired, boundary-free path toward robust perpetual learning on streaming data.
Abstract
Biological synapses effortlessly balance memory retention and flexibility, yet artificial neural networks still struggle with the extremes of catastrophic forgetting and catastrophic remembering. Here, we introduce Metaplasticity from Synaptic Uncertainty (MESU), a Bayesian framework that updates network parameters according to their uncertainty. This approach allows a principled combination of learning and forgetting, ensuring that critical knowledge is preserved while unused or outdated information is gradually released. Unlike standard Bayesian approaches, which risk becoming overly constrained, and popular continual-learning methods, which rely on explicit task boundaries, MESU seamlessly adapts to streaming data. It further provides reliable epistemic uncertainty estimates that allow out-of-distribution detection; the only added computational cost is sampling the weights multiple times to obtain proper output statistics. Experiments on image-classification benchmarks demonstrate that MESU mitigates catastrophic forgetting while maintaining plasticity for new tasks. When trained on 200 sequential permuted MNIST tasks, MESU outperforms established continual-learning techniques in accuracy, in the capability to learn additional tasks, and in out-of-distribution data detection. Additionally, because it does not rely on task boundaries, MESU consistently outperforms conventional learning techniques on the incremental training of CIFAR-100 tasks across a wide range of scenarios. Our results unify ideas from metaplasticity, Bayesian inference, and Hessian-based regularization, offering a biologically inspired pathway to robust, perpetual learning.
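
To make the weight-sampling step mentioned above concrete, the following is a minimal sketch of Monte Carlo prediction with per-weight Gaussian uncertainty. It is not the paper's implementation. The linear classifier, the function names (`predict_with_uncertainty`, `softmax`, `entropy`), the toy parameters, and the entropy-based split into aleatoric and epistemic terms are all assumptions chosen for illustration; the split is a common choice for Bayesian neural networks, and the abstract does not specify which estimator MESU uses.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def predict_with_uncertainty(x, mu, sigma, n_samples=100, rng=None):
    """Monte Carlo prediction for a linear classifier with Gaussian weights.

    mu[k][j] and sigma[k][j] are the (assumed) mean and standard deviation
    of the weight linking input j to class k.
    Returns (mean_probs, total, aleatoric, epistemic).
    """
    rng = rng or random.Random(0)
    n_classes = len(mu)
    mean_probs = [0.0] * n_classes
    aleatoric = 0.0
    for _ in range(n_samples):
        # Draw one weight sample w ~ N(mu, sigma^2) and run a forward pass.
        logits = [
            sum(rng.gauss(mu[k][j], sigma[k][j]) * x[j] for j in range(len(x)))
            for k in range(n_classes)
        ]
        probs = softmax(logits)
        mean_probs = [acc + p / n_samples for acc, p in zip(mean_probs, probs)]
        aleatoric += entropy(probs) / n_samples  # average per-sample entropy
    total = entropy(mean_probs)      # entropy of the averaged prediction
    epistemic = total - aleatoric    # disagreement between weight samples
    return mean_probs, total, aleatoric, epistemic


# Toy comparison (hypothetical numbers): same means, small vs. large sigma.
x = [1.0, -0.5]
mu = [[2.0, 0.0], [-2.0, 0.0]]
confident = [[0.01, 0.01], [0.01, 0.01]]
uncertain = [[3.0, 3.0], [3.0, 3.0]]
probs_low, _, _, epi_low = predict_with_uncertainty(x, mu, confident)
probs_high, _, _, epi_high = predict_with_uncertainty(x, mu, uncertain)
```

In this toy setting, the epistemic term is close to zero when the weight standard deviations are small and grows when they are large. A high epistemic term is the kind of signal that could be thresholded to flag out-of-distribution inputs, at the cost of `n_samples` forward passes per input.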
