Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting
Hippolyt Ritter, Aleksandar Botev, David Barber
TL;DR
This work tackles catastrophic forgetting in neural networks by casting continual learning as Bayesian online learning: after each task, the posterior over the weights is approximated by a Gaussian obtained from a Laplace approximation around the mode. To stay scalable, the Hessian is approximated as block-diagonal and Kronecker-factored, which captures interactions between weights within the same layer. The proposed Online Laplace method, especially in its Kronecker-factored form, substantially outperforms diagonal approximations and baselines such as EWC and SI on long task sequences, including 50 permuted MNIST datasets and multiple vision benchmarks. The results underscore the importance of modeling weight interactions for robust continual learning.
Abstract
We introduce the Kronecker-factored online Laplace approximation for overcoming catastrophic forgetting in neural networks. The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights. The Laplace approximation requires calculating the Hessian around a mode, which is typically intractable for modern architectures. To make our method scalable, we leverage recent block-diagonal Kronecker-factored approximations to the curvature. Our algorithm achieves over 90% test accuracy across a sequence of 50 instantiations of the permuted MNIST dataset, substantially outperforming related methods for overcoming catastrophic forgetting.
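To illustrate why the Kronecker factorization keeps the quadratic penalty cheap, the following is a minimal sketch for a single fully connected layer and a single previous task. It assumes the layer's curvature block is approximated as A ⊗ G, with A an (inputs × inputs) factor and G an (outputs × outputs) factor. The names `A`, `G`, `W_star` and both function names are hypothetical and chosen for illustration; they are not taken from the authors' code. The sketch relies only on the standard identity vec(X)ᵀ (A ⊗ G) vec(X) = tr(Xᵀ G X A) for symmetric A and column-stacked vec, so the full Kronecker product never has to be formed.

```python
import numpy as np


def kf_quadratic_penalty(W, W_star, A, G):
    """Quadratic penalty 0.5 * vec(dW)^T (A kron G) vec(dW), computed
    without materialising the Kronecker product.

    W, W_star : (n_out, n_in) current weights and previous-task mode
    A         : (n_in, n_in) symmetric factor (assumed input-side)
    G         : (n_out, n_out) symmetric factor (assumed output-side)
    """
    dW = W - W_star
    # tr(dW^T G dW A) equals the Kronecker quadratic form for symmetric A.
    return 0.5 * np.trace(dW.T @ G @ dW @ A)


def dense_quadratic_penalty(W, W_star, A, G):
    """Reference version that builds the full (n_in*n_out)^2 matrix.
    Only feasible for tiny layers; included to check the identity."""
    d = (W - W_star).flatten(order="F")  # column-stacked vec(dW)
    return 0.5 * d @ np.kron(A, G) @ d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_out, n_in = 3, 4
    # Random positive semi-definite factors standing in for curvature estimates.
    X = rng.standard_normal((8, n_in))
    Y = rng.standard_normal((8, n_out))
    A = X.T @ X / 8
    G = Y.T @ Y / 8
    W_star = rng.standard_normal((n_out, n_in))
    W = W_star + 0.1 * rng.standard_normal((n_out, n_in))
    print(kf_quadratic_penalty(W, W_star, A, G))
    print(dense_quadratic_penalty(W, W_star, A, G))
```

The factored form stores two small matrices per layer instead of one matrix over all pairs of weights in that layer, which is what makes a non-diagonal curvature approximation practical. In a continual-learning loop, a term like this would be added to the loss of the new task; how penalties from several earlier tasks are combined is not covered by this sketch.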
