Continual Learning With Quasi-Newton Methods
Steven Vander Eeckt, Hugo Van hamme
TL;DR
This work addresses catastrophic forgetting in sequential task learning by extending EWC with Sampled Quasi-Newton (SQN) Hessian approximations, which go beyond the diagonal of the Fisher Information Matrix to capture interactions between parameters. CSQN integrates SQN-based Hessian updates into the EWC framework and introduces memory-reduction variants (CT, BTREE, MRT) so that it scales to many tasks while preserving performance. Across Rotated MNIST, Split CIFAR-10/100, Split TinyImageNet, and Vision Datasets, CSQN consistently outperforms EWC and many baselines: relative to EWC, it reduces forgetting by about 50% and improves overall accuracy by roughly 8% on average, though KF remains a strong competitor on some tasks. The method is architecture-agnostic and straightforward to implement, with clear directions for reducing memory overhead further.
Abstract
Catastrophic forgetting remains a major challenge when neural networks learn tasks sequentially. Elastic Weight Consolidation (EWC) attempts to address this problem by introducing a Bayesian-inspired regularization loss to preserve knowledge of previously learned tasks. However, EWC relies on a Laplace approximation in which the Hessian is simplified to the diagonal of the Fisher information matrix, which assumes uncorrelated model parameters. This overly simplistic assumption often leads to poor Hessian estimates, limiting EWC's effectiveness. To overcome this limitation, we introduce Continual Learning with Sampled Quasi-Newton (CSQN), which leverages Quasi-Newton methods to compute more accurate Hessian approximations. CSQN captures parameter interactions beyond the diagonal without requiring architecture-specific modifications, making it applicable across diverse tasks and architectures. Experimental results across four benchmarks demonstrate that CSQN consistently outperforms EWC and other state-of-the-art baselines, including rehearsal-based methods. CSQN reduces EWC's forgetting by 50 percent and improves its performance by 8 percent on average. Notably, CSQN achieves superior results on three out of four benchmarks, including the most challenging scenarios, highlighting its potential as a robust solution for continual learning.
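
To illustrate why a diagonal curvature estimate can be limiting, the toy sketch below compares a quadratic regularization penalty computed with a full 2x2 curvature matrix against the same penalty with the off-diagonal terms dropped. It is only an illustration of the diagonal-versus-full distinction described above, not the CSQN procedure itself: the matrix values, the function names `quadratic_penalty` and `diagonal_only`, and the weight `lam` are all hypothetical.

```python
def quadratic_penalty(theta, theta_star, curvature, lam=1.0):
    """Return (lam / 2) * d^T C d, where d = theta - theta_star."""
    d = [t - s for t, s in zip(theta, theta_star)]
    cd = [sum(c_ij * d_j for c_ij, d_j in zip(row, d)) for row in curvature]
    return 0.5 * lam * sum(d_i * cd_i for d_i, cd_i in zip(d, cd))


def diagonal_only(curvature):
    """Zero out off-diagonal entries, mimicking a diagonal approximation."""
    n = len(curvature)
    return [[curvature[i][j] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical curvature for two strongly correlated parameters.
C = [[1.0, 0.9],
     [0.9, 1.0]]
theta_star = [0.0, 0.0]   # parameters after the previous task

move_a = [1.0, -1.0]      # direction in which the old loss is nearly flat
move_b = [1.0, 1.0]       # direction in which the old loss rises steeply

full_a = quadratic_penalty(move_a, theta_star, C)
full_b = quadratic_penalty(move_b, theta_star, C)
diag_a = quadratic_penalty(move_a, theta_star, diagonal_only(C))
diag_b = quadratic_penalty(move_b, theta_star, diagonal_only(C))

print(f"full curvature:     move_a={full_a:.2f}, move_b={full_b:.2f}")
print(f"diagonal curvature: move_a={diag_a:.2f}, move_b={diag_b:.2f}")
```

In this toy setting the full matrix penalizes the two moves very differently, while the diagonal version assigns them the same cost, so it cannot tell a harmless update from a harmful one.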
