Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
Jack Foster, Alexandra Brintrup
TL;DR
This paper tackles catastrophic forgetting in continual learning by proposing Bayesian Adaptive Moment Regularization (BAdam), a prior-based method that blends Bayesian online variational inference with Adam-like per-parameter moment updates. By maintaining a Gaussian posterior and incorporating adaptive momentum, BAdam provides faster convergence, tighter parameter growth control, and calibrated uncertainty without requiring task labels or discrete task boundaries. Empirical results on CIFAR10 and standard class-incremental benchmarks (including single-headed splits and graduated, label-free settings) show BAdam achieving state-of-the-art performance among prior-based methods, substantially outperforming baselines like BGD, MAS, EWC, SI, and VCL. The work demonstrates that carefully designed prior-based regularization can rival memory-based approaches in online, constrained environments and points to future improvements in convergence speed and few-shot adaptability for real-world deployment.
Abstract
The pursuit of long-term autonomy mandates that machine learning models continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing as they are computationally efficient and do not require auxiliary models or data storage. However, prior-based approaches typically fail on important benchmarks and are thus limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Our method boasts a range of desirable properties: it is lightweight and task label-free, converges quickly, and offers calibrated uncertainty, which is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.
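
The summary above describes a Gaussian posterior over the parameters that is updated online using Adam-like per-parameter moments, but it does not state the update rule. The following is a minimal sketch of how such an update might look, not the authors' implementation. It assumes a mean-field posterior N(mu, sigma^2), a BGD-style closed-form update for sigma, and Adam-style first and second moments applied to the mean step. The function name `badam_like_step`, the toy quadratic loss, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def badam_like_step(mu, sigma, m, v, t, grad_fn, rng,
                    lr=1.0, beta1=0.9, beta2=0.999, eps=1e-8, n_samples=64):
    """One illustrative update of a mean-field Gaussian posterior N(mu, sigma^2).

    Assumption: a BGD-style closed-form update for sigma, with Adam-style
    moments applied to the mean. Not taken from the paper's code.
    """
    # Monte Carlo estimates via the reparameterisation theta = mu + sigma * z.
    noise = rng.standard_normal((n_samples,) + mu.shape)
    grads = np.stack([grad_fn(mu + sigma * z) for z in noise])
    g_mean = grads.mean(axis=0)              # estimate of E[dL/dtheta]
    g_noise = (grads * noise).mean(axis=0)   # estimate of E[dL/dtheta * z]

    # Adam-style exponential moving averages with bias correction.
    m = beta1 * m + (1 - beta1) * g_mean
    v = beta2 * v + (1 - beta2) * g_mean ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Mean step scaled by the posterior variance: low-variance (confident)
    # parameters move less, which is what limits parameter growth.
    mu = mu - lr * sigma ** 2 * m_hat / (np.sqrt(v_hat) + eps)

    # Closed-form sigma update; sqrt(1 + h^2) - h > 0 keeps sigma positive.
    h = 0.5 * sigma * g_noise
    sigma = sigma * (np.sqrt(1 + h ** 2) - h)
    return mu, sigma, m, v

# Toy usage on a quadratic loss 0.5 * ||theta - target||^2 (hypothetical).
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
grad_fn = lambda theta: theta - target
mu, sigma = np.zeros(2), np.full(2, 0.5)
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 51):
    mu, sigma, m, v = badam_like_step(mu, sigma, m, v, t, grad_fn, rng)
print(mu, sigma)
```

In this sketch no task label or task boundary enters the update: the posterior from the previous step simply acts as the prior for the next one, which is the property the abstract highlights.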
