Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

Jack Foster; Alexandra Brintrup

TL;DR

This paper tackles catastrophic forgetting in continual learning by proposing Bayesian Adaptive Moment Regularization (BAdam), a prior-based method that combines Bayesian online variational inference with Adam-like per-parameter moment updates. By maintaining a Gaussian posterior over the parameters and applying adaptive moment estimates, BAdam converges faster, constrains parameter growth more tightly, and provides calibrated uncertainty, all without requiring task labels or discrete task boundaries. On single-task CIFAR10, its convergence rate rivals that of traditional optimizers. On standard class-incremental benchmarks (including single-headed splits and graduated, label-free settings), BAdam achieves state-of-the-art performance among prior-based methods, substantially outperforming baselines such as BGD, MAS, EWC, SI, and VCL. The work suggests that carefully designed prior-based regularization can rival memory-based approaches in online, constrained environments, and it points to future improvements in convergence speed and few-shot adaptability for real-world deployment.
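
As a rough illustration of what "a Gaussian posterior with Adam-like moment updates" can look like, the sketch below keeps a mean-field Gaussian N(mu, sigma^2) over the parameters, applies an Adam-style step to the mean using a variance-scaled Monte Carlo gradient, and updates sigma with the closed-form rule from Bayesian Gradient Descent. This summary does not state the paper's equations, so the exact form of the moment update, the function name `bayesian_adam_step`, the hyperparameter values, and the toy quadratic loss are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np

def bayesian_adam_step(mu, sigma, m, v, t, grad_fn, rng,
                       lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, n_samples=10):
    """One illustrative update of a mean-field Gaussian posterior N(mu, sigma^2).

    Hypothetical helper: the moment update below is an assumed form,
    not taken from the paper's equations.
    """
    # Monte Carlo estimates of E[g] and E[g * noise], with theta = mu + sigma * noise.
    noise = rng.standard_normal((n_samples,) + mu.shape)
    grads = np.stack([grad_fn(mu + sigma * e) for e in noise])
    g_mean = grads.mean(axis=0)
    g_noise = (grads * noise).mean(axis=0)

    # Adam-style first and second moments of the variance-scaled gradient (assumed).
    scaled = sigma ** 2 * g_mean
    m = beta1 * m + (1 - beta1) * scaled
    v = beta2 * v + (1 - beta2) * scaled ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction, as in Adam
    v_hat = v / (1 - beta2 ** t)
    mu = mu - lr * m_hat / (np.sqrt(v_hat) + eps)

    # BGD-style closed-form update for sigma; the result is always positive.
    half = 0.5 * sigma * g_noise
    sigma = sigma * np.sqrt(1.0 + half ** 2) - sigma * half
    return mu, sigma, m, v

# Toy usage: fit a 2-parameter quadratic loss 0.5 * ||theta - target||^2.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
grad_fn = lambda theta: theta - target
mu, sigma = np.zeros(2), np.full(2, 0.06)
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    mu, sigma, m, v = bayesian_adam_step(mu, sigma, m, v, t, grad_fn, rng)
```

In this sketch, a small sigma shrinks the effective gradient of a parameter, which is one way a prior-based method can protect weights it is confident about; normalising by the second moment keeps the step size on the scale of `lr` even when sigma is small.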

Abstract

The pursuit of long-term autonomy mandates that machine learning models must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing as they are computationally efficient and do not require auxiliary models or data storage. However, prior-based approaches typically fail on important benchmarks and are thus limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Our method boasts a range of desirable properties such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.

Paper Structure

This paper contains 18 sections, 17 equations, 5 figures, 6 tables.

Introduction
Related Work
Methods
Preliminaries
Problem Definition
Bayesian Gradient Descent
The BAdam Optimizer
Experiments
CIFAR10 Convergence Experiment
Standard Benchmark Experiments
Graduated Experiments
Results
CIFAR10 Convergence Analysis
Standard Benchmark Experiments
Graduated Experiments
...and 3 more sections

Figures (5)

Figure 1: Left: Change in average $\mu$ magnitude during training. Catastrophic forgetting can be observed in BGD's parameters, while it is less pronounced for BAdam. Right: Distribution of $\sigma$ values at the end of training. BAdam has a wider distribution of uncertainties, indicating that it more effectively constrains highly important parameters while making unused parameters more readily available for learning.
Figure 2: Probability of a sample being drawn from each task in each batch for graduated Split MNIST.
Figure 3: Optimizer convergence rate comparison on single-task CIFAR10. Results show that despite being Bayesian, BAdam can rival the convergence rates of traditional optimizers.
Figure 4: CL method performance on class-incremental Split MNIST.
Figure 5: CL method performance on class-incremental Split MNIST with graduated boundaries, single-epoch training, and no task labels.
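
The graduated setting behind Figures 2 and 5 replaces hard task switches with per-batch task sampling probabilities that shift smoothly from one task to the next. The sketch below shows one way such a schedule could be built. The Gaussian-shaped weighting, the `width` parameter, and the function name `graduated_task_probs` are assumptions for illustration; the exact schedule used in the paper is not given in this summary.

```python
import numpy as np

def graduated_task_probs(n_tasks, n_steps, width=0.5):
    """Return an (n_steps, n_tasks) array of per-batch task sampling probabilities.

    Hypothetical schedule: each task's weight is a Gaussian bump centred on
    the middle of its own equal-length segment of training.
    """
    steps = np.arange(n_steps)
    seg = n_steps / n_tasks
    centres = (np.arange(n_tasks) + 0.5) * seg
    # Larger `width` means more overlap between neighbouring tasks.
    logits = -0.5 * ((steps[:, None] - centres[None, :]) / (width * seg)) ** 2
    weights = np.exp(logits)
    # Normalise each row so it is a valid probability distribution over tasks.
    return weights / weights.sum(axis=1, keepdims=True)

probs = graduated_task_probs(n_tasks=5, n_steps=1000)

# Draw the source task of each sample in one batch of 32 at training step 500.
rng = np.random.default_rng(0)
task_ids = rng.choice(5, size=32, p=probs[500])
```

Because every batch can mix samples from neighbouring tasks, a learner in this setting never observes a clean task boundary and has no task label to condition on.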
