Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

Jack Foster, Alexandra Brintrup

TL;DR

This paper tackles catastrophic forgetting in continual learning by proposing Bayesian Adaptive Moment Regularization (BAdam), a prior-based method that blends Bayesian online variational inference with Adam-like per-parameter moment updates. By maintaining a Gaussian posterior over the parameters and incorporating adaptive moment estimates, BAdam converges quickly, constrains parameter growth more tightly, and provides calibrated uncertainty, all without requiring task labels or discrete task boundaries. On class-incremental benchmarks (single-headed Split MNIST and Split FashionMNIST, including a graduated, label-free setting), BAdam achieves state-of-the-art performance among prior-based methods, substantially outperforming baselines such as BGD, MAS, EWC, SI, and VCL, while a single-task CIFAR10 comparison shows that its convergence rate rivals that of traditional optimizers. The work suggests that carefully designed prior-based regularization can narrow the gap with memory-based approaches in online, constrained environments, and points to future improvements in convergence speed and few-shot adaptability for real-world deployment.
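The TL;DR names two ingredients: a Gaussian (mean-field) posterior updated online, and Adam-like per-parameter moments. The following is a hypothetical sketch of how these might combine, not the paper's exact update rule. It borrows the closed-form σ update from BGD (one of the listed baselines) and assumes the mean step is BGD's σ²-scaled gradient step with the raw expected gradient replaced by bias-corrected Adam moments. The class name `BAdamSketch` and all hyperparameter defaults are illustrative assumptions.

```python
import numpy as np


class BAdamSketch:
    """Hypothetical mean-field Gaussian optimizer (illustration only, not the paper's rule).

    Assumed design: BGD-style closed-form sigma update, plus an Adam-style
    moment-normalised mean step scaled by sigma^2.
    """

    def __init__(self, mu, sigma_init=0.01, lr=1.0, beta1=0.9, beta2=0.999,
                 eps=1e-8, seed=0):
        self.mu = np.asarray(mu, dtype=float).copy()
        self.sigma = np.full_like(self.mu, sigma_init)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros_like(self.mu)  # first-moment estimate
        self.v = np.zeros_like(self.mu)  # second-moment estimate
        self.t = 0
        self.rng = np.random.default_rng(seed)

    def step(self, grad_fn, n_samples=10):
        # Monte Carlo estimates of E[g] and E[g * eps] under theta = mu + sigma * eps
        eps_s = self.rng.standard_normal((n_samples,) + self.mu.shape)
        grads = np.stack([grad_fn(self.mu + self.sigma * e) for e in eps_s])
        e_g = grads.mean(axis=0)
        e_g_eps = (grads * eps_s).mean(axis=0)

        # Adam-style bias-corrected moments of the expected gradient
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * e_g
        self.v = self.beta2 * self.v + (1 - self.beta2) * e_g ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)

        # Mean step scaled by sigma^2: low-uncertainty (important) parameters move less
        self.mu -= self.lr * self.sigma ** 2 * m_hat / (np.sqrt(v_hat) + self.eps)

        # BGD closed-form update: sigma*sqrt(1 + h^2) - sigma*h with h = sigma*E[g*eps]/2;
        # since sqrt(1 + h^2) > |h|, sigma stays strictly positive
        half = 0.5 * self.sigma * e_g_eps
        self.sigma = self.sigma * np.sqrt(1 + half ** 2) - self.sigma * half
```

In this sketch, scaling the mean step by σ² is what ties learning to uncertainty: parameters whose σ has shrunk barely move. This matches the intuition in the Figure 1 caption that BAdam constrains highly important parameters while leaving unused ones available for learning.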

Abstract

The pursuit of long-term autonomy requires machine learning models to continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing as they are computationally efficient and do not require auxiliary models or data storage. However, prior-based approaches typically fail on important benchmarks and are thus limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Our method has several desirable properties: it is lightweight and task label-free, converges quickly, and offers calibrated uncertainty, which is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.
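The single-headed class-incremental setting mentioned in the abstract can be made concrete with a small sketch. It assumes the common five-task Split MNIST protocol of consecutive digit pairs (the exact split is not stated here) and contrasts the two evaluation styles: a single-headed model predicts the global digit label, whereas a multi-headed, task-incremental model predicts a within-task index and needs the task label to pick a head. The helper names `split_by_task`, `single_head_targets`, and `multi_head_targets` are hypothetical.

```python
import numpy as np

# Assumed split: the common five-task Split MNIST protocol of consecutive digit pairs.
TASKS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]


def split_by_task(labels, tasks=TASKS):
    """Return, for each task, the indices of samples whose label belongs to that task."""
    labels = np.asarray(labels)
    return [np.flatnonzero(np.isin(labels, pair)) for pair in tasks]


def single_head_targets(labels):
    """Single-headed class-incremental: one 10-way head, labels stay global (0-9)."""
    return np.asarray(labels)


def multi_head_targets(labels, tasks=TASKS):
    """Task-incremental contrast: remap each label to its index within its task's pair.

    A model trained this way needs the task label at test time to choose a head.
    """
    labels = np.asarray(labels)
    local = np.empty_like(labels)
    for pair in tasks:
        for i, c in enumerate(pair):
            local[labels == c] = i
    return local
```

The single-headed variant is harder because the model must discriminate between all classes seen so far with no task label to narrow the choice, which is why it is the more demanding benchmark for prior-based methods.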

Paper Structure (18 sections, 17 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Left: Change in average $\mu$ magnitude during training. Catastrophic forgetting can be observed in BGD's parameters, while it is less pronounced for BAdam. Right: Distribution of $\sigma$ values at the end of training. BAdam has a wider distribution of uncertainties, indicating that it more effectively constrains highly important parameters while making unused parameters more readily available for learning.
  • Figure 2: Probability of a sample being drawn from each task in every batch for graduated Split MNIST
  • Figure 3: Optimizer convergence rate comparison on single-task CIFAR10. Results show that despite being Bayesian, BAdam can rival the convergence rates of traditional optimizers.
  • Figure 4: CL method performance on class-incremental Split MNIST
  • Figure 5: CL method performance on class-incremental Split MNIST with graduated boundaries, single-epoch training, and no task labels
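Figures 2 and 5 refer to graduated task boundaries, where the probability of drawing a sample from each task shifts from batch to batch instead of switching abruptly. The exact schedule is not given in this summary; the sketch below assumes a hypothetical Gaussian-shaped schedule in which each task's probability peaks in the middle of its segment of the stream, with `width` controlling how much neighbouring tasks overlap. The function names are illustrative.

```python
import numpy as np


def graduated_task_probs(n_batches, n_tasks, width=0.5):
    """Hypothetical schedule: row b gives the probability of drawing from each task in batch b.

    Task k's weight is a Gaussian bump centred on the middle of its segment,
    with standard deviation width * segment_length; rows are normalised to sum to 1.
    """
    seg = n_batches / n_tasks
    centers = (np.arange(n_tasks) + 0.5) * seg
    b = np.arange(n_batches)[:, None] + 0.5  # batch midpoints, shape (n_batches, 1)
    logits = -0.5 * ((b - centers[None, :]) / (width * seg)) ** 2
    w = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable
    return w / w.sum(axis=1, keepdims=True)


def sample_batch_tasks(probs_row, batch_size, rng):
    """Draw the task of origin for every example in one batch."""
    return rng.choice(len(probs_row), size=batch_size, p=probs_row)
```

A smaller `width` approaches the discrete-boundary case, while a larger one blends tasks more heavily, so a method cannot rely on detecting a task switch.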