Task Agnostic Continual Learning Using Online Variational Bayes

Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry

TL;DR

This work tackles catastrophic forgetting in scenarios where task boundaries are unknown by introducing Bayesian Gradient Descent (BGD), an online variational Bayes method with a diagonal Gaussian weight posterior. BGD updates the weight distribution in a closed-form manner, tying learning rates to parameter uncertainty and leveraging sequential posterior updates without memory of past tasks. The authors provide a formal taxonomy for continual-learning scenarios, introduce the labels trick to improve class-learning, and demonstrate competitive performance across continuous and discrete task-agnostic settings on MNIST and CIFAR datasets. The approach offers a practical, scalable Bayesian framework for task-agnostic continual learning and highlights avenues for extending posterior structure beyond diagonal Gaussians.
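
As a rough illustration of how a closed-form update can tie each weight's step size to its uncertainty, the sketch below applies one BGD-style step to a diagonal Gaussian with mean `mu` and standard deviation `sigma`. The update form is recalled from the published BGD algorithm rather than stated in this summary, so treat it as an assumption. The names `bgd_step`, `grad_fn`, `eta`, and `n_mc` are hypothetical, and the expectations over the noise are approximated by Monte Carlo sampling.

```python
import numpy as np


def bgd_step(mu, sigma, grad_fn, eta=1.0, n_mc=10, rng=None):
    """One BGD-style update of a diagonal Gaussian over weights (sketch).

    grad_fn(theta) is a hypothetical helper returning dL/dtheta for the
    current mini-batch, evaluated at the sampled weights theta.
    """
    rng = np.random.default_rng() if rng is None else rng
    g_mean = np.zeros_like(mu)  # Monte Carlo estimate of E[grad]
    g_eps = np.zeros_like(mu)   # Monte Carlo estimate of E[grad * eps]
    for _ in range(n_mc):
        eps = rng.standard_normal(mu.shape)
        grad = grad_fn(mu + eps * sigma)  # theta = mu + eps * sigma
        g_mean += grad / n_mc
        g_eps += grad * eps / n_mc
    # Mean update: the effective learning rate is eta * sigma**2 per weight,
    # so weights with low uncertainty move less.
    new_mu = mu - eta * sigma**2 * g_mean
    # STD update (assumed form): sigma * sqrt(1 + h^2) - sigma * h,
    # with h = 0.5 * sigma * E[grad * eps]; the result stays positive.
    half = 0.5 * sigma * g_eps
    new_sigma = sigma * np.sqrt(1.0 + half**2) - sigma * half
    return new_mu, new_sigma
```

On a simple quadratic loss, repeated calls should shrink `sigma` and pull `mu` toward the minimizer. Because the mean step scales with `sigma**2`, weights that have become certain change slowly, which is how the method is meant to limit forgetting without storing past data.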

Abstract

Catastrophic forgetting is the notorious vulnerability of neural networks to changes in the data distribution during learning. This phenomenon has long been considered a major obstacle to using learning agents in realistic continual learning settings. A large body of continual learning research assumes that task boundaries are known during training. However, research on scenarios in which task boundaries are unknown during training has been lacking. In this paper we present, for the first time, a method for preventing catastrophic forgetting (BGD) in scenarios where task boundaries are unknown during training, i.e., task-agnostic continual learning. Code for our algorithm is available at https://github.com/igolan/bgd.

Paper Structure

This paper contains 47 sections, 2 theorems, 31 equations, 14 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

We examine BGD with a diagonal Gaussian distribution for $\boldsymbol{\theta}$. If ${\mathrm{L}_n}\left(\boldsymbol{\theta}\right)$ is a strongly convex function with parameter $m_{n}>0$ and a continuously differentiable function over $\mathbb{R}^{n}$, then $\mathbf{\mathbb{E}}_{\varepsilon}\left[\f

Figures (14)

  • Figure 1: Continual learning scenario characterization. Each scenario is fundamentally different. This paper presents the first results on task-agnostic scenarios. Previous results are the average accuracy over all tasks, as reported in hsu2018re.
  • Figure 2: Distribution of samples from each task as a function of iteration. The tasks do not change abruptly but shift slowly over time, so task boundaries are undefined. Moreover, the algorithm has no access to this distribution. Here, the number of samples from each task in each batch is a random variable drawn from a distribution over tasks, and this distribution changes over time (iterations).
  • Figure 3: Results on Continuous Permuted MNIST. The scenario is continuous task-agnostic continual learning: tasks change slowly over time, as shown in Figure 2. Reported accuracy is the average over three different runs, with error bars showing the STD.
  • Figure 4: The average test accuracy on permuted MNIST vs. the number of tasks. BGD (red), VCL (black) and SI (green). We used mini-batches of size 128 and 300 epochs for all the algorithms. Note that, in contrast to VCL and SI, BGD is task-agnostic (i.e., unaware of task changes), yet it still significantly alleviates catastrophic forgetting.
  • Figure 5: Histogram of STD values at the end of training on each task. The initial STD value is 0.06.
  • ...and 9 more figures
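
The gradual task drift described for Figure 2 can be mimicked with a schedule in which the per-batch task mix changes a little at every iteration. The sketch below uses a linear cross-fade between consecutive tasks; this schedule is an assumption chosen for illustration, not the one used in the paper, and `task_probs` and `sample_batch_tasks` are hypothetical names.

```python
import numpy as np


def task_probs(it, n_iters, n_tasks):
    """Distribution over tasks at iteration `it` (hypothetical linear cross-fade)."""
    # Fractional position of the "current" task, moving from 0 to n_tasks - 1.
    pos = it / max(n_iters - 1, 1) * (n_tasks - 1)
    # Triangular weights: only the two tasks nearest to `pos` are non-zero.
    w = np.clip(1.0 - np.abs(np.arange(n_tasks) - pos), 0.0, None)
    return w / w.sum()


def sample_batch_tasks(it, n_iters, n_tasks, batch_size, rng):
    """Draw a task id for every sample in the batch at iteration `it`."""
    return rng.choice(n_tasks, size=batch_size, p=task_probs(it, n_iters, n_tasks))
```

The learner would only ever see the sampled batches, never `task_probs` itself, which matches the caption's point that the algorithm has no access to the task distribution.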

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 1
  • proof