Variational Continual Learning

Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, Richard E. Turner

TL;DR

VCL offers a principled Bayesian framework for continual learning: it performs online variational updates and can optionally use a small episodic memory (a coreset) to guard against forgetting. It applies to both discriminative and generative deep models and outperforms standard continual learning baselines without task-specific hyperparameter tuning. The approach preserves uncertainty information and shows strong long-term retention across sequences of tasks, with empirical gains on Permuted MNIST and Split MNIST and with deep generative models (VAEs). The work also clarifies the relationship between online VI, Laplace propagation, and regularization methods, and suggests directions for richer memories and alternative approximate-inference schemes.

Abstract

This paper develops variational continual learning (VCL), a simple but general framework for continual learning that fuses online variational inference (VI) and recent advances in Monte Carlo VI for neural networks. The framework can successfully train both deep discriminative models and deep generative models in complex continual learning settings where existing tasks evolve over time and entirely new tasks emerge. Experimental results show that VCL outperforms state-of-the-art continual learning methods on a variety of tasks, avoiding catastrophic forgetting in a fully automatic way.
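
For orientation, the online VI recursion the abstract refers to can be written as below. This page does not reproduce the paper's equations, so the notation here is an assumption: $q_t$ is the approximate posterior after task $t$, $\mathcal{D}_t$ is that task's data, $\mathcal{Q}$ is the chosen variational family, $Z_t$ is a normalizing constant, and $q_0$ is taken to be the prior $p(\bm{\theta})$.

```latex
q_t(\bm{\theta})
  = \arg\min_{q \in \mathcal{Q}}
    \mathrm{KL}\!\left( q(\bm{\theta}) \,\Big\|\, \frac{1}{Z_t}\, q_{t-1}(\bm{\theta})\, p(\mathcal{D}_t \mid \bm{\theta}) \right),
  \qquad t = 1, \dots, T .
```

Read this way, the previous approximate posterior $q_{t-1}$ plays the role of the prior for task $t$, which is how earlier tasks continue to constrain the parameters without their data being revisited.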

Paper Structure

This paper contains 16 sections, 11 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Schematics of the multi-head networks tested in the paper, including both the graphical model (left) and network architecture (right). (a) A multi-head discriminative model showing how network parameters might be shared during training. The lower-level network is parameterized by the variables $\bm{\theta}^S$ and is shared across multiple tasks. Each task $t$ has its own "head network" $\bm{\theta}^H_t$ mapping to the outputs from a common hidden layer. The full set of parameters is therefore $\bm{\theta} = \{\bm{\theta}^H_{1:T}, \bm{\theta}^S\}$. (b) A multi-head generative model with shared network parameters. The head networks generate the intermediate level representations from the latent variables $\mathbf{z}$.
  • Figure 2: Average test set accuracy on all observed tasks in the Permuted MNIST experiment.
  • Figure 3: Comparison of the effect of coreset sizes in the Permuted MNIST experiment.
  • Figure 4: Test set accuracy on all tasks for the Split MNIST experiment. The last column shows the average accuracy over all tasks. The bottom row is a zoomed version of the top row.
  • Figure 5: Test set accuracy on all tasks for the Split notMNIST experiment. The last column shows the average accuracy over all tasks. The bottom row is a zoomed version of the top row.
  • ...and 7 more figures
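
The multi-head discriminative layout described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the names `theta_S`, `theta_H`, `shared_features`, and `predict_logits`, and the use of point-valued weights are all assumptions made for the example (VCL itself maintains a distribution over these parameters rather than a single value).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IN_DIM, HIDDEN_DIM, OUT_DIM, NUM_TASKS = 784, 100, 2, 5

# Shared lower-level parameters (theta^S in the caption), used by every task.
theta_S = {
    "W": rng.normal(0.0, 0.1, size=(IN_DIM, HIDDEN_DIM)),
    "b": np.zeros(HIDDEN_DIM),
}

# One head per task (theta^H_t in the caption), mapping the common
# hidden layer to that task's outputs.
theta_H = [
    {"W": rng.normal(0.0, 0.1, size=(HIDDEN_DIM, OUT_DIM)), "b": np.zeros(OUT_DIM)}
    for _ in range(NUM_TASKS)
]


def shared_features(x):
    """Compute the common hidden layer with a ReLU nonlinearity."""
    return np.maximum(0.0, x @ theta_S["W"] + theta_S["b"])


def predict_logits(x, task):
    """Route the shared hidden layer through the head for the given task."""
    head = theta_H[task]
    return shared_features(x) @ head["W"] + head["b"]


# Usage: the same inputs pass through the same shared layer,
# but each task's head produces its own outputs.
x = rng.normal(size=(3, IN_DIM))
logits_task0 = predict_logits(x, task=0)
logits_task1 = predict_logits(x, task=1)
```

Keeping the heads in a list indexed by task mirrors the caption's $\bm{\theta} = \{\bm{\theta}^H_{1:T}, \bm{\theta}^S\}$: adding a task appends a head, while the shared parameters remain common to all tasks.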