Energy-Based Models for Continual Learning

Shuang Li, Yilun Du, Gido M. van de Ven, Igor Mordatch

TL;DR

This work reframes continual learning for classification as energy-based modeling: the conditional distribution p(y|x) is defined through an energy function E(x,y), and the model is trained with a simple contrastive divergence objective that contrasts the energy of the ground-truth class with that of a negative class. Because each update touches only these classes, the approach mitigates forgetting without relying on replay or explicit task boundaries, and it extends naturally to boundary-free data streams. Empirical results on boundary-aware and boundary-free benchmarks (split MNIST, permuted MNIST, CIFAR-10, CIFAR-100) show that EBMs outperform softmax baselines and many continual learning (CL) methods, with further gains when combined with replay. Overall, EBMs provide a flexible, scalable building block for continual learning that can adapt to diverse task structures and data distributions, with potential for further integration and architectural enhancements.
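The mechanism summarized above can be illustrated with a minimal sketch. It assumes the standard Boltzmann form p(y|x) ∝ exp(-E(x,y)) and a contrastive loss equal to the energy of the ground-truth class minus the energy of one negative class. The function names, the uniform negative-sampling strategy, and the toy energies are hypothetical and chosen for illustration only; they are not taken from the paper's code.

```python
import math
import random


def class_probabilities(energies):
    """Boltzmann distribution over classes: p(y|x) proportional to exp(-E(x, y))."""
    # Shift by the minimum energy for numerical stability; this does not change the result.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def predict(energies):
    """Predict the class with the lowest energy."""
    return min(range(len(energies)), key=lambda y: energies[y])


def sample_negative(candidate_classes, true_class, rng):
    """Pick one negative class uniformly at random (an assumed strategy)."""
    negatives = [c for c in candidate_classes if c != true_class]
    return rng.choice(negatives)


def contrastive_loss(energies, true_class, negative_class):
    """Energy of the ground-truth class minus energy of the negative class.

    Minimizing this lowers E(x, y+) and raises E(x, y-); the energies of all
    other classes do not appear in the loss.
    """
    return energies[true_class] - energies[negative_class]


# Toy energies E(x, y) for a single input x and three classes (made-up values).
energies = [2.0, 0.5, 3.0]
rng = random.Random(0)
negative = sample_negative([0, 1, 2], true_class=1, rng=rng)
loss = contrastive_loss(energies, true_class=1, negative_class=negative)
```

In this sketch the loss depends on only two energies per example, which is one way to read the "selective update" property: classes that are neither the ground truth nor the sampled negative receive no direct training signal from that example.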

Abstract

We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.

Paper Structure

This paper contains 37 sections, 13 equations, 7 figures, 26 tables.

Figures (7)

  • Figure 1: Schematic of the model architectures of the softmax-based classifier (SBC) and energy-based models (EBM). SBC takes an image ${\mathbf{x}}$ as input and outputs a fixed, pre-defined $N$-dimensional vector. EBM takes a data point ${\mathbf{x}}$ and a class $y$ as input and outputs their energy value. The dashed lines are optional skip connections.
  • Figure 2: Class-IL testing accuracy of SBC, SBC using our training objective (SBC*), and EBMs on each task on the permuted MNIST dataset.
  • Figure 3: Performance of EBM on CIFAR-100 with different strategies for selecting the negative samples.
  • Figure 3: Energy landscapes of SBC and EBMs after training on tasks $T_9$ and $T_{10}$ on permuted MNIST. The darker the diagonal, the better the model is at preventing forgetting of previous tasks.
  • Figure 4: Predicted label distribution after learning each task on split MNIST. SBC only predicts classes from the current task, while EBM predicts classes from all tasks seen so far.
  • ...and 2 more figures