LSEBMCL: A Latent Space Energy-Based Model for Continual Learning

Xiaodi Li, Dingcheng Li, Rujun Gao, Mahmoud Zamani, Latifur Khan

TL;DR

This paper tackles catastrophic forgetting in continual learning for NLP by introducing LSEBMCL, which embeds a latent-space energy-based model as an outer-generator that replays samples from previous tasks during training. The framework places an inference network, two operators, and an energy function on top of a pre-trained base model (Mistral 7B) to guide learning, and uses Langevin dynamics to sample from the latent space. Empirical results on SQuAD 2.0, WikiSQL, SST, QA-SRL, WOZ, and five DecaNLP tasks show state-of-the-art performance and robustness to task order, with performance improving as the replay sampling ratio γ increases and approaching the multitask upper bound. The approach offers a scalable, data-efficient replay mechanism that could extend beyond NLP to domains such as computer vision, by leveraging latent-space EBMs for interpretable generation and classification.
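The TL;DR states that Langevin dynamics is used to sample from the latent space. The following is a minimal sketch of unadjusted Langevin dynamics, assuming a latent EBM prior of the form p(z) ∝ exp(−E(z)) · N(z; 0, I) (a common latent-space EBM parameterization, not confirmed here as the paper's exact one) and a toy quadratic energy in place of a learned energy network. The names `energy_grad`, `langevin_sample`, and `w` are hypothetical.

```python
import numpy as np


def energy_grad(z, w):
    """Gradient of a toy latent energy E(z) = 0.5 * ||z - w||^2.

    Hypothetical stand-in for a learned energy network, chosen so the
    gradient is available in closed form.
    """
    return z - w


def langevin_sample(w, n_steps=300, step_size=0.3, n_samples=2000, seed=0):
    """Unadjusted Langevin dynamics targeting p(z) ∝ exp(-E(z)) * N(z; 0, I).

    Each step moves z down the gradient of U(z) = E(z) + 0.5 * ||z||^2
    (the second term comes from the Gaussian reference prior) and adds
    Gaussian noise scaled by step_size.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, w.shape[0]))  # initialise chains from N(0, I)
    for _ in range(n_steps):
        grad_u = energy_grad(z, w) + z  # ∇_z U(z)
        noise = rng.standard_normal(z.shape)
        z = z - 0.5 * step_size**2 * grad_u + step_size * noise
    return z


samples = langevin_sample(np.array([2.0, -4.0]))
print(samples.mean(axis=0))  # close to [1, -2]
```

With w = (2, −4) the target density is Gaussian with mean w/2 and variance 1/2 per dimension, so sample means near (1, −2) indicate that the chains have mixed. The slight excess variance comes from the finite step size, which is why practical samplers keep the step small or add a Metropolis correction.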

Abstract

Continual learning has become essential in many practical applications such as online news summaries and product classification. The primary challenge is catastrophic forgetting, a phenomenon where a model inadvertently discards previously learned knowledge when it is trained on new tasks. Existing solutions store exemplars from previous classes, regularize parameters during fine-tuning, or assign different model parameters to each task. The solution proposed in this work, LSEBMCL (Latent Space Energy-Based Model for Continual Learning), uses energy-based models (EBMs) to prevent catastrophic forgetting by sampling data points from previous tasks when training on new ones. An EBM is a machine learning model that associates an energy value with each input data point. The proposed method uses an EBM layer as an outer-generator in the continual learning framework for NLP tasks. The study demonstrates the efficacy of EBMs for NLP tasks, achieving state-of-the-art results in all experiments.
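The core replay idea, training on each new task together with samples generated for earlier tasks, can be illustrated with a minimal sketch. It assumes, as in LAMOL-style replay, that the sampling ratio γ sets the number of replayed examples as a fraction of the new task's training-set size. `build_training_set` and `generate_pseudo_sample` are hypothetical names; the latter stands in for sampling from the trained latent EBM.

```python
import random


def build_training_set(new_task_data, generate_pseudo_sample, gamma, seed=0):
    """Mix new-task examples with generated replay examples.

    Hypothetical helper: `generate_pseudo_sample` stands in for the
    latent-EBM generator, and gamma is treated as the ratio of replayed
    samples to new-task samples (a LAMOL-style assumption).
    """
    n_replay = round(gamma * len(new_task_data))  # round() avoids float truncation errors
    replay = [generate_pseudo_sample() for _ in range(n_replay)]
    mixed = list(new_task_data) + replay
    random.Random(seed).shuffle(mixed)  # interleave old-task and new-task examples
    return mixed


new_data = [("new", i) for i in range(10)]
mixed = build_training_set(new_data, lambda: ("old", None), gamma=0.2)
print(len(mixed))  # 12
```

Raising γ replays more old-task data per new task, consistent with the reported trend that performance improves as γ grows, at the cost of a larger training set for each task.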

Paper Structure

This paper contains 18 sections, 15 equations, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Overview of the LSEBMCL framework. (1) Inference network: the process begins with the inference network at the bottom, which encodes inputs (x) into representations (z). (2) Operator 1 and (3) Operator 2: Operator 1 carries the logits from the inference network to the decoder inputs, and Operator 2 computes the energy on the outputs. (4) Energy function: at the end of the pipeline, the energy function evaluates the outputs and contributes to the model's generation.
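The data flow in Figure 1 can be sketched as a chain of four stages. This is a toy illustration only: the internals of each stage are not given in this summary, so every weight, dimension, and function name below (`inference_network`, `operator_1`, `decoder`, `operator_2`, `energy_function`, `forward`) is a hypothetical stand-in, with random linear maps in place of trained networks.

```python
import numpy as np

# Toy dimensions and random weights; every name and size here is hypothetical.
D_X, D_Z, D_H, VOCAB = 16, 4, 8, 10
rng = np.random.default_rng(0)
W_inf = rng.standard_normal((D_X, D_Z)) * 0.1    # (1) inference network
W_op1 = rng.standard_normal((D_Z, D_H)) * 0.1    # (2) Operator 1
W_dec = rng.standard_normal((D_H, VOCAB)) * 0.1  # stand-in for the base model's decoder
w_energy = rng.standard_normal(VOCAB)            # (4) energy function weights


def inference_network(x):
    """(1) Encode inputs x of shape (batch, D_X) into latent codes z."""
    return np.tanh(x @ W_inf)


def operator_1(z):
    """(2) Map latent codes to decoder inputs."""
    return z @ W_op1


def decoder(h):
    """Produce output logits over a toy vocabulary."""
    return h @ W_dec


def operator_2(logits):
    """(3) Turn output logits into probabilities (numerically stable softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def energy_function(probs):
    """(4) Assign one scalar energy per example; lower means more plausible."""
    return -(probs @ w_energy)


def forward(x):
    """Run the four stages of the figure in order and return per-example energies."""
    return energy_function(operator_2(decoder(operator_1(inference_network(x)))))


energies = forward(np.ones((3, D_X)))
print(energies.shape)  # (3,)
```

Keeping each stage as a separate function makes the composition in `forward` mirror the figure's bottom-to-top flow. Per the TL;DR, the real framework sits on top of a pre-trained base model (Mistral 7B); the toy `decoder` here merely stands in for it, and all weights would be learned rather than random.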