Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck

Andor Diera, Lukas Galke, Fabian Karl, Ansgar Scherp

TL;DR

A discrete key-value bottleneck (DKVB) is introduced for encoder-only language models. It enables efficient continual learning through localized updates and remains effective even in challenging single-head continual learning scenarios where no task ID is provided.

Abstract

Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs. Furthermore, we show that DKVB remains effective even in challenging single-head continual learning scenarios where no task ID is provided.
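
To make the idea of localized updates concrete, the following is a minimal sketch of a key-value bottleneck in plain Python. It is not the authors' implementation: the class name, the random Gaussian key initialization, the squared-L2 nearest-key lookup, and the squared-error update are all assumptions chosen for illustration. The point it shows is that when keys are frozen and only the value attached to the selected key is trained, one training example modifies a single value vector and leaves the rest untouched.

```python
import random


class DiscreteKeyValueBottleneck:
    """Toy key-value bottleneck (hypothetical): frozen keys, trainable values."""

    def __init__(self, num_pairs, key_dim, value_dim, seed=0):
        rng = random.Random(seed)
        # Keys stay fixed after initialization (assumption: random Gaussian init).
        self.keys = [[rng.gauss(0.0, 1.0) for _ in range(key_dim)]
                     for _ in range(num_pairs)]
        # Values are the only trainable parameters in this sketch.
        self.values = [[0.0] * value_dim for _ in range(num_pairs)]

    def nearest_key(self, encoding):
        """Return the index of the key closest to the encoding (squared L2)."""
        def sq_dist(key):
            return sum((k - e) ** 2 for k, e in zip(key, encoding))
        return min(range(len(self.keys)), key=lambda i: sq_dist(self.keys[i]))

    def forward(self, encoding):
        """Map an encoding to (selected key index, value stored under that key)."""
        idx = self.nearest_key(encoding)
        return idx, self.values[idx]

    def update(self, encoding, target, lr=0.25):
        """One gradient step on squared error; only the selected value changes."""
        idx, value = self.forward(encoding)
        # Gradient of (v - t)^2 with respect to v is 2 * (v - t).
        self.values[idx] = [v - lr * 2.0 * (v - t) for v, t in zip(value, target)]
        return idx


bottleneck = DiscreteKeyValueBottleneck(num_pairs=8, key_dim=4, value_dim=2)
before = [list(v) for v in bottleneck.values]
idx = bottleneck.update([0.1, -0.3, 0.7, 0.2], target=[1.0, 0.0])
changed = [i for i, (old, new) in enumerate(zip(before, bottleneck.values))
           if old != new]
print(changed == [idx])  # exactly one value vector was modified
```

In this toy setting, inputs that select different keys write to different value vectors, so learning a later input does not overwrite what was stored for an earlier one.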

Paper Structure

This paper contains 42 sections, 1 equation, 3 figures, 10 tables.

Figures (3)

  • Figure 1: The base Discrete Key-Value Bottleneck.
  • Figure 2: Progressive test accuracy scores in the single-head class-incremental learning setup, averaged over 5 runs with a fixed sequence order.
  • Figure 3: Sensitivity of test accuracy to bottleneck parameters: (a) dimensionality of the discrete keys, (b) number of key-value pairs.