Progressive Prompts: Continual Learning for Language Models

Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Amjad Almahairi

TL;DR

Progressive Prompts addresses catastrophic forgetting in continual learning (CL) for large language models by learning a small soft prompt for each task and progressively concatenating these prompts while keeping the base model frozen. It introduces a residual MLP-based reparameterization to stabilize prompt optimization. Empirically, it outperforms state-of-the-art CL methods on BERT and T5 across standard benchmarks and long-sequence task settings, with gains of over 20% on T5 in few-shot regimes and clear forward-transfer benefits. The approach is memory-efficient and model-agnostic, requiring far fewer task-specific parameters than full fine-tuning or architectural alternatives.
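
The residual reparameterization mentioned above can be sketched as passing each prompt token through a small MLP and adding the result back to the token through a skip connection. The sketch below is an illustration under assumptions, not the authors' implementation: the sizes, the ReLU nonlinearity, the two-layer shape, and the zero initialization of the second layer are all chosen here for demonstration.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to one vector (weight is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_reparam(prompt, w1, b1, w2, b2):
    """Return prompt + MLP(prompt), applied to each prompt token independently."""
    out = []
    for token in prompt:
        hidden = relu(linear(token, w1, b1))
        delta = linear(hidden, w2, b2)
        # Skip connection: the MLP only learns a correction to the raw prompt.
        out.append([t + d for t, d in zip(token, delta)])
    return out

rng = random.Random(0)
dim, bottleneck, prompt_len = 4, 2, 3  # illustrative sizes only (assumption)
prompt = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(prompt_len)]
w1 = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
b1 = [0.0] * bottleneck
# Assumption for this demo: zero-initialized output layer, so the MLP
# contributes nothing at first and the prompt passes through unchanged.
w2 = [[0.0] * bottleneck for _ in range(dim)]
b2 = [0.0] * dim

reparam = residual_reparam(prompt, w1, b1, w2, b2)
```

With the output layer at zero, `reparam` equals `prompt`, which shows what the skip connection contributes: the reparameterized prompt starts from the raw embeddings and the MLP adds a learned correction on top.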

Abstract

We introduce Progressive Prompts - a simple and efficient approach for continual learning in language models. Our method allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task-specific parameters. Progressive Prompts learns a new soft prompt for each task and sequentially concatenates it with the previously learned prompts, while keeping the base model frozen. Experiments on standard continual learning benchmarks show that our approach outperforms state-of-the-art methods, with an improvement of >20% in average test accuracy over the previous best-performing method on the T5 model. We also explore a more challenging continual learning setup with longer sequences of tasks and show that Progressive Prompts significantly outperforms prior methods.
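
The procedure described in the abstract (one new soft prompt per task, concatenated with the earlier prompts, with everything else frozen) can be sketched in a few lines. This is a toy illustration, not the paper's code: the class name `ProgressivePrompts`, its methods, the concatenation order (oldest prompt first), and the hand-written gradient step are all assumptions made here, and no real language model is involved.

```python
class ProgressivePrompts:
    """Toy container for per-task soft prompts (hypothetical helper)."""

    def __init__(self, prompt_len, dim):
        self.prompt_len = prompt_len
        self.dim = dim
        self.prompts = []  # one prompt (list of token vectors) per task, in task order

    def start_task(self):
        """Add a fresh prompt for a new task; earlier prompts stay as they are."""
        self.prompts.append([[0.0] * self.dim for _ in range(self.prompt_len)])

    def update_current(self, grads, lr=0.1):
        """Gradient step on the newest prompt only; earlier prompts are frozen."""
        for token, grad in zip(self.prompts[-1], grads):
            for i in range(self.dim):
                token[i] -= lr * grad[i]

    def model_input(self, input_embeddings, task_index=None):
        """Concatenate the prompts of tasks 0..task_index with the input embeddings."""
        if task_index is None:
            task_index = len(self.prompts) - 1
        tokens = []
        for prompt in self.prompts[: task_index + 1]:
            tokens.extend(prompt)
        return tokens + input_embeddings


pp = ProgressivePrompts(prompt_len=2, dim=3)

pp.start_task()                                # task 0
pp.update_current([[1.0, 1.0, 1.0]] * 2)       # made-up gradients
task0_snapshot = [row[:] for row in pp.prompts[0]]

pp.start_task()                                # task 1
pp.update_current([[2.0, 2.0, 2.0]] * 2)       # only the task-1 prompt moves

x = [[0.5, 0.5, 0.5]] * 4                      # stand-in for 4 frozen input embeddings
full_input = pp.model_input(x)                 # task-0 prompt + task-1 prompt + input
task0_input = pp.model_input(x, task_index=0)  # task-0 prompt + input
```

Because training on task 1 never touches the task-0 prompt, the input built for task 0 is identical before and after the second task is learned, which is the sense in which this design resists forgetting.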

Paper Structure (25 sections, 6 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: Illustration of our proposed method, Progressive Prompts, contrasted with a simple adaptation of progressive networks using prompt tuning. In the simple adaptation of progressive networks, we learn a separate prompt for each new task and repeat the frozen input embeddings alongside it, so the input tokens are duplicated once per task. In Progressive Prompts, we use the same input and progressively append a new prompt for each new task. Prior task prompts are not modified by the addition of new prompts.
  • Figure 2: Average attention scores between prompts in Progressive Prompts.
  • Figure 3: Transfer learning experimental setup. Prompt Tuning: a single prompt of 100 tokens is trained on the target task. Progressive Prompts: two prompts of 50 tokens each are trained sequentially on the source and target tasks.
  • Figure 4: Progressive Prompts achieve forward transfer and outperform Prompt Tuning under different dataset sizes. Average scores across six target tasks are reported.
  • Figure 5: Forward transfer score of different approaches on order 8. Different data limits are shown (20, 200 and 1000 samples per class).
  • ...and 8 more figures
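
The contrast drawn in the Figure 1 caption can be made concrete by counting tokens. The sketch below assumes that the simple progressive-networks adaptation feeds one copy of the input per task, each with its own prompt, while Progressive Prompts feeds a single copy of the input preceded by all prompts. The function names and the input length of 512 are hypothetical; the prompt length of 50 is borrowed from the Figure 3 setup.

```python
def progressive_networks_length(num_tasks, prompt_len, input_len):
    """Sequence length when the input is repeated once per task prompt."""
    return num_tasks * (prompt_len + input_len)

def progressive_prompts_length(num_tasks, prompt_len, input_len):
    """Sequence length when all prompts share a single copy of the input."""
    return num_tasks * prompt_len + input_len

# Illustrative numbers: 4 tasks, 50-token prompts, 512-token input (assumed).
naive = progressive_networks_length(4, 50, 512)   # 4 * (50 + 512) = 2248
shared = progressive_prompts_length(4, 50, 512)   # 4 * 50 + 512 = 712

# Figure 3 compares equal prompt budgets: one 100-token prompt
# versus two sequentially trained 50-token prompts.
prompt_tuning_budget = 1 * 100
progressive_budget = 2 * 50
```

Under these assumptions the repeated-input variant grows by a full input length with every task, while Progressive Prompts grows only by one prompt length per task.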