GRID: Scalable Task-Agnostic Prompt-Based Continual Learning for Language Models

Anushka Tiwari; Sayantan Pal; Rohini K. Srihari; Kaiyi Ji

TL;DR

GRID tackles latent forgetting and unbounded prompt growth in task-agnostic prompt-based continual learning for large language models (LLMs) by combining constrained decoding with gradient-guided prompt compression. It uses representative input sampling and automatic task identification to stabilize decoding, and a gradient-based mechanism to prune and aggregate prompts so that the prompt pool stays compact yet informative. Experiments on long-sequence and negative-transfer benchmarks show substantial improvements in backward transfer and memory efficiency, with competitive forward transfer and scalability to large models. Because it does not rely on explicit task IDs, the approach supports robust, privacy-conscious continual learning in real-world deployments of language models.

Abstract

Prompt-based continual learning (CL) provides a parameter-efficient approach for adapting large language models (LLMs) across task sequences. However, most existing methods rely on task-aware inference and maintain a growing set of task-specific prompts, which introduces two major challenges: (1) severe performance degradation on earlier tasks under task-agnostic inference, and (2) limited scalability due to prompt memory accumulation as task sequences grow. In this paper, we present GRID, a unified framework designed to address these challenges. GRID incorporates a decoding mechanism that enhances backward transfer by leveraging representative inputs, automatic task identification, and constrained decoding. Furthermore, it employs a gradient-guided prompt selection strategy to compress less informative prompts into a single aggregated representation, ensuring scalable and memory-efficient continual learning. Extensive experiments on long-sequence and negative transfer benchmarks show that GRID improves average accuracy and backward transfer, achieves competitive forward transfer, and substantially reduces prompt memory usage.
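
The prompt compression idea in the abstract can be illustrated with a small sketch. The abstract does not give the scoring rule or the aggregation operator, so everything here is an assumption made for illustration: prompts are ranked by the L2 norm of a gradient taken with respect to each prompt, the top `keep` prompts are retained, and the rest are averaged into one prompt. The function name `compress_prompt_pool`, the `aggregated` key, and the toy vectors are hypothetical and are not taken from the paper.

```python
from math import sqrt


def compress_prompt_pool(prompts, grads, keep):
    """Keep the `keep` highest-scoring prompts and merge the rest into one.

    prompts: dict mapping task name -> flattened prompt vector (list of floats)
    grads:   dict mapping task name -> gradient w.r.t. that prompt (list of floats)
    Assumption (not from the paper): a larger gradient norm marks a more
    informative prompt, and the merge operator is an element-wise mean.
    """
    def l2_norm(vec):
        return sqrt(sum(x * x for x in vec))

    # Rank task names by gradient norm, largest first.
    ranked = sorted(prompts, key=lambda name: l2_norm(grads[name]), reverse=True)

    # Retain the top-ranked prompts unchanged.
    pool = {name: prompts[name] for name in ranked[:keep]}

    # Average the remaining prompts into a single aggregated prompt.
    rest = [prompts[name] for name in ranked[keep:]]
    if rest:
        dim = len(rest[0])
        pool["aggregated"] = [sum(p[i] for p in rest) / len(rest) for i in range(dim)]
    return pool


# Toy example with four hypothetical task prompts of dimension 2.
prompts = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [2.0, 2.0], "t4": [4.0, 0.0]}
grads = {"t1": [0.1, 0.0], "t2": [3.0, 4.0], "t3": [0.0, 0.2], "t4": [1.0, 0.0]}
pool = compress_prompt_pool(prompts, grads, keep=2)
print(sorted(pool))
```

In this sketch the pool size is bounded by `keep + 1` no matter how many tasks arrive, which is the memory property the abstract describes. The actual selection criterion used by GRID may differ.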
