Prompt Customization for Continual Learning

Yong Dai, Xiaopeng Hong, Yabin Wang, Zhiheng Ma, Dongmei Jiang, Yaowei Wang

TL;DR

This work tackles prompt-based continual learning by addressing the instability of hard prompt selection as tasks accumulate. It introduces Prompt Customization (PC), which combines a Prompt Generation Module (PGM) and a Prompt Modulation Module (PMM) to generate and adapt instance-specific prompts from a fixed codebook, removing the need for task-wise prompt selection. The generated prompts are integrated into a frozen Vision Transformer backbone, with a momentum-updated codebook and regularization to preserve old knowledge, achieving up to 16.2% gains in average accuracy across class, domain, and task-agnostic settings. The approach demonstrates strong scalability and robustness across four datasets, suggesting practical impact for continual learning in dynamic environments and motivating future exploration of multimodal and codebook-free prompt strategies.
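The summary above mentions a momentum-updated codebook but does not state the update rule. One common reading is an exponential moving average, sketched below under that assumption; the function name `momentum_update`, the `momentum` value, and the array shapes are hypothetical and not taken from the paper.

```python
import numpy as np

def momentum_update(codebook, new_codebook, momentum=0.9):
    """Blend the stored codebook with a newly learned one (assumed EMA rule).

    A momentum close to 1 keeps the codebook near its previous state,
    which is one way to limit drift away from knowledge of earlier tasks.
    """
    return momentum * codebook + (1.0 - momentum) * new_codebook

# Toy example: 4 codebook entries of dimension 8.
old = np.zeros((4, 8))
new = np.ones((4, 8))
updated = momentum_update(old, new, momentum=0.9)
```

With `momentum=0.9`, each entry moves only a tenth of the way toward its newly learned value per update.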

Abstract

Contemporary continual learning approaches typically select prompts from a pool, and the selected prompts function as supplementary inputs to a pre-trained model. However, this strategy is hindered by the inherent noise of the selection step as the number of tasks grows. In response, we reformulate the prompting approach for continual learning and propose the prompt customization (PC) method. PC mainly comprises a prompt generation module (PGM) and a prompt modulation module (PMM). In contrast to conventional methods that employ hard prompt selection, PGM assigns different coefficients to the prompts in a fixed-size pool and generates tailored prompts from them. PMM then further modulates the prompts by adaptively assigning weights according to the correlations between the input data and the corresponding prompts. We evaluate our method on four benchmark datasets in three settings: class-incremental, domain-incremental, and task-agnostic incremental learning. Experimental results demonstrate consistent improvement (by up to 16.2%) over state-of-the-art (SOTA) techniques.
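The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the softmax over coefficients, the projection matrix `w_coef`, the cosine-similarity correlation, and the sigmoid weighting are all assumptions chosen for illustration, as are the function names and shapes.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_prompts(x_enc, codebook, w_coef):
    """PGM sketch: soft combination of codebook prompts instead of hard selection.

    x_enc:    (d,)      encoding of one input instance
    codebook: (K, L, d) K prompts, each with L tokens of dimension d
    w_coef:   (d, K)    hypothetical projection producing one score per prompt
    """
    coeffs = softmax(x_enc @ w_coef)               # (K,) instance-specific coefficients
    return np.tensordot(coeffs, codebook, axes=1)  # (L, d) linear combination

def modulate_prompts(x_enc, prompts, eps=1e-8):
    """PMM sketch: weight each prompt token by its correlation with the input.

    Correlation is assumed to be cosine similarity, squashed to (0, 1)
    with a sigmoid so that it acts as a soft gate on each token.
    """
    norms = np.linalg.norm(prompts, axis=1) * np.linalg.norm(x_enc) + eps
    sims = (prompts @ x_enc) / norms               # (L,) cosine similarities
    weights = 1.0 / (1.0 + np.exp(-sims))          # (L,) adaptive weights
    return prompts * weights[:, None]              # (L, d) modulated prompts

rng = np.random.default_rng(0)
x_enc = rng.normal(size=8)
codebook = rng.normal(size=(4, 5, 8))
w_coef = rng.normal(size=(8, 4))

prompts = generate_prompts(x_enc, codebook, w_coef)
modulated = modulate_prompts(x_enc, prompts)
```

Because every input yields its own coefficients and weights, no task identity or discrete prompt index is needed at inference time, which is the property the abstract contrasts with hard prompt selection.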

Paper Structure (34 sections, 13 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Prior works (L2P, DualPrompt, S-Prompts) employ a deterministic selection of prompts. Our work tailors specific prompts for each instance through prompt generation from a fixed-size codebook followed by soft prompt modulation. The key differences are that the generated prompts are linear combinations of codebook prompts with instance-specific coefficients, and that the modulated prompts are further assigned adaptive weights.
  • Figure 2: The framework of the proposed PC. The PC comprises two integral modules: the PGM and the PMM. The PGM takes the input encoding and the codebook as input and generates input-specific prompts through a linear combination of prompts from the codebook. The PMM takes the input encoding and the generated prompts as input and quantifies their correlations, which are then used to modulate the corresponding prompts. The final input-specific modulated prompts are inserted into the corresponding MHSA layers to assist the frozen backbone in performing classification tasks. Unlike previous works, our PC circumvents the need for rigorous prompt selection during inference and generates finer instance-specific prompts that are more distinguishable and expressive. 'Atten.' and 'MHSA' denote attention and the multi-head self-attention layer, respectively. For simplicity, all symbols are presented without subscripts.
  • Figure 3: The visualization of codebook and prompts for the first, fifth, and tenth tasks. (a) t-SNE visualization of the updated codebook and prompts; (b) Gaussian distribution of prompts. The figure is best viewed in color.
  • Figure 4: The performance with regard to different codebook sizes (Cs) and prompt sizes (Ps). The results demonstrate the stability of the proposed PC with small fluctuations concerning different parameters.
  • Figure 5: The performance with respect to the weight $\lambda$.
  • ...and 3 more figures