
Teaching Prompts to Coordinate: Hierarchical Layer-Grouped Prompt Tuning for Continual Learning

Shengqin Jiang, Tianqi Kong, Yuankai Qi, Haokui Zhang, Lina Yao, Quan Z. Sheng, Qingshan Liu, Ming-Hsuan Yang

TL;DR

This work tackles catastrophic forgetting in continual learning by introducing hierarchical layer-grouped prompt tuning (HLGP). Instead of independent per-layer prompts, HLGP uses shared implicit prompts for groups of layers. These are generated from a single task-specific root prompt $p_t$: intermediate adapters map $p_t$ to group prompts, and a position incentive embedding (PIE) is added to preserve layer order within each group. A soft task matching (SoTM) mechanism weights the sub-prompts according to inference dynamics, enabling effective fusion without explicit task IDs. Across CIFAR-100, IN-R, IN-A, and VTAB, HLGP with PIE and SoTM achieves state-of-the-art final and continual accuracy with fewer trainable parameters, demonstrating its robustness and practical utility for prompt-based continual learning.
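The generation path summarized above (one root prompt, one adapter per layer group, PIE for position within a group) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the bottleneck adapter shape (d to hidden to d with ReLU, no residual), the names `GroupAdapter` and `generate_layer_prompts`, the random initialization, and the toy sizes are all hypothetical, and training is not shown.

```python
import random

def rand_matrix(rows, cols, rng, scale=0.1):
    """Random rows x cols matrix stored as a list of lists."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply a rows x cols matrix W by a length-cols vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class GroupAdapter:
    """Hypothetical bottleneck adapter (d -> hidden -> d, ReLU), one per layer group."""
    def __init__(self, d, hidden, rng):
        self.down = rand_matrix(hidden, d, rng)
        self.up = rand_matrix(d, hidden, rng)

    def __call__(self, token):
        h = [max(0.0, v) for v in matvec(self.down, token)]
        return matvec(self.up, h)

def generate_layer_prompts(root_prompt, adapters, pie, group_size):
    """Map one root prompt (prompt_len x d) to num_groups * group_size layer prompts."""
    layer_prompts = []
    for adapter in adapters:
        # Each group prompt is derived from the same shared root prompt.
        group_prompt = [adapter(tok) for tok in root_prompt]
        for j in range(group_size):
            # Layers in a group share the group prompt; PIE row j marks position j.
            layer_prompts.append(
                [[g + e for g, e in zip(tok, pie[j])] for tok in group_prompt]
            )
    return layer_prompts

# Toy sizes (hypothetical): 3 groups of 4 layers = 12 layers, as in ViT-B.
rng = random.Random(0)
d, hidden, prompt_len, num_groups, group_size = 8, 4, 3, 3, 4
root = rand_matrix(prompt_len, d, rng, scale=1.0)
adapters = [GroupAdapter(d, hidden, rng) for _ in range(num_groups)]
pie = rand_matrix(group_size, d, rng, scale=0.01)
prompts = generate_layer_prompts(root, adapters, pie, group_size)
print(len(prompts), len(prompts[0]), len(prompts[0][0]))  # 12 3 8
```

In this sketch, two layers in the same group differ only by their PIE rows, which is one way to read "layers in the same group share roughly the same prompts, adjusted by position encoding".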

Abstract

Prompt-based continual learning methods fine-tune only a small set of additional learnable parameters while keeping the pre-trained model's parameters frozen. This enables efficient adaptation to new tasks while mitigating the risk of catastrophic forgetting. These methods typically attach an independent task-specific prompt to each layer of the pre-trained model to locally modulate that layer's features, so that its representation aligns with the requirements of the new task. However, although introducing learnable prompts independently at each layer provides high flexibility for adapting to new tasks, such overly flexible tuning can leave certain layers susceptible to unnecessary updates. Because the prompts of all tasks up to the current one are summed into a single final prompt for all seen tasks, the model may easily overwrite feature representations essential to previous tasks, which increases the risk of catastrophic forgetting. To address this issue, we propose a novel hierarchical layer-grouped prompt tuning method for continual learning. It improves model stability in two ways: (i) Layers in the same group share roughly the same prompts, which are adjusted by position encoding. This helps preserve the intrinsic feature relationships and propagation pathways of the pre-trained model within each group. (ii) A single task-specific root prompt learns to generate the sub-prompts for each layer group. Since all sub-prompts are conditioned on the same root prompt, they act in concert rather than independently. Extensive experiments on four benchmarks show that our method performs favorably against several state-of-the-art methods.
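The abstract describes conventional methods as summing the prompts of all seen tasks into one final prompt, while the summary credits SoTM with weighting sub-prompts instead. The sketch below contrasts the two on flattened prompt vectors. The soft weighting here is a generic substitute we assume for illustration (softmax over cosine similarity between a query feature and hypothetical per-task keys, with an assumed `temperature`); the paper only states that SoTM weights sub-prompts according to inference dynamics, so its actual rule may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def softmax(scores, temperature):
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def additive_fusion(task_prompts):
    """Conventional scheme: elementwise sum of the prompts of all seen tasks."""
    return [sum(vals) for vals in zip(*task_prompts)]

def soft_weighted_fusion(task_prompts, task_keys, query, temperature=0.1):
    """Hypothetical soft matching: weight each task prompt by query-key similarity."""
    weights = softmax([cosine(query, k) for k in task_keys], temperature)
    fused = [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*task_prompts)]
    return fused, weights

task_prompts = [[1.0, 0.0], [0.0, 1.0]]  # toy prompts for two seen tasks
task_keys = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical per-task keys
query = [0.9, 0.1]                       # feature of a test sample resembling task 1
print(additive_fusion(task_prompts))     # [1.0, 1.0]
fused, weights = soft_weighted_fusion(task_prompts, task_keys, query)
print([round(w, 3) for w in weights])    # [1.0, 0.0]
```

The additive prompt mixes both tasks equally regardless of the input, whereas the weighted version lets the prompt of the better-matching task dominate without requiring an explicit task ID.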

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison between conventional independent prompt tuning (e.g., li2025caprompt) and our grouped prompting. Existing methods typically fine-tune pre-trained Transformer layers with independent prompts. Although this design offers high flexibility for learning new tasks, it can leave representations in certain layers that are critical to old tasks susceptible to unnecessary updates, thereby exacerbating catastrophic forgetting. Our method instead addresses this issue by generating shared implicit prompts.
  • Figure 2: Overview of our method for CL. The continual learning framework adapts a pre-trained model through prompt tuning, allowing it to acquire new knowledge while preserving previously learned information. Instead of constructing prompts independently as in conventional approaches, we propose a hierarchical layer-grouped prompt tuning method (Sec. \ref{sec:lgp}) that dynamically produces a set of shared implicit prompts for each group of layers from a task-specific prompt using intermediate adapters, enhancing the synergy of prompts both across layer groups and within each group. Building on this, we introduce positional incentive embeddings (PIE) (Sec. \ref{sec:pie}), which enable prompts to recognize their sequential order within a group. Finally, a soft task matching (SoTM) method (Sec. \ref{sec:stm}) weights the sub-prompts, improving overall network performance.
  • Figure 3: Final average accuracy of different methods after each incremental task.
  • Figure 4: Ablation study on the intermediate feature dimension of adapters.
  • Figure 5: Comparison of feature visualizations of our method with and without PIE. We visualize high-level semantic explanations of our method without and with PIE using attention-guided CAM LeemS24. (a) shows the input images, while (b) and (c) show the corresponding results without and with PIE, respectively.
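Figure 3 and the summary report final average accuracy after each incremental task. The sketch below assumes the common continual-learning definitions: after training task t, average accuracy is the mean accuracy over tasks 1..t, "final" accuracy is that value after the last task, and the mean of the curve is one common reading of "continual" accuracy. The paper's exact definitions are not given here, and the accuracy matrix is illustrative toy data, not results from the paper.

```python
def average_accuracy_curve(acc):
    """acc[t][i]: accuracy (%) on task i after training task t, for i <= t."""
    return [sum(row[: t + 1]) / (t + 1) for t, row in enumerate(acc)]

def final_and_mean_accuracy(acc):
    """Return (accuracy after the last task, mean of the per-task curve)."""
    curve = average_accuracy_curve(acc)
    return curve[-1], sum(curve) / len(curve)

# Toy accuracy matrix for three tasks (made-up numbers).
acc = [
    [90.0],
    [85.0, 80.0],
    [82.0, 76.0, 70.0],
]
final, mean = final_and_mean_accuracy(acc)
print(f"{final:.1f} {mean:.2f}")  # 76.0 82.83
```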