Hierarchical Prompts for Rehearsal-free Continual Learning

Yukun Zuo; Hantao Yao; Lu Yu; Liansheng Zhuang; Changsheng Xu

TL;DR

This paper tackles catastrophic forgetting in rehearsal-free continual learning by introducing Hierarchical Prompts (H-Prompts), a three-tier prompt design: class prompts preserve past class distributions, task prompts fuse past and current task knowledge, and general prompts learn generalized representations. The method uses Bayesian Distribution Alignment to model class distributions, Cross-task Knowledge Excavation to transfer past knowledge into the current task prompt, and Generalized Knowledge Exploration to obtain robust self-supervised features, all while keeping the backbone frozen. The total objective combines these components, and at inference a task-aware query-key mechanism identifies the appropriate task prompts. On Split CIFAR-100 and Split ImageNet-R, the method reports state-of-the-art results with low forgetting (87.8% and 70.6% average accuracy, respectively), and ablations validate the contribution of each prompt type and of the inference strategy. The work indicates that structured hierarchical prompts can substantially improve rehearsal-free continual learning and generalization, with potential extensions to broader vision tasks.
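
The task-aware query-key step mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that each task stores a single key vector and that the query feature of a test image is matched to the keys by cosine similarity. This summary does not state the similarity measure or how the keys are learned, so `cosine`, `select_task`, `task_keys`, and the toy vectors are illustrative names and values, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_task(query, task_keys):
    """Return the index of the task key most similar to the query feature.

    Assumption: one key per task; the selected index decides which
    task prompt would be attached to the frozen backbone.
    """
    scores = [cosine(query, key) for key in task_keys]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: three tasks with hypothetical 3-dimensional keys.
task_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.1, 0.9, 0.2]
task_id = select_task(query, task_keys)
```

In this toy setup the query is closest to the second key, so the second task's prompt would be selected. A real system would compute the query with the frozen backbone rather than use hand-written vectors.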

Abstract

Continual learning aims to equip a model with the capability to integrate current task knowledge while mitigating the forgetting of past task knowledge. Inspired by prompt tuning, prompt-based methods keep the backbone frozen and train only a small number of learnable prompts, minimizing the catastrophic forgetting that arises from updating a large number of backbone parameters. Nonetheless, these learnable prompts tend to concentrate on the discriminative knowledge of the current task while ignoring past task knowledge, so the prompts themselves still suffer from catastrophic forgetting. This paper introduces a novel rehearsal-free paradigm for continual learning termed Hierarchical Prompts (H-Prompts), comprising three categories of prompts: class prompt, task prompt, and general prompt. To effectively depict the knowledge of past classes, the class prompt leverages Bayesian Distribution Alignment to model the distribution of classes in each task. To reduce the forgetting of past task knowledge, the task prompt employs Cross-task Knowledge Excavation to combine the knowledge encapsulated in the learned class prompts of past tasks with current task knowledge. Furthermore, the general prompt uses Generalized Knowledge Exploration to derive highly generalized knowledge in a self-supervised manner. Evaluations on two benchmarks substantiate the efficacy of the proposed H-Prompts, with an average accuracy of 87.8% on Split CIFAR-100 and 70.6% on Split ImageNet-R.

Paper Structure (22 sections, 22 equations, 11 figures, 6 tables)

Introduction
Related Work
Prompt Learning
Continual Learning
Bayesian Neural Networks
Hierarchical Prompts
Overview
Bayesian Distribution Alignment
Cross-task Knowledge Excavation
Generalized Knowledge Exploration
Total Objective
Inference Strategy
Experiments
Settings
...and 7 more sections

Figures (11)

Figure 1: (a) Previous prompt-based methods train the prompt to focus on the knowledge of the current task while ignoring the knowledge of past tasks. (b) H-Prompts stores past task knowledge and integrates it with current task knowledge during prompt tuning.
Figure 2: An overview of the class prompt $\textcolor{blue}{\mathbf{c}_{i,m}}$, task prompt $\textcolor{orange}{\mathbf{t}_i = \{\mathbf{t}_{i,l}\}_{l=1}^{\varGamma_t}}$, and general prompt $\textcolor{green}{\mathbf{g}_i = \{\mathbf{g}_{i,l}\}_{l=1}^{\varGamma_g}}$. (a) The trainable task prompt $\mathbf{t}_i$ and general prompt $\mathbf{g}_i$ are attached to the inputs of multiple frozen Transformer layers to capture task knowledge and learn generalized knowledge, respectively. (b) The trainable class prompt $\mathbf{c}_{i,m}$ takes the place of the input, with the task prompt $\mathbf{t}_i$ and general prompt $\mathbf{g}_i$ fixed, to preserve the knowledge of each class.
Figure 3: The framework of the proposed H-Prompts. In the current task, the input $x_{i,m}$ (class prompt $\mathbf{c}_{i,m}$) is extended with the task prompt $\mathbf{t}_{i}$ and general prompt $\mathbf{g}_{i}$ to obtain the adapted representation $\bar{\mathbf{q}}_{i,m}$ (virtual representation $\bar{\mathbf{q}}'_{i,m}$). Similarly, we obtain the past virtual representation $\bar{\mathbf{q}}'_{v,u}$ for the past class prompt $\mathbf{c}_{v,u}$. Bayesian Distribution Alignment adjusts $\mathbf{c}_{i,m}$ to align the distributions of $\bar{\mathbf{q}}_{i,m}$ and $\bar{\mathbf{q}}'_{i,m}$. Cross-task Knowledge Excavation jointly trains on $\bar{\mathbf{q}}'_{v,u}$ and $\bar{\mathbf{q}}_{i,m}$ to optimize $\mathbf{t}_{i}$. Moreover, Generalized Knowledge Exploration uses $\bar{\mathbf{q}}_{i,m}$ to conduct self-supervised learning for updating $\mathbf{g}_{i}$.
Figure 4: In Bayesian Distribution Alignment, we first fix discriminative classifier $\mathcal{C}_d$ and update class prompt $\mathbf{c}_{i,m}$ to deceive $\mathcal{C}_d$ by classifying $\mathbf{c}_{i,m}$ correctly with true label $m$. Then, we fix the $\mathbf{c}_{i,m}$ and update the $\mathcal{C}_d$ to misclassify $\mathbf{c}_{i,m}$ with fake label $m+|\mathcal{Y}_i|$ and classify input $x_{i,m}$ with true label $m$.
Figure 5: In Cross-task Knowledge Excavation, we optimize task prompt $\mathbf{t}_{i}$ to classify the input sampled from past class prompt $\mathbf{c}_{v,u}$ for encoding past task knowledge. Moreover, we update $\mathbf{t}_{i}$ and classification classifier $\mathcal{C}_{c}$ to classify input $x_{i,m}$ for learning current task knowledge.
...and 6 more figures
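
The alternating label scheme described in the Figure 4 caption can be sketched as follows. This is a minimal illustration of the target construction only, assuming zero-based class indices within a task. The helper names `classifier_targets` and `prompt_targets` are hypothetical, and the observation that $\mathcal{C}_d$ would need $2|\mathcal{Y}_i|$ outputs is an inference from the caption, not a detail confirmed by this summary.

```python
def classifier_targets(true_labels, num_classes, source):
    """Targets used when updating the discriminative classifier C_d.

    source == "input":  real inputs x_{i,m} keep their true label m.
    source == "prompt": class prompts c_{i,m} get the fake label m + |Y_i|.
    """
    if source == "input":
        return list(true_labels)
    if source == "prompt":
        return [m + num_classes for m in true_labels]
    raise ValueError("source must be 'input' or 'prompt'")


def prompt_targets(true_labels):
    """Targets used when updating class prompts with C_d fixed.

    The class prompt is trained so that C_d assigns it the true label m.
    """
    return list(true_labels)


# Toy task with |Y_i| = 10 classes and a batch of three samples.
labels = [0, 3, 9]
real_targets = classifier_targets(labels, 10, "input")
fake_targets = classifier_targets(labels, 10, "prompt")
deceive_targets = prompt_targets(labels)
```

Under these assumptions, the fake targets occupy a second block of $|\mathcal{Y}_i|$ indices, so the classifier can separate prompt-derived representations from real ones while the prompt update pushes them back toward the true classes.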
