Table of Contents

ProTPS: Prototype-Guided Text Prompt Selection for Continual Learning

Jie Mei, Li-Leng Peng, Keith Fuller, Jenq-Neng Hwang

Abstract

For continual learning, text-prompt-based methods leverage text encoders and learnable prompts to encode semantic features for classes that arrive sequentially over time. A common challenge in existing work is how to learn unique text prompts, which implicitly carry semantic information about new classes, so that the semantic features of newly arrived classes do not overlap with those of previously trained classes, thereby mitigating catastrophic forgetting. To address this challenge, we propose a novel approach, "Prototype-guided Text Prompt Selection (ProTPS)", which intentionally increases training flexibility and thus encourages the learning of unique text prompts. Specifically, ProTPS learns class-specific vision prototypes and text prompts, where the vision prototypes guide the selection and learning of the text prompt for each class. We first evaluate ProTPS in both the class-incremental (CI) and cross-dataset continual (CDC) learning settings. Because ProTPS achieves performance close to the upper bounds, we further collect a real-world dataset of 112 marine species gathered over a span of six years, named Marine112, to bring new challenges to the community. Marine112 is authentically suited to the class- and domain-incremental (CDI) learning setting and follows a natural long-tail distribution. The results under all three settings show that ProTPS performs favorably against recent state-of-the-art methods. The implementation code and the Marine112 dataset will be released upon acceptance of our paper.

Paper Structure

This paper contains 26 sections, 7 equations, 12 figures, and 14 tables.

Figures (12)

  • Figure 1: Visualization of learned text prompts via Grad-CAM (selvaraju2017grad). Highlighted image areas are the attended regions of the learned text prompt of each class. Each pair of adjacent columns, moving from left to right, represents the same class: "african hunting dog", "chihuahua", "borzoi", "little blue heron", "american coot".
  • Figure 2: Prior work (zhou2022learning) proposes learnable prompts to replace predefined prompt templates. Our work introduces a novel text prompt selection approach that enables the semantic prompt of each class to interact with all other class names. Note that the three large gray circles in the feature space represent all possible semantic features for each class.
  • Figure 3: ProTPS framework. In Task-1, we use the prototype $I_i$ of class $i$ to initialize the weights of the linear $classifier_{1}$, referred to as ProTPS's "vision classifier", and finetune these vision prototypes for refinement. Each class prototype $I_i$ is paired with a learnable text prompt $P_i$. For a single input image, we select one text prompt and concatenate it with all class names in Task-1 to create $classifier_{2}$, referred to as the "text classifier" of ProTPS. In Task-2, the prototypes and paired prompts are expanded, while the prototypes $I_1, ..., I_n$ and prompts $P_1, ..., P_n$ trained in the previous task are frozen. We propose a sampling method to create a sampled $classifier_{2}$ that helps new text prompts capture features unique to the new classes. Later tasks follow the same expansion rule as Task-2.
  • Figure 4: Accuracy evolution of ProTPS's different classifiers on ImageNet100 (deng2009imagenet) under the CI setting.
  • Figure 5: Marine112 captures both ecological dynamics (incremental domains and classes) and a natural long-tail distribution.
  • ...and 7 more figures
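To make the pipeline described in the Figure 3 caption concrete, the following is a minimal NumPy sketch of prototype-guided prompt selection. Everything here is a toy stand-in chosen for illustration, not the paper's implementation: the real ProTPS uses pretrained image/text encoders, prompts optimized end to end, and the Task-2 sampling strategy, none of which are modeled. The names `prototypes`, `prompts`, and `class_name_embs` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES = 8, 3

# Toy class-specific vision prototypes (stand-ins for per-class visual features).
prototypes = np.eye(N_CLASSES, DIM)
# One "learnable" text prompt per class (here: a fixed random embedding).
prompts = rng.normal(size=(N_CLASSES, DIM))
# Stand-in embeddings of the class names themselves.
class_name_embs = rng.normal(size=(N_CLASSES, DIM))

def vision_logits(img_feat):
    # "Vision classifier": a linear layer whose weights are the class prototypes.
    return prototypes @ img_feat

def select_prompt(img_feat):
    # Prototype-guided selection: pick the prompt paired with the
    # prototype that best matches the image feature.
    return int(np.argmax(vision_logits(img_feat)))

def text_logits(img_feat):
    # "Text classifier": the selected prompt is combined with every
    # class-name embedding (a crude stand-in for running a text encoder
    # on the prompt concatenated with each class name).
    k = select_prompt(img_feat)
    text_feats = prompts[k] + class_name_embs
    return text_feats @ img_feat

# An image feature near prototype 1 selects the prompt paired with class 1.
img = prototypes[1] + 0.01 * rng.normal(size=DIM)
print(select_prompt(img))   # -> 1
```

The key design point this sketch mirrors is the pairing: each class owns one prototype and one prompt, and the vision-side match decides which prompt builds the text classifier for a given image.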