Semantic Prompting with Image-Token for Continual Learning

Jisu Han, Jaemin Na, Wonjun Hwang

TL;DR

This work introduces a novel task-agnostic approach that focuses on the visual semantic information of image tokens, eliminating the preceding task-prediction step. It achieves competitive performance on four benchmarks while significantly reducing training time compared to state-of-the-art methods.

Abstract

Continual learning aims to refine model parameters for new tasks while retaining knowledge from previous tasks. Recently, prompt-based learning has emerged as a way to prompt pre-trained models to learn subsequent tasks without relying on a rehearsal buffer. Although this approach has demonstrated outstanding results, existing methods depend on a preceding task-selection process to choose appropriate prompts. Imperfect task selection can degrade performance, particularly when the number of tasks is large or the task distributions are imbalanced. To address this issue, we introduce I-Prompt, a task-agnostic approach that focuses on the visual semantic information of image tokens to eliminate task prediction. Our method consists of semantic prompt matching, which determines prompts based on similarities between tokens, and image token-level prompting, which applies prompts directly to image tokens in the intermediate layers. Consequently, our method achieves competitive performance on four benchmarks while significantly reducing training time compared to state-of-the-art methods. Moreover, we demonstrate the superiority of our method across various scenarios through extensive experiments.

Paper Structure

This paper contains 12 sections, 9 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Overview of prompt-based continual learning approaches. (a) Previous approaches select prompts based on the similarity between the output class token of the pre-trained model and task-specific prompts, which requires a task-selection process. If an aircraft image is correctly allocated to task T, which involves aircraft, accurate inference can be expected. However, if it is assigned to task 1, forgetting occurs because of the inconsistency between training and inference. (b) Our approach eliminates this error-prone task-selection process and focuses on the semantic information within the image itself to assign prompts that are relevant to the image. We exploit the relationships between image tokens through the representational capability of the pre-trained model.
  • Figure 2: Schematic illustration of I-Prompt. We detail the internal structure of the transformer layer and its interaction with the prompt pool. Top: Internal process of the transformer layer. The attention key $h_k$ is passed to the prompt pool. Each prompt selected from the prompt pool is added to the transformer's attention key and value, and only the prompt is trained to adapt to new tasks. Bottom: The process of matching prompts in the prompt pool. The similarity between the input attention key from the transformer layer and the prompt key is calculated, and the final prompt is determined by the element-wise product of the calculated similarity and the prompt.
  • Figure 3: Performance on various task distributions. We report the final accuracy in the uniform setting, which represents the task-balanced scenario, and in three distinct task-imbalanced scenarios.
  • Figure 4: Task-wise accuracy in the random increase scenario. We report the performance of the prompt-based method in a random increase scenario. The line plot shows the average accuracy, and the bar plot shows the distribution of classes per task.
  • Figure 5: Hyperparameter analysis. Results of the grid search over prompt pool size and prompt length. Left: average accuracy $\uparrow$ (%). Right: tuning parameter ratio $\downarrow$ (%).
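
The matching step described in the Figure 2 caption can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`cosine`, `match_prompts`, `apply_token_prompts`) are hypothetical, cosine similarity is assumed as the similarity measure, the similarity-scaled prompts are assumed to be summed over the pool to give one prompt per token, and the same matched prompt is added to both the attention key and the value for simplicity. The paper may differ on any of these points.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_prompts(token_keys, prompt_keys, prompts):
    """Build one prompt vector per image token.

    token_keys:  N attention keys (h_k), one per image token.
    prompt_keys: M keys of the prompt pool.
    prompts:     M prompt vectors of the prompt pool.
    Assumption: each pool prompt is scaled by its similarity to the token,
    and the scaled prompts are summed over the pool.
    """
    dim = len(prompts[0])
    matched = []
    for h in token_keys:
        sims = [cosine(h, k) for k in prompt_keys]
        matched.append(
            [sum(s * p[j] for s, p in zip(sims, prompts)) for j in range(dim)]
        )
    return matched


def apply_token_prompts(keys, values, matched):
    """Add each token's matched prompt to its attention key and value."""
    new_keys = [[a + b for a, b in zip(k, p)] for k, p in zip(keys, matched)]
    new_values = [[a + b for a, b in zip(v, p)] for v, p in zip(values, matched)]
    return new_keys, new_values


# Toy example: two image tokens, a pool of two prompts, dimension 2.
token_keys = [[1.0, 0.0], [0.0, 1.0]]
token_values = [[0.0, 0.0], [1.0, 1.0]]
pool_keys = [[1.0, 0.0], [0.0, 1.0]]
pool_prompts = [[0.5, 0.5], [-1.0, 2.0]]

matched = match_prompts(token_keys, pool_keys, pool_prompts)
new_keys, new_values = apply_token_prompts(token_keys, token_values, matched)
```

Because every token receives its own similarity-weighted prompt, no task identity has to be predicted first, which is the contrast with Figure 1(a) that the captions draw. In a real model only the pool keys and prompts would be trainable, with the pre-trained transformer kept frozen.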