Few-Shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt

Chenxi Liu, Zhenyi Wang, Tianyi Xiong, Ruibo Chen, Yihan Wu, Junfeng Guo, Heng Huang

TL;DR

This work tackles Few-Shot Class-Incremental Learning (FSCIL) by freezing a pretrained Vision Transformer backbone and training only prompts. The framework, Attention-aware Self-adaptive Prompt (ASP), splits the prompts into attention-aware task-invariant prompts (TIP) and self-adaptive task-specific prompts (TSP): TIP enforce consistent attention across tasks to reduce task-specific information, while TSP are generated by a prompt encoder trained with an Information Bottleneck objective. An EMA-updated p_avg and an anchor loss further promote generalization and discriminability, enabling knowledge transfer from base to new classes without rehearsal buffers. Empirical results on CIFAR100, CUB200-2011, and ImageNet-R show that ASP consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods at learning new classes while mitigating forgetting on base classes. The approach offers a data-efficient solution for continual learning in vision tasks using a fixed backbone and learned prompts.

Abstract

Few-Shot Class-Incremental Learning (FSCIL) models aim to incrementally learn new classes with scarce samples while preserving knowledge of old ones. Existing FSCIL methods usually fine-tune the entire backbone, leading to overfitting and hindering the potential to learn new classes. On the other hand, recent prompt-based CIL approaches alleviate forgetting by training prompts with sufficient data in each task. In this work, we propose a novel framework named Attention-aware Self-adaptive Prompt (ASP). ASP encourages task-invariant prompts to capture shared knowledge by reducing task-specific information from the attention perspective. Additionally, self-adaptive task-specific prompts in ASP provide specific information and transfer knowledge from old classes to new classes with an Information Bottleneck learning objective. In summary, ASP prevents overfitting on the base task and does not require large amounts of data in few-shot incremental tasks. Extensive experiments on three benchmark datasets validate that ASP consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods in terms of both learning new classes and mitigating forgetting.
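
To make the attention argument for task-invariant prompts concrete, the toy sketch below compares the attention that image tokens place on prompt tokens when the prompts share one value versus when they hold different values. It is not the authors' implementation: it uses plain scaled dot-product attention on raw tokens (no learned query/key projections), and the names `attention_weights`, `shared`, and `distinct` as well as all sizes are assumptions chosen for illustration.

```python
import numpy as np

def attention_weights(queries, keys):
    """Row-wise softmax of scaled dot-product scores (single head, no projections)."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
dim, n_prompts, n_patches = 8, 4, 5  # illustrative sizes, not from the paper

# Stand-in for image patch tokens of one task.
patches = rng.normal(size=(n_patches, dim))

# Prompts that all share the same value versus prompts with different values.
shared = np.tile(rng.normal(size=(1, dim)), (n_prompts, 1))
distinct = rng.normal(size=(n_prompts, dim))

# Attention from each patch token to the prompt tokens (first n_prompts keys).
w_shared = attention_weights(patches, np.vstack([shared, patches]))[:, :n_prompts]
w_distinct = attention_weights(patches, np.vstack([distinct, patches]))[:, :n_prompts]

# Identical prompts receive identical attention from any query token.
print(np.allclose(w_shared, w_shared[:, :1]))
print(np.allclose(w_distinct, w_distinct[:, :1]))
```

Because identical keys always yield identical scores, every query spreads its prompt attention evenly over same-valued prompts whatever the input data are, which is one simplified reading of why a shared initialization gives consistent attention.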

Paper Structure

This paper contains 15 sections, 17 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overall training scheme of ASP for the base task. For incremental tasks where $t>0$, ASP updates only $\boldsymbol{p}_{avg}$ using Eq. \ref{eq:EMA}. Left: The pre-trained backbone remains frozen during training, and the prompts are inserted into layers between attention blocks. The TIP $\boldsymbol{p}_I$ are initialized from an attention-aware aspect, while the TSP $\boldsymbol{p}_S$ are derived from the prompt encoder $E_{\boldsymbol{p}}$. At the beginning of each training epoch, anchor images are selected using Eq. \ref{eq:anchor_sample}. Throughout training, the prompts and $E_{\boldsymbol{p}}$ are optimized using the IB loss and the anchor loss as specified in Eq. \ref{eq:overall_loss}. Right: Details of the prompt encoder $E_{\boldsymbol{p}}$. Image features extracted by the frozen pre-trained backbone are fed to two tiny networks, $f_\mu$ and $f_\Sigma$. At the start of each training epoch, $\boldsymbol{p}_{avg}$ is calculated via Eq. \ref{eq:p_avg} using $\boldsymbol{p}_I$ and all data in the base task. Within a training epoch, the output of $f_\Sigma$ contributes to the IB loss, while $\boldsymbol{p}_S$ results from blending $\boldsymbol{p}_{avg}$ and the output of $f_\mu$, as outlined in Eq. \ref{eq:ps}.
  • Figure 2: Attention on task-invariant prompts across different tasks. A deeper line color indicates greater attention. Left: When prompts are initialized with different values, the attention on each prompt differs across tasks, thereby providing inconsistent information. Right: ASP initializes TIP with the same values, ensuring consistent attention across tasks and providing uniform information.
  • Figure 3: Detailed Top-1 accuracy $A_t$ in each incremental task on three benchmark datasets. ASP outperforms baselines in most tasks.
  • Figure 4: Sensitivity analysis of $\alpha$, $\beta$, $\lambda$ and prompt length $L_g=L_d$. The average accuracy $A_{avg}$ is reported on the ImageNet-R dataset.
  • Figure 5: Comparison of baselines and ASP on the accuracy of base and new classes after the last task. ASP outperforms all baselines at learning new classes while remaining competitive on base classes.
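
The prompt-encoder data flow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper's equations are not reproduced here, so the convex blend, the generic EMA rule, the KL-to-standard-normal regularizer (a common variational Information Bottleneck surrogate), the batch-mean stand-in for $\boldsymbol{p}_{avg}$, and the names `mix`, `momentum`, `W_mu`, and `W_sigma` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, prompt_dim = 16, 8  # illustrative sizes, not from the paper

# Hypothetical single-layer stand-ins for the two tiny networks f_mu and f_Sigma.
W_mu = rng.normal(scale=0.1, size=(feat_dim, prompt_dim))
W_sigma = rng.normal(scale=0.1, size=(feat_dim, prompt_dim))

def encode(features):
    """Map frozen-backbone features to a mean and a log-variance per image."""
    return features @ W_mu, features @ W_sigma

def blend(p_avg, mu, mix=0.5):
    """Assumed convex blend of p_avg with the f_mu output to form p_S."""
    return mix * p_avg + (1.0 - mix) * mu

def ema_update(p_avg, p_new, momentum=0.9):
    """Generic exponential moving average (the paper's exact rule may differ)."""
    return momentum * p_avg + (1.0 - momentum) * p_new

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over the last axis."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Stand-in for features of four images from the frozen backbone.
features = rng.normal(size=(4, feat_dim))
mu, log_var = encode(features)

p_avg = mu.mean(axis=0)  # assumption: batch mean used in place of the paper's p_avg
p_s = blend(p_avg, mu)   # one task-specific prompt vector per image
ib_term = kl_to_standard_normal(mu, log_var).mean()

# Incremental-task step: only p_avg changes, here from a new-task estimate.
new_mu, _ = encode(rng.normal(size=(4, feat_dim)))
p_avg = ema_update(p_avg, new_mu.mean(axis=0))
```

Setting `mix=1.0` makes every image use `p_avg` alone, while `mix=0.0` relies entirely on the per-image encoder output, so the blend weight trades shared knowledge against instance-specific information in this toy version.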