Adaptive Additive Parameter Updates of Vision Transformers for Few-Shot Continual Learning

Kyle Stein, Andrew Arash Mahyari, Guillermo Francia, Eman El-Sheikh

TL;DR

This work tackles catastrophic forgetting in few-shot class-incremental learning (FSCIL) by freezing a pre-trained Vision Transformer (ViT) backbone and introducing additive updates to the self-attention projections, shared across transformer blocks, to adapt to new classes from few examples. A classifier recalibration step (via learning and updating class prototypes μ_j) complements the approach, enabling robust base representations to be retained while integrating novel classes with minimal data. The method achieves state-of-the-art results on CUB-200, CIFAR-100, and miniImageNet, with substantial improvements in base and final session accuracies and reduced forgetting, while avoiding the overhead of prompt-based PEFT methods. Ablation studies show that updating more self-attention layers yields larger gains and that self-attention updates outperform MLP updates, underscoring the efficiency and effectiveness of targeted, additive fine-tuning in ViTs for FSCIL.
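The prototype-based recalibration mentioned above can be illustrated with a small sketch. It assumes that each prototype μ_j is the mean feature vector of class j and that a sample is assigned to the prototype with the highest cosine similarity. This is a common choice in FSCIL, but the summary here does not confirm the exact rule. The names `class_prototypes`, `cosine`, and `predict` are hypothetical, and the 2-D feature vectors stand in for ViT embeddings.

```python
import math

def class_prototypes(features, labels):
    """Return {class_id: mean feature vector} (assumed form of mu_j)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def predict(feature, prototypes):
    """Assign the class whose prototype is most similar to the feature."""
    return max(prototypes, key=lambda y: cosine(feature, prototypes[y]))

# Base session: prototypes for classes 0 and 1 from toy features.
protos = class_prototypes(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    [0, 0, 1, 1],
)

# Incremental session: add novel class 2 from two shots.
# The base-class prototypes are left untouched.
protos.update(class_prototypes([[-1.0, 0.1], [-0.9, -0.1]], [2, 2]))

pred_base = predict([0.8, 0.2], protos)
pred_novel = predict([-0.7, 0.0], protos)
```

In this toy setup, adding a novel class only extends the prototype dictionary, so decisions for base classes change only when a new prototype lies closer to a query than the existing ones.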

Abstract

Integrating new class information without losing previously acquired knowledge remains a central challenge in artificial intelligence; the loss itself is often referred to as catastrophic forgetting. Few-shot class-incremental learning (FSCIL) addresses this by first training a model on a robust dataset of base classes and then incrementally adapting it in successive sessions using only a few labeled examples per novel class. However, this approach is prone to overfitting on the limited new data, which can compromise overall performance and exacerbate forgetting. In this work, we propose a simple yet effective FSCIL framework that leverages a frozen Vision Transformer (ViT) backbone augmented with parameter-efficient additive updates. Our approach freezes the pre-trained ViT parameters and selectively injects trainable weights into the self-attention modules via an additive update mechanism. Because only this small subset of parameters is trained, the model can accommodate new classes without sacrificing the representations learned during the base session: the generalizable features of the frozen ViT are preserved, the risk of overfitting is reduced, and previously learned knowledge is not overwritten when small batches of novel data are introduced. Extensive experiments on benchmark datasets demonstrate that our approach yields state-of-the-art performance compared to baseline FSCIL methods.
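The additive update idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a single projection matrix W stays frozen, a trainable term `delta` of the same shape is added to it, and gradient steps touch `delta` only. The class name `AdditiveProjection`, the zero initialization of `delta`, the squared-error loss, and the toy 2x2 sizes are all hypothetical choices made for the example. Which attention projections receive the update is not modeled here.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class AdditiveProjection:
    """Frozen weight W plus a trainable additive term delta (hypothetical)."""

    def __init__(self, W_frozen):
        self.W = [row[:] for row in W_frozen]  # frozen copy, never modified
        # Assumption: delta starts at zero, so the layer initially
        # reproduces the pre-trained projection exactly.
        self.delta = [[0.0] * len(row) for row in W_frozen]

    def effective_weight(self):
        """Return W + delta, the weight actually used in the forward pass."""
        return [[w + d for w, d in zip(wr, dr)]
                for wr, dr in zip(self.W, self.delta)]

    def forward(self, x):
        return matvec(self.effective_weight(), x)

    def sgd_step(self, x, target, lr=0.1):
        """One gradient step on 0.5 * ||y - target||^2, applied to delta only."""
        y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        for i, e in enumerate(err):
            for j, xj in enumerate(x):
                # d(loss)/d(delta[i][j]) = err[i] * x[j]
                self.delta[i][j] -= lr * e * xj

W0 = [[1.0, 0.0], [0.0, 1.0]]       # stands in for a pre-trained projection
proj = AdditiveProjection(W0)
x, target = [1.0, 2.0], [2.0, 1.0]  # toy input and desired output

before = proj.forward(x)
for _ in range(50):
    proj.sgd_step(x, target, lr=0.1)
after = proj.forward(x)
```

After training, `proj.W` still equals `W0` while `after` matches the target, so everything learned lives in `delta`. In this sketch, discarding `delta` recovers the pre-trained behavior exactly.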

Paper Structure

This paper contains 13 sections, 8 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: The overall architecture of the proposed FSCIL approach.
  • Figure 2: miniImageNet performance across sessions.