Singular Value Fine-tuning for Few-Shot Class-Incremental Learning

Zhiwu Wang, Yichen Wu, Renzhen Wang, Haokun Lin, Quanziang Wang, Qian Zhao, Deyu Meng

TL;DR

FSCIL combines continual learning with extreme data scarcity for new classes, making overfitting a critical risk when using large foundation models. The paper introduces SVFCL, a singular-value-based fine-tuning method that freezes the SVD bases $\mathbf{U}$ and $\mathbf{V}$ of pretrained weights and only learns to adjust and merge the singular values across incremental tasks, yielding updates $\Delta W = \mathbf{U} \mathcal{M}(\{\Delta\Sigma_i\}_{i=0}^{t-1}, \Delta\Sigma_t) \mathbf{V}^\top$. Theoretical and empirical analysis shows SVFCL uses far fewer trainable parameters and concentrates updates along principal components, reducing overfitting while maintaining strong knowledge retention, outperforming prompt-tuning and LoRA-based baselines on miniImageNet, ImageNet-R, and CUB200-2011. Ablation studies confirm the importance of freezing $\mathbf{U}$ and $\mathbf{V}$, the choice of blocks to fine-tune in ViT, and the efficacy of low-rank SVD approximations. Overall, SVFCL provides a simple, effective, and scalable approach for FSCIL with foundation models, achieving state-of-the-art results and robust generalization to distribution shifts.
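The update $\Delta W = \mathbf{U} \mathcal{M}(\cdot) \mathbf{V}^\top$ can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not specify the merging function $\mathcal{M}$, so `merge_shifts` below uses a plain sum as a placeholder assumption, and all function names and the toy matrix size are hypothetical.

```python
import numpy as np

def svd_bases(W):
    """Decompose W once; U and Vt stay frozen afterwards."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U, S, Vt

def merge_shifts(previous_shifts, current_shift):
    """Placeholder for the merging function M.

    ASSUMPTION: a plain sum of all singular value shifts. The paper's
    actual merging rule is not given in this summary.
    """
    merged = np.array(current_shift, dtype=float, copy=True)
    for shift in previous_shifts:
        merged = merged + shift
    return merged

def delta_weight(U, Vt, previous_shifts, current_shift):
    """Compute U diag(M(...)) V^T without forming the diagonal matrix."""
    merged = merge_shifts(previous_shifts, current_shift)
    return (U * merged) @ Vt

# Toy stand-in for a pretrained weight matrix (hypothetical size).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
U, S, Vt = svd_bases(W)

# One vector of singular value shifts per task; these are the only
# trainable quantities in this sketch.
shift_0 = 0.1 * rng.standard_normal(S.shape)
shift_1 = 0.1 * rng.standard_normal(S.shape)

dW = delta_weight(U, Vt, [shift_0], shift_1)
W_adapted = W + dW
```

Because $\mathbf{U}$ and $\mathbf{V}$ are shared with the pretrained weights, the adapted matrix equals $\mathbf{U}\,\mathrm{diag}(\sigma + \text{merged shift})\,\mathbf{V}^\top$, so the update only rescales existing principal directions.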

Abstract

Class-Incremental Learning (CIL) aims to prevent catastrophic forgetting of previously learned classes while sequentially incorporating new ones. The more challenging Few-shot CIL (FSCIL) setting further complicates this by providing only a limited number of samples for each new class, increasing the risk of overfitting in addition to standard CIL challenges. While catastrophic forgetting has been extensively studied, overfitting in FSCIL, especially with large foundation models, has received less attention. To fill this gap, we propose Singular Value Fine-tuning for FSCIL (SVFCL) and compare it with existing approaches for adapting foundation models to FSCIL, which primarily build on Parameter-Efficient Fine-Tuning (PEFT) methods such as prompt tuning and Low-Rank Adaptation (LoRA). Specifically, SVFCL applies singular value decomposition to the foundation model weights, keeps the singular vectors fixed while fine-tuning the singular values for each task, and then merges the resulting singular values. This simple yet effective approach not only alleviates the forgetting problem but also mitigates overfitting more effectively while significantly reducing the number of trainable parameters. Extensive experiments on four benchmark datasets, along with visualizations and ablation studies, validate the effectiveness of SVFCL. The code will be made available.
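To make the parameter-reduction claim concrete, the following sketch counts trainable parameters for a single weight matrix under three schemes. It assumes every singular value is trainable unless a rank is given, uses the standard LoRA count of $r(m+n)$, and picks a 768 x 768 matrix and rank 8 purely for illustration; none of these settings are taken from the paper.

```python
def singular_value_params(m, n, rank=None):
    """Trainable singular values for an m x n matrix.

    ASSUMPTION: all min(m, n) singular values are trained unless a
    truncation rank is supplied.
    """
    k = min(m, n)
    return k if rank is None else min(rank, k)

def lora_params(m, n, rank):
    """Standard LoRA: two factors of shapes m x rank and rank x n."""
    return rank * (m + n)

def full_finetune_params(m, n):
    """Every entry of the weight matrix is trainable."""
    return m * n

# Illustrative sizes only (hypothetical layer and rank).
m = n = 768
counts = {
    "singular values": singular_value_params(m, n),
    "LoRA (r=8)": lora_params(m, n, rank=8),
    "full fine-tuning": full_finetune_params(m, n),
}
for name, count in counts.items():
    print(f"{name}: {count}")
```

Under these assumptions the singular-value scheme trains 768 scalars for the matrix, against 12288 for rank-8 LoRA and 589824 for full fine-tuning.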

Paper Structure

This paper contains 12 sections, 2 theorems, 11 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathbf{W}\in\mathbb{R}^{m\times n}$ be the pre-trained weight matrix, and $\mathbf{W}^*\!\in\! \mathbb{R}^{d\times k}$ be the optimal weight matrix for both the current task and previous ones. Without loss of generality, assume the rank parameters $r_1=r_2=r$. Suppose $\Delta \mathbf{W}^*=\mat

Figures (8)

  • Figure 1: Illustration of attention maps on one sampled CIFAR-100 image across four methods during FSCIL. Our proposed approach can effectively focus on critical features while maintaining robustness. In contrast, the full fine-tuning strategy struggles to capture key attention patterns, whereas the other two representative methods, InfLoRA [liang2024inflora] and L2P [wang2022l2p], tend to partially emphasize background features.
  • Figure 2: Illustration of the accuracy curves on both the training and validation datasets across three approaches during the first incremental few-shot session. InfLoRA [liang2024inflora] and L2P [wang2022l2p] face a serious risk of overfitting, while the proposed SVFCL demonstrates a strong ability to mitigate overfitting.
  • Figure 3: The framework of the proposed SVFCL algorithm. (a) We first perform singular value decomposition on the pre-trained weights $\mathbf{W}$ and obtain the fixed singular matrices $\mathbf{U}$ and $\mathbf{V}$. (b) We incrementally fine-tune the singular values on the current training task and get the singular value shift $\Delta \mathbf{\Sigma}_t$. The notation $\mathcal{M}(\cdot)$ denotes the merging function that fuses all learned singular value shifts, as defined in the paper's merging equation.
  • Figure 4: Illustration of Top-1 accuracy curves during sequential training on the miniImageNet, ImageNet-R, and CUB200-2011 datasets.
  • Figure 5: Ablation study of fine-tuning different blocks within ViT.
  • ...and 3 more figures
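
The two stages described in the Figure 3 caption (decompose once, then fine-tune only the singular value shift on the current task) can be sketched on a toy problem. Everything here is an illustrative assumption rather than the paper's setup: a single linear layer, a squared-error loss, random few-shot data, plain gradient descent, and hypothetical names such as `loss_and_grad`.

```python
import numpy as np

def loss_and_grad(shift, U, S, Vt, X, Y):
    """Squared-error loss of a linear layer and its gradient w.r.t. the shift.

    The effective weight is U diag(S + shift) V^T; U, S, and Vt are frozen.
    """
    W_eff = (U * (S + shift)) @ Vt                 # shape (m, n)
    residual = X @ W_eff.T - Y                     # shape (shots, m)
    loss = 0.5 * np.mean(np.sum(residual**2, axis=1))
    grad_W = residual.T @ X / X.shape[0]           # dLoss/dW_eff, shape (m, n)
    # Chain rule: dLoss/dshift_i = u_i^T grad_W v_i.
    grad_shift = np.sum(U * (grad_W @ Vt.T), axis=0)
    return loss, grad_shift

rng = np.random.default_rng(0)
m, n, shots = 6, 4, 5                              # toy sizes, few-shot data

# (a) Decompose the stand-in "pretrained" weights once.
W = rng.standard_normal((m, n))
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_frozen, Vt_frozen = U.copy(), Vt.copy()

# (b) Fine-tune only the singular value shift on the current task.
X = rng.standard_normal((shots, n))
Y = rng.standard_normal((shots, m))
shift = np.zeros_like(S)
initial_loss, _ = loss_and_grad(shift, U, S, Vt, X, Y)
for _ in range(200):
    _, grad = loss_and_grad(shift, U, S, Vt, X, Y)
    shift -= 0.1 * grad                            # assumed step size
final_loss, _ = loss_and_grad(shift, U, S, Vt, X, Y)
```

Only `shift`, a vector with `min(m, n)` entries, changes during the loop; the learned vector would then be passed to the merging function together with the shifts from earlier tasks.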

Theorems & Definitions (3)

  • Theorem 1: Optimization Stability.
  • Theorem 2: Optimization Stability.
  • proof