FeTT: Continual Class Incremental Learning via Feature Transformation Tuning

Sunyuan Qiang, Xuxin Lin, Yanyan Liang, Jun Wan, Du Zhang

TL;DR

FeTT tackles catastrophic forgetting in class-incremental continual learning by combining the fine-tune-then-freeze paradigm with feature transformation tuning (FeTT), a non-parametric transformation that non-intrusively reshapes backbone feature channels. The method applies PEFT in the first task to adapt to downstream data, freezes the backbone thereafter, and updates class prototypes from FeTT-transformed features; an optional ensemble variant, FeTT-E, combines multiple PTMs. Experiments across six datasets and 14 CL settings show consistent improvements in average and last accuracy, including roughly 93% average accuracy on CIFAR100 B0 Inc10, and ablations confirm the utility of the LogTrans and PwrTrans transformations and of the ensemble strategy. The work shows that training-free feature transformations can reduce channel suppression and distribution mismatch, offering a plug-and-play enhancement for PTM-based continual learning with minimal data or parameter overhead, and it points to future work on multimodal PTMs and plasticity-stability trade-offs.

Abstract

Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios, enabling systems to continuously acquire knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards pre-trained models (PTMs) with parameter-efficient fine-tuning (PEFT) strategies. However, continual fine-tuning still suffers from catastrophic forgetting because data from previous tasks is unavailable. Additionally, the fine-tune-then-frozen mechanism is limited by feature channel suppression and by insufficient training data in the first CL task. To this end, this paper proposes the feature transformation tuning (FeTT) model, which non-parametrically fine-tunes backbone features across all tasks; it operates independently of CL training data and smooths feature channels to prevent excessive suppression. An extended ensemble strategy that combines different PTMs with the FeTT model further improves performance. We also discuss the fine-tune-then-frozen paradigm and the FeTT model from the perspectives of the discrepancy in class marginal distributions and of feature channels. Extensive experiments on CL benchmarks validate the effectiveness of the proposed method.
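
The prototype update described above can be illustrated with a minimal sketch. It assumes feature vectors have already been extracted by the frozen backbone and are non-negative. The two transforms here (an element-wise power and `log1p`) are illustrative stand-ins, not the paper's exact LogTrans and PwrTrans definitions, and the cosine-similarity classifier, the function names, and the `kappa` parameter are likewise assumptions made for this example.

```python
import math

def power_transform(feature, kappa=0.5):
    # Illustrative power transform: kappa < 1 compresses large activations
    # and lifts small ones, smoothing the channel distribution.
    return [max(v, 0.0) ** kappa for v in feature]

def log_transform(feature):
    # Illustrative log transform (assumption, not the paper's LogTrans).
    return [math.log1p(max(v, 0.0)) for v in feature]

def update_prototypes(prototypes, features, labels, transform):
    # Add one prototype per class in the current task: the mean of the
    # transformed features of that class. Prototypes of earlier classes are
    # left untouched, and no parameters are trained.
    grouped = {}
    for feature, label in zip(features, labels):
        grouped.setdefault(label, []).append(transform(feature))
    for label, rows in grouped.items():
        prototypes[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return prototypes

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def predict(feature, prototypes, transform):
    # Assumed classifier: nearest prototype under cosine similarity.
    transformed = transform(feature)
    return max(prototypes, key=lambda label: cosine(transformed, prototypes[label]))

# Two toy "tasks", each introducing one new class.
prototypes = {}
update_prototypes(prototypes, [[4.0, 0.0, 1.0], [9.0, 0.0, 1.0]], [0, 0], power_transform)
update_prototypes(prototypes, [[0.0, 4.0, 1.0], [0.0, 9.0, 1.0]], [1, 1], power_transform)
print(predict([16.0, 1.0, 0.0], prototypes, power_transform))
```

Because the transform is applied identically when building prototypes and at prediction time, swapping `power_transform` for `log_transform` requires no retraining, which is the sense in which such a transformation is training-free.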

Paper Structure

This paper contains 23 sections, 15 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Left: Because the distributions of training and testing data are mismatched, continual fine-tuning (FT and FT-Adapter) leads to severe catastrophic forgetting. The fine-tune-then-freeze paradigm (Simp, ADAM, and Ours) demonstrates superior performance. Right: PEFT requires more training data (measured by the number of classes) to achieve higher performance. Our method, on the other hand, improves performance without requiring extra data or training cost.
  • Figure 2: The overall architecture of our proposed model. During the first task, the model applies PEFT strategies to adapt to downstream tasks. Then, in the subsequent incremental tasks (including the first task), the fine-tuned model is frozen and concatenated with the original PTM to update the class prototypes via feature transformations. Feature transformations employ non-parametric functions to modulate backbone features, whereas prototypes are computed by averaging the sample features within the same category.
  • Figure 3: The activation frequency of feature embeddings for the Adapter-based PEFT baseline model and our proposed FeTT model on the CIFAR dataset, including fine-tuned feature data from the first task and incoming new feature data from the last task. Channels are sorted in descending order of the activation frequency of first-task samples. We additionally include a line plot of the moving average of the last-task feature activations for easier comparison.
  • Figure 4: Performance comparison at each step. For our proposed FeTT model, the best result among the various parameter-efficient fine-tuning (PEFT) strategies is selected for comparison.
  • Figure 5: Performance comparison at each step. For our proposed FeTT model, the best result among the various parameter-efficient fine-tuning (PEFT) strategies is selected for comparison.
  • ...and 2 more figures
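
The channel analysis described in the Figure 3 caption can be approximated with a short sketch. It assumes that a channel counts as "activated" for a sample when its value exceeds a threshold (0 by default); the source does not give the exact criterion, and the function names and the trailing-window moving average are hypothetical choices made for illustration.

```python
def activation_frequency(features, threshold=0.0):
    # Fraction of samples in which each channel exceeds the threshold.
    # The thresholding rule is an assumption for this sketch.
    n_samples = len(features)
    n_channels = len(features[0])
    return [
        sum(1 for f in features if f[c] > threshold) / n_samples
        for c in range(n_channels)
    ]

def sort_by_reference(reference_freq, other_freq):
    # Order channels by descending reference (first-task) frequency and
    # apply the same channel order to the other (last-task) frequencies.
    order = sorted(range(len(reference_freq)),
                   key=lambda c: reference_freq[c], reverse=True)
    return [reference_freq[c] for c in order], [other_freq[c] for c in order]

def moving_average(values, window=3):
    # Trailing moving average; early positions use a shorter window.
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Toy features: rows are samples, columns are channels.
first_task = [[1.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
last_task = [[0.0, 1.0, 1.0], [0.0, 1.0, 0.0]]

first_sorted, last_sorted = sort_by_reference(
    activation_frequency(first_task), activation_frequency(last_task)
)
last_smoothed = moving_average(last_sorted, window=2)
```

In this toy case the channels that fire most often on first-task samples fire least often on last-task samples, which is the kind of mismatch such a plot is meant to expose.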