FeTT: Continual Class Incremental Learning via Feature Transformation Tuning
Sunyuan Qiang, Xuxin Lin, Yanyan Liang, Jun Wan, Du Zhang
TL;DR
FeTT tackles catastrophic forgetting in class-incremental continual learning by combining the fine-tune-then-freeze paradigm with a non-parametric feature transformation that reshapes backbone feature channels without modifying the backbone itself. The method applies PEFT in the first task to adapt the PTM to downstream data, freezes the backbone thereafter, and updates class prototypes from FeTT-transformed features; an optional ensemble variant, FeTT-E, combines multiple PTMs. Experiments across six datasets and 14 CL settings show consistent improvements in average and last accuracy, including about 93% average accuracy on CIFAR100 B0 Inc10, and ablations confirm the utility of the LogTrans and PwrTrans transformations and of the ensemble strategy. The work shows that training-free feature transformations can meaningfully reduce channel suppression and distribution mismatch, offering a plug-and-play enhancement for PTM-based continual learning with minimal data or parameter overhead. It also points to future work on multimodal PTMs and on the plasticity-stability trade-off.
Abstract
Continual learning (CL) aims to extend deep models from static, closed environments to dynamic and complex scenarios, enabling systems to continuously acquire knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards pre-trained models (PTMs) combined with parameter-efficient fine-tuning (PEFT) strategies. However, continual fine-tuning still suffers from catastrophic forgetting because data from previous tasks is unavailable. In addition, the fine-tune-then-frozen mechanism is limited by feature channel suppression and by insufficient training data in the first CL task. To address these issues, this paper proposes the feature transformation tuning (FeTT) model, which non-parametrically fine-tunes backbone features across all tasks. FeTT operates independently of the CL training data and smooths feature channels to prevent excessive suppression. An extended ensemble strategy that combines different PTMs with the FeTT model further improves performance. We also discuss the fine-tune-then-frozen paradigm and the FeTT model from two perspectives: the discrepancy in class marginal distributions and the behavior of feature channels. Extensive experiments on CL benchmarks validate the effectiveness of the proposed method.
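
To make the idea of a non-parametric, training-free feature transformation concrete, the following is a minimal sketch of prototype-based classification on transformed features. It is not the authors' implementation. The exact LogTrans and PwrTrans formulas are not stated in this section, so the forms used here (log(1 + x) and x^kappa on non-negative features) are illustrative assumptions, as are the names `pwr_trans`, `log_trans`, `PrototypeClassifier`, the parameter `kappa`, and the use of cosine similarity for prototype matching. Plain Python lists stand in for features extracted by a frozen backbone.

```python
import math

def pwr_trans(x, kappa=0.5):
    # Assumed power transform: x -> x**kappa on non-negative features.
    # With 0 < kappa < 1 this compresses large channels relative to small ones.
    return [max(v, 0.0) ** kappa for v in x]

def log_trans(x):
    # Assumed log transform: x -> log(1 + x) on non-negative features.
    return [math.log1p(max(v, 0.0)) for v in x]

def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb + 1e-12)

class PrototypeClassifier:
    """Hypothetical helper: class prototypes are means of transformed features."""

    def __init__(self, transform):
        self.transform = transform
        self.prototypes = {}  # class id -> mean transformed feature

    def add_task(self, feats, labels):
        # Only prototypes of the classes in this task are written;
        # prototypes from earlier tasks are left untouched.
        sums, counts = {}, {}
        for x, y in zip(feats, labels):
            z = self.transform(x)
            if y not in sums:
                sums[y] = [0.0] * len(z)
                counts[y] = 0
            sums[y] = [s + v for s, v in zip(sums[y], z)]
            counts[y] += 1
        for y in sums:
            self.prototypes[y] = [s / counts[y] for s in sums[y]]

    def predict(self, x):
        # Return the class whose prototype is most similar to the transformed input.
        z = self.transform(x)
        return max(self.prototypes, key=lambda c: cosine(z, self.prototypes[c]))

# Toy usage: two incremental tasks, one class each.
clf = PrototypeClassifier(pwr_trans)
clf.add_task([[9.0, 0.1, 0.0], [8.0, 0.2, 0.1]], [0, 0])  # task 1
clf.add_task([[0.1, 0.0, 4.0], [0.0, 0.3, 5.0]], [1, 1])  # task 2
print(clf.predict([7.5, 0.1, 0.2]))
```

Because the transform has no learnable parameters and prototypes are simple class means, adding a task in this sketch needs no gradient updates and no data from earlier tasks, which is the property the abstract attributes to FeTT. Swapping `pwr_trans` for `log_trans` changes only how strongly dominant channels are compressed.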
