Dual Prototypes for Adaptive Pre-Trained Model in Class-Incremental Learning

Zhiming Xu, Suorong Yang, Baile Xu, Furao Shen, Jian Zhao

TL;DR

The paper tackles catastrophic forgetting in class-incremental learning by freezing pre-trained transformers and introducing a per-task adapter system trained with a Center-Adapt loss. It adds a dual-prototype classifier that uses raw prototypes for reliable top-K candidate labels and augmented prototypes for final refinement, enabling test-time adapter selection without exhaustively loading all adapters. Empirical results show state-of-the-art or competitive performance across diverse benchmarks, with notable gains on VTAB and strong exemplar-free performance, while analyses reveal limitations on low-resolution data and latency trade-offs. The approach offers a flexible, plug-and-play framework for PTM-based CIL with efficient storage and inference characteristics.
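The two-stage prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Prototypes are compared by cosine similarity. Each "adapter" is a hypothetical function applied to the raw feature, whereas in DPTA the adapter is inserted into the PTM and the image is re-encoded. Each top-K candidate class is rescored only against its own augmented prototype, in the space of the adapter of the task that introduced it. The names `dual_prototype_predict`, `class_to_task`, and `k` are illustrative.

```python
import numpy as np


def normalize(v):
    # L2-normalise along the last axis so that dot products are cosine similarities
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def dual_prototype_predict(x_raw, raw_protos, aug_protos, class_to_task, adapters, k=3):
    """Two-stage prediction sketch (hypothetical helper, not the DPTA code).

    x_raw:         feature of the test sample from the frozen PTM, shape (D,)
    raw_protos:    (C, D) class prototypes built with the frozen PTM
    aug_protos:    (C, D) prototypes; row c lives in the space of the adapter
                   of the task that introduced class c (assumption)
    class_to_task: class index -> task index
    adapters:      task index -> callable producing an adapted feature
                   (toy stand-in for re-encoding the image with that adapter)
    """
    # Stage 1: nearest-class-mean on raw prototypes -> top-K candidate classes
    raw_scores = normalize(raw_protos) @ normalize(x_raw)
    candidates = np.argsort(-raw_scores)[:k]

    # Only the adapters of tasks that own a candidate class are run
    tasks = sorted({int(class_to_task[c]) for c in candidates})
    adapted = {t: normalize(adapters[t](x_raw)) for t in tasks}

    # Stage 2: rescore each candidate with its augmented prototype,
    # in the feature space of its own task's adapter
    final_scores = {
        int(c): float(normalize(aug_protos[c]) @ adapted[int(class_to_task[c])])
        for c in candidates
    }
    return max(final_scores, key=final_scores.get), tasks


# Toy example: 4 classes, 2 tasks; classes 0 and 1 are confusable in raw space
raw_protos = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])
aug_protos = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
class_to_task = [0, 0, 1, 1]
adapters = {0: lambda v: np.array([v[0] - 1.0, 10.0 * v[1]]), 1: lambda v: v}
x = np.array([1.0, 0.02])
label, tasks = dual_prototype_predict(x, raw_protos, aug_protos, class_to_task, adapters, k=2)
print(label, tasks)  # raw NCM alone ranks class 0 first; rescoring picks class 1
```

In this toy, only the adapter of task 0 is evaluated because both top-2 candidates belong to task 0. This mirrors the point that not every adapter needs to be loaded at test time.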

Abstract

Class-incremental learning (CIL) aims to learn new classes while retaining knowledge of previous ones. Although approaches based on pre-trained models (PTMs) perform strongly, directly fine-tuning a PTM on a stream of incremental tasks often reintroduces catastrophic forgetting. This paper proposes a Dual-Prototype Network with Task-wise Adaptation (DPTA) for PTM-based CIL. For each incremental task, an adapter module is built to fine-tune the PTM, and a center-adapt loss makes the resulting representations more tightly clustered around class centers and more class-separable. The dual-prototype network improves prediction by enabling test-time adapter selection: the raw prototypes infer several candidate task indices for each test sample, which select the adapter modules to apply to the PTM, and the augmented prototypes, which can separate easily confused classes, determine the final result. Experiments on multiple benchmarks show that DPTA consistently surpasses recent methods by 1% to 5%. Notably, on the VTAB dataset, it achieves an improvement of approximately 3% over state-of-the-art methods. The code is open-sourced at https://github.com/Yorkxzm/DPTA.
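The exact center-adapt (CA) loss is defined in the paper and is not reproduced here. As a rough stand-in for its stated goal (features pulled toward their class centers while remaining separable), the sketch below combines softmax cross-entropy with the classic center loss of Wen et al. (2016). The weighting `lam`, the fixed `centers`, and the linear classifier `weight` are assumptions for illustration, not the paper's CA formulation.

```python
import numpy as np


def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    # Half the mean squared distance from each feature to its class center
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()


def ce_plus_center(features, labels, weight, centers, lam=0.1):
    # Stand-in objective: separability from CE, compactness from the center term
    logits = features @ weight.T
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


labels = np.array([0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
tight = np.array([[1.0, 0.0], [0.0, 1.0]])    # features sitting on their centers
spread = np.array([[2.0, 0.0], [0.0, 3.0]])   # same classes, farther from centers
print(center_loss(tight, labels, centers), center_loss(spread, labels, centers))
```

The center term is zero for features that sit on their class centers and grows as they drift away, which is the "centrally clustered" property the abstract attributes to the CA loss; the CE term keeps classes apart.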

Paper Structure

This paper contains 20 sections, 18 equations, 9 figures, 6 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of the proposed DPTA. Left: Training. When a new task $i$ arrives, a task-specific adapter is fine-tuned and saved. The raw and task-adapted PTMs are then used to construct the raw and augmented prototypes, respectively. Right: Test. Raw prototypes produce top-K candidate labels, from which the relevant task adapters are identified. The augmented prototypes of the selected adapters then predict the final label.
  • Figure 2: t-SNE [van2008visualizing] visualizations of the original space and of task-adapted subspaces trained with CE loss and with CA loss.
  • Figure 3: Comparison of NCM top-1 and top-5 prediction accuracy; top-5 accuracy ranges from 98% to 100%. The prototypes were built with a pre-trained ViT-B/16-IN21K without fine-tuning.
  • Figure 4: t-SNE visualizations in a task-adapted subspace trained with CA loss.
  • Figure 5: The comparison of accuracy and trainable parameter sizes on the ImageNet-R dataset.
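Figure 3 compares NCM top-1 and top-5 accuracy for prototypes built from a frozen ViT-B/16-IN21K. The metric itself can be sketched as follows. Feature extraction is omitted, and the toy 2-D features stand in for real PTM embeddings. Cosine similarity is assumed as the NCM distance, and with only three toy classes, top-2 plays the role of top-5. `class_means` and `ncm_topk_accuracy` are hypothetical helpers.

```python
import numpy as np


def class_means(feats, labels, num_classes):
    # One prototype per class: the mean of its training features
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])


def ncm_topk_accuracy(test_feats, test_labels, protos, k):
    # Cosine similarity between every test feature and every prototype, shape (N, C)
    t = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    scores = t @ p.T
    # A sample counts as correct if its true class is among the k highest scores
    topk = np.argsort(-scores, axis=1)[:, :k]
    return (topk == test_labels[:, None]).any(axis=1).mean()


train_feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                        [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]])
train_labels = np.array([0, 0, 1, 1, 2, 2])
protos = class_means(train_feats, train_labels, 3)

test_feats = np.array([[1.0, 0.1], [1.0, 0.5], [-1.0, 0.2]])
test_labels = np.array([0, 1, 2])  # the second sample is closer to class 0's prototype
print(ncm_topk_accuracy(test_feats, test_labels, protos, 1),
      ncm_topk_accuracy(test_feats, test_labels, protos, 2))
```

The misranked second sample is missed at top-1 but recovered at top-2. This is the gap Figure 3 highlights and the reason raw prototypes are trusted only to propose candidates.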