Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning

Zhi-Hong Qi; Da-Wei Zhou; Yiran Yao; Han-Jia Ye; De-Chuan Zhan

TL;DR

The paper tackles long-tailed class-incremental learning under exemplar-free constraints by leveraging pre-trained Vision Transformers with adaptive adapters. APART introduces two adapter pools (one auxiliary for minority classes) and learns instance-specific routing weights $w(\mathbf{x},y)$ to guide when to apply the auxiliary loss, enabling comprehensive representation across all classes. Training combines losses from the main and auxiliary pools, weighted by the adaptive router, and inference ensembles logits from both pools for improved robustness. Experiments on CIFAR100, ImageNet-R, and ObjectNet show state-of-the-art performance with favorable memory usage, and ablations confirm the necessity and effectiveness of the adaptive routing and auxiliary pool components.
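The training and inference rules summarized above can be illustrated with a minimal sketch. It assumes the combined objective takes the additive form "main-pool loss plus $w(\mathbf{x},y)$ times auxiliary-pool loss" with cross-entropy as the per-pool loss, and that the ensemble simply sums the two logit vectors; the summary does not spell out either detail. The function names (`routed_loss`, `ensemble_predict`) are hypothetical and are not taken from the APART codebase.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one instance given its integer class label."""
    return -math.log(softmax(logits)[label])


def routed_loss(main_logits, aux_logits, label, weight):
    """Assumed form: main-pool loss + w(x, y) * auxiliary-pool loss.

    `weight` stands in for the instance-specific routing weight w(x, y);
    here it is passed in rather than learned.
    """
    main = cross_entropy(main_logits, label)
    aux = cross_entropy(aux_logits, label)
    return main + weight * aux


def ensemble_predict(main_logits, aux_logits):
    """Assumed ensemble: sum the logits of both pools and take the argmax."""
    summed = [a + b for a, b in zip(main_logits, aux_logits)]
    return max(range(len(summed)), key=summed.__getitem__)


if __name__ == "__main__":
    main_logits = [2.0, 0.0, 0.0]
    aux_logits = [0.0, 0.0, 3.0]
    # A larger weight makes the auxiliary pool's loss count more for this instance.
    print(routed_loss(main_logits, aux_logits, label=2, weight=0.0))
    print(routed_loss(main_logits, aux_logits, label=2, weight=1.0))
    print(ensemble_predict(main_logits, aux_logits))
```

With `weight=0.0` the instance is trained through the main pool only; raising the weight, as the router is described as doing for instances that benefit from the auxiliary pool, adds the auxiliary term to the objective.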

Abstract

In our ever-evolving world, new data often exhibits a long-tailed distribution, as in e-commerce platform reviews. A model must therefore learn continually from imbalanced data without forgetting, which is the challenge of long-tailed class-incremental learning (LTCIL). Existing methods often rely on retraining linear classifiers with former data, which is impractical in real-world settings. In this paper, we harness the potent representation capabilities of pre-trained models and introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL. To counteract forgetting, we train inserted adapters with frozen pre-trained weights for deeper adaptation and maintain a pool of adapters for selection during sequential model updates. Additionally, we present an auxiliary adapter pool designed for effective generalization, especially on minority classes. Adaptive instance routing across these pools captures crucial correlations, facilitating a comprehensive representation of all classes. Consequently, APART tackles the imbalance problem as well as catastrophic forgetting in a unified framework. Extensive benchmark experiments validate the effectiveness of APART. Code is available at: https://github.com/vita-qzh/APART

Paper Structure (21 sections, 9 equations, 4 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Long-Tailed Class-Incremental Learning
Pre-Trained Models for CIL
Adaptive Adapter Routing for LTCIL
Auxiliary Adapter Pool
Adaptive Routing
Summary of Apart
Experiment
Implementation Details
Benchmark Comparison
Ablation Study
Further Analysis
Conclusion
...and 6 more sections

Figures (4)

Figure 1: Demonstration of Apart. To make the model comprehensive, an auxiliary adapter pool is learned for minority classes and instance-wise routing is learned adaptively. To make the model provident, multiple adapters form a pool that enlarges the capacity of the fine-tuned model. The objective is to learn an automatic routing that learns effectively from minority classes without forgetting.
Figure 2: Incremental performance when starting from half of the total classes in shuffled LTCIL. We show the legends in (c). Apart consistently outperforms other compared methods.
Figure 3: Adaptively learned weights, viewed per instance and per class frequency, in shuffled LTCIL. We show the legends in (b). The weights vary across classes and decrease as class frequency increases.
Figure 4: Original distribution and long-tailed distribution after sampling for each dataset. We show the legends in (a). 'Origin' and 'Long-tailed' denote the distribution before and after sampling, respectively.
