DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning

Songze Li, Tonghua Su, Xu-Yao Zhang, Qixing Xu, Zhongjie Wang

TL;DR

This paper addresses catastrophic forgetting in pre-trained model-based continual learning by introducing DUKAE, which combines dual-level knowledge accumulation and an adaptive expertise ensemble. Feature-level accumulation builds task-specific PEFT modules with SSL to enrich representations, while decision-level accumulation aligns subspace classifiers via Gaussian distributions and updates them across tasks. An adaptive ensemble then integrates outputs from multiple subspaces to exploit domain-specific expertise and mitigate inter-subspace interference, achieving state-of-the-art results on CIFAR-100, ImageNet-R, CUB-200, and Cars-196. The approach offers practical benefits for rapid knowledge integration in PTMs, with potential extensions to federated continual learning, albeit with storage considerations for distribution parameters across many subspaces.

Abstract

Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge by leveraging the extensive foundational understanding inherent in pre-trained models (PTMs). Most existing PTMCL methods use Parameter-Efficient Fine-Tuning (PEFT) to learn new knowledge while consolidating existing memory. However, they face two main challenges. The first is the misalignment of classification heads: the classification head of each task is trained within a distinct feature space, which leads to inconsistent decision boundaries across tasks and, consequently, increased forgetting. The second is limited feature-level knowledge accumulation: feature learning is typically restricted to the initial task, which constrains the model's representation capabilities. To address these issues, we propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation by aligning classification heads into a unified feature space through Gaussian distribution sampling and introducing an adaptive expertise ensemble to fuse knowledge across feature subspaces. Extensive experiments on the CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.

Paper Structure

This paper contains 17 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of different PTMCL methods with our method. (a) PTMCL methods with a misalignment issue. Classification heads are learned in different feature subspaces, which are defined by task-specific PEFT modules, and are then kept fixed. (b) PTMCL methods that fine-tune the feature network only on the initial task's data. Classification heads are learned in the same feature space, but no feature knowledge is accumulated for future tasks. (c) Our method leverages both feature-level and decision-level knowledge accumulation. Each task-specific feature subspace has a corresponding aligned classifier, and memory is consolidated through the ensemble of subspace classification results.
  • Figure 2: Comparison between our naive base method and the LAE method (which suffers from the misalignment problem) across four datasets under a 10-task continual learning setting. The y-axis represents the average accuracy after learning the last task. The comparison is performed under identical network architecture and parameter configurations, illustrating that addressing the misalignment issue effectively mitigates forgetting.
  • Figure 3: Discriminative capability of different feature networks across four datasets. We split each dataset into 10 tasks and train classifiers for all tasks using a fixed feature network $f(\cdot; \theta, \mathcal{P}_t)$ learned on the data of each task $t$. The $10\times10$ heatmap represents task-wise classification accuracy, while the $1\times10$ heatmap indicates the average accuracy across tasks for each feature network. The results reveal distinct task-specific discriminative patterns and significant variations in the overall discriminative capability of each feature network.
  • Figure 4: Illustration of our method, exemplified through the learning process on Task 5. (a) Feature-level knowledge accumulation process. The auxiliary classification head $\bar{\varphi}$ and the SSL branch are employed to facilitate the learning of the PEFT module $\mathcal{P}_5$, which is then cached for cumulative learning. (b) Decision-level knowledge accumulation and ensemble process. Using current task data, feature representations of $D_5$ are extracted in each feature subspace. These are then combined with samples from the Gaussian distributions $G_{i,j}$ of prior tasks $j$ to train the classifier $\varphi_i$ specific to each feature subspace $i$. Finally, the adaptive expertise ensemble combines the subspace classifier outputs into the final output.
  • Figure 5: Performance comparison of the ensemble with a varying number of subspaces. Each panel shows LAA results from the first task to the tenth task for one of the four datasets. Each line represents the performance of the ensemble with a different number of subspaces after learning the corresponding task. For the first task, at most one subspace can be used for the ensemble; this limit increases by one with each task, so that up to ten subspaces can be used for the tenth task.
  • ...and 1 more figures
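
The decision-level accumulation described in the Figure 4 caption (real features of the current task combined with samples drawn from stored Gaussian distributions of earlier classes) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the toy 4-dimensional features stand in for PEFT-adapted PTM features in a single subspace, the softmax-regression head stands in for a classifier $\varphi_i$, and the helper names `make_class`, `fit_gaussian`, `sample_replay`, and `train_linear_head` are hypothetical. The SSL branch, multiple subspaces, and the adaptive expertise ensemble are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES = 4, 3

def make_class(label, n):
    """Toy features (assumption): unit-variance cluster centred at 3 * e_label."""
    return rng.normal(size=(n, DIM)) + 3.0 * np.eye(DIM)[label]

def fit_gaussian(features):
    """Summarise one class in one subspace by its mean and covariance."""
    mean = features.mean(axis=0)
    # Small ridge keeps the covariance well conditioned for sampling.
    cov = np.cov(features, rowvar=False) + 1e-4 * np.eye(features.shape[1])
    return mean, cov

def sample_replay(gaussians, n_per_class):
    """Draw pseudo-features for old classes from their stored Gaussians."""
    xs, ys = [], []
    for label, (mean, cov) in gaussians.items():
        xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

def train_linear_head(x, y, n_classes, lr=0.5, epochs=200):
    """Softmax regression trained with full-batch gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

# Earlier tasks: classes 0 and 1 are seen once, summarised, then discarded.
gaussians = {c: fit_gaussian(make_class(c, 200)) for c in (0, 1)}

# Current task: real features for class 2 plus sampled features for old classes.
x_new, y_new = make_class(2, 200), np.full(200, 2)
x_old, y_old = sample_replay(gaussians, n_per_class=200)
x_train = np.concatenate([x_new, x_old])
y_train = np.concatenate([y_new, y_old])
w, b = train_linear_head(x_train, y_train, N_CLASSES)

# Evaluate on fresh samples from all three classes.
x_test = np.concatenate([make_class(c, 100) for c in range(N_CLASSES)])
y_test = np.repeat(np.arange(N_CLASSES), 100)
accuracy = float(((x_test @ w + b).argmax(axis=1) == y_test).mean())
print(f"accuracy over old and new classes: {accuracy:.3f}")
```

Because only a mean and covariance are kept per class, the head can be retrained over all classes in one feature space without storing old samples. The cost is the one noted in the TL;DR: these distribution parameters must be stored for every class in every subspace.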