DESIRE: Dynamic Knowledge Consolidation for Rehearsal-Free Continual Learning

Haiyang Guo, Fei Zhu, Fanhu Zeng, Bing Liu, Xu-Yao Zhang

TL;DR

DESIRE tackles rehearsal-free class-incremental learning by decoupling per-task training and applying two efficient post-processing modules: dynamic representation consolidation via continual merging of two LoRA parameter sets using a feature-space attribution loss, and decision boundary refinement through pseudo-feature rebalancing. The approach leverages Gaussian-statistical class representations (means $\boldsymbol{\mu}_i$ and covariances $\boldsymbol{\Sigma}_i$) to guide merging and calibrate classifiers without storing all past task parameters. Empirically, DESIRE achieves state-of-the-art performance among rehearsal-free methods and competitive results with rehearsal-based methods on CIFAR100, TinyImageNet, and ImageNet380 across 5/10/20-task settings, while maintaining efficiency. The work contributes a scalable, plug-in merging paradigm and distribution-based classifier calibration that improve stability-plasticity balance under continual learning with minimal data leakage.
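
The pseudo-feature rebalancing idea can be illustrated with a minimal sketch. It assumes each class $i$ is summarized by a mean $\boldsymbol{\mu}_i$ and covariance $\boldsymbol{\Sigma}_i$ in feature space, and that an equal number of pseudo-features is drawn per class so old and new classes are balanced when the classifier is refined. The function name `sample_pseudo_features`, the toy statistics, and the sample counts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sample_pseudo_features(means, covs, n_per_class, seed=0):
    """Draw the same number of pseudo-features per class from N(mu_i, Sigma_i).

    Hypothetical helper: the class statistics are assumed to have been
    collected from real features when each task was trained.
    """
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for cls, (mu, sigma) in enumerate(zip(means, covs)):
        feats.append(rng.multivariate_normal(mu, sigma, size=n_per_class))
        labels.append(np.full(n_per_class, cls))
    return np.concatenate(feats), np.concatenate(labels)


# Toy statistics for two classes in a 4-dimensional feature space (made up).
means = [np.zeros(4), np.full(4, 3.0)]
covs = [np.eye(4), 0.5 * np.eye(4)]
features, labels = sample_pseudo_features(means, covs, n_per_class=2000)
```

The resulting `features` and `labels` arrays could then be used as a class-balanced training set for the classifier head alone, with the backbone left untouched.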

Abstract

Continual learning aims to equip models with the human-like ability to retain previously learned knowledge. Recent work incorporating Parameter-Efficient Fine-Tuning has revitalized the field by introducing lightweight extension modules. However, existing methods usually overlook information leakage: the data used in the experiments may already have been seen by the pre-trained model. Once this overlapping data is removed from the pre-training phase, the performance of these methods can degrade severely. In this paper, we propose a new LoRA-based rehearsal-free method named DESIRE. Our method avoids imposing additional constraints during training to mitigate catastrophic forgetting, thereby maximizing the learning of new classes. To integrate knowledge from old and new tasks, we propose two efficient post-processing modules. First, we retain only two sets of LoRA parameters for merging and propose dynamic representation consolidation to calibrate the merged feature representation. Second, we propose decision boundary refinement to address the classifier bias that arises when training solely on new-class data. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multiple datasets and strikes an effective balance between stability and plasticity. Our code will be publicly available.
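
As a rough illustration of merging two sets of LoRA parameters, the sketch below adds a weighted sum of the old and new low-rank updates to a frozen base weight. It assumes scalar merging coefficients (`lam_old`, `lam_new`) and the usual LoRA factorization $\Delta W = BA$. The attribution loss that DESIRE uses to tune the coefficients is not described in this summary, so it is not implemented here, and all names and shapes are hypothetical.

```python
import numpy as np


def merge_lora(w0, old, new, lam_old, lam_new):
    """Return w0 + lam_old * (B_old @ A_old) + lam_new * (B_new @ A_new).

    `old` and `new` are (A, B) pairs with A of shape (r, d_in) and
    B of shape (d_out, r). Scalar coefficients are an assumption.
    """
    a_old, b_old = old
    a_new, b_new = new
    return w0 + lam_old * (b_old @ a_old) + lam_new * (b_new @ a_new)


# Toy shapes and random values, for illustration only.
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 2
w0 = rng.normal(size=(d_out, d_in))                               # frozen backbone weight
old = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))   # LoRA kept from earlier tasks
new = (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))   # LoRA trained on the new task
merged = merge_lora(w0, old, new, lam_old=0.5, lam_new=0.5)
```

Setting `lam_old=1.0, lam_new=0.0` recovers the old-task weights exactly, which makes the trade-off between stability and plasticity explicit in the two coefficients.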

Paper Structure

This paper contains 19 sections, 6 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Stability and plasticity analysis. We visualize the accuracy of the final task ($Acc_T$) and the average accuracy of the previous $T-1$ tasks ($\frac{1}{T-1}\sum_{t=1}^{T-1}Acc_t$) at the last stage ($T=10$) for different methods on three datasets. Methods that are closer to the diagonal and nearer to the upper-right corner of the graph are superior. More detailed results are given in the further analysis section.
  • Figure 2: (a) Joint training using the full data achieves optimal performance (Upper bound). (b) Fine-tuning old models using only new data can lead to catastrophic forgetting. (c) Regularization-based methods protect old tasks by imposing additional constraints when learning new tasks. (d) Our method integrates knowledge by merging parameters from previous and current tasks and proposes DESIRE to consolidate the feature representation and refine the classifier.
  • Figure 3: Illustration of the proposed DESIRE. Left: The backbone of the model is frozen during per-task training; only the LoRA modules and the classifier are trainable. Middle: We combine the knowledge of the old and new tasks by merging in parameter space (LoRA). To better consolidate the representations, we sample a small amount of unlabeled test data and optimize the merging coefficients with our proposed attribution loss (see the section on LoRA fusion). Right: We reconstruct pseudo-features from the stored class statistics and use them to refine the decision boundaries of the classifier.
  • Figure 4: Accuracy curves on CIFAR100, TinyImageNet, and ImageNet380 under the 10-task (10T) setting.
  • Figure 5: Accuracy curves on CIFAR100, TinyImageNet, and ImageNet380 under the 20-task (20T) setting.
  • ...and 7 more figures
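
The two quantities plotted in Figure 1 can be computed from per-task accuracies at the final stage, as in the short sketch below. Reading $Acc_T$ as the plasticity axis and the mean over the previous $T-1$ tasks as the stability axis is an interpretation of the caption; the function name and the sample accuracies are made up for illustration.

```python
def stability_plasticity(acc_per_task):
    """Return (Acc_T, mean accuracy over tasks 1..T-1) at the final stage.

    `acc_per_task` lists the accuracy on each task, in task order,
    measured after the last task has been learned.
    """
    if len(acc_per_task) < 2:
        raise ValueError("need at least two tasks")
    *previous, last = acc_per_task
    return last, sum(previous) / len(previous)


# Illustrative numbers only, not results from the paper.
plasticity, stability = stability_plasticity([0.80, 0.60, 0.90])
```

A method lying near the diagonal has similar values for the two returned numbers, and one near the upper-right corner has both values high.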