LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
TL;DR
This work tackles catastrophic forgetting in exemplar-free continual learning caused by feature drift. It introduces Drift-Resistant Space (DRS) and a parameter-efficient method called LoRA Subtraction (LoRA$^-$) to form a drift-stable initialization by subtracting prior-task LoRA influence from the pre-trained weights, i.e., $\tilde{\boldsymbol{W}}_{t}^{l} = \boldsymbol{W}_0^{l} - \sum_{j=1}^{t-1} \boldsymbol{B}_j^{l} \boldsymbol{A}_j^{l}$. During training, gradients are projected into DRS using a basis $\boldsymbol{P}_t^l$ derived from the SVD of the input covariance $\tilde{\boldsymbol{\mathcal{X}}}_{t}^{l}$, with updates $\Delta \boldsymbol{w}_{t,s}^l = \boldsymbol{P}_t^l (\boldsymbol{P}_t^l)^{\top} \boldsymbol{g}_{t,s}^l$, and learning is further enhanced by an Augmented Triplet Loss (ATL), $L_{TL} = \max(0, e_{ap} - e_{an} + \epsilon)$, combined as $L_{total} = L_{CE} + \lambda L_{TL}$. Across long EFCL task sequences on ImageNet-R and CIFAR-100, the method achieves state-of-the-art accuracy and strong stability without storing old data, offering a practical approach for drift-resistant continual learning. The technique also attains favorable backward transfer and reduced memory usage, illustrating its potential for privacy-constrained and resource-limited settings.
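The subtraction and projection steps above can be illustrated with a minimal NumPy sketch for a single layer. It is not the authors' implementation: the shapes, the random stand-in data, the helper names (`lora_subtract`, `drs_basis`, `project`), and in particular the rule for choosing how many singular vectors to keep (a cumulative-energy threshold) are assumptions made for illustration only.

```python
import numpy as np

def lora_subtract(W0, lora_pairs):
    """Return W0 - sum_j B_j @ A_j over the stored LoRA pairs of old tasks."""
    W = W0.copy()
    for B, A in lora_pairs:
        W -= B @ A
    return W

def drs_basis(X, energy=0.95):
    """Orthonormal basis from the SVD of the uncentered input covariance X^T X.

    Keeping the leading singular vectors up to an energy threshold is an
    assumption; the summary only says the basis is derived from the SVD.
    """
    cov = X.T @ X
    U, S, _ = np.linalg.svd(cov)
    ratio = np.cumsum(S) / np.sum(S)
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(S))
    return U[:, :k]

def project(P, g):
    """Apply the update rule delta_w = P P^T g."""
    return P @ (P.T @ g)

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 8, 2

# Pre-trained weight and LoRA factors of three earlier tasks (random stand-ins).
W0 = rng.normal(size=(d_out, d_in))
lora_pairs = [
    (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))) for _ in range(3)
]
W_tilde = lora_subtract(W0, lora_pairs)

# Stand-in for the layer inputs of the new task collected under W_tilde;
# here the inputs lie in a 3-dimensional subspace of the 8-dimensional space.
X = rng.normal(size=(32, 3)) @ rng.normal(size=(3, d_in))
P = drs_basis(X)

# One gradient vector over the input dimension, projected into the subspace.
g = rng.normal(size=d_in)
dw = project(P, g)
```

Because the columns of `P` are orthonormal, `project` is an orthogonal projector: applying it a second time leaves `dw` unchanged, and the discarded part `g - dw` is orthogonal to every basis vector.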
Abstract
In continual learning (CL), catastrophic forgetting often arises due to feature drift. This challenge is particularly prominent in the exemplar-free continual learning (EFCL) setting, where samples from previous tasks cannot be retained, making it difficult to preserve prior knowledge. To address this issue, some EFCL methods aim to identify feature spaces that minimize the impact on previous tasks while accommodating new ones. However, they rely on static features or outdated statistics stored from old tasks, which prevents them from capturing the dynamic evolution of the feature space in CL, leading to performance degradation over time. In this paper, we introduce the Drift-Resistant Space (DRS), which effectively handles feature drifts without requiring explicit feature modeling or the storage of data from previous tasks. A novel parameter-efficient fine-tuning approach called Low-Rank Adaptation Subtraction (LoRA$^-$) is proposed to develop the DRS. This method subtracts the LoRA weights of old tasks from the initial pre-trained weight before processing new task data to establish the DRS for model training. Therefore, LoRA$^-$ enhances stability, improves efficiency, and simplifies implementation. Furthermore, stabilizing feature drifts allows for better plasticity by learning with a triplet loss. Our method consistently achieves state-of-the-art results, especially for long task sequences, across multiple datasets.
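As a rough illustration of how a triplet term can be combined with the classification loss, the sketch below evaluates the hinge $\max(0, e_{ap} - e_{an} + \epsilon)$ and the weighted sum with cross-entropy in NumPy. It assumes that $e_{ap}$ and $e_{an}$ are anchor-positive and anchor-negative distances that have already been computed; how they are obtained (augmentation, mining of positives and negatives) is not specified here and is left out. The function names, margin, and weight values are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet_loss(e_ap, e_an, margin):
    """Batch mean of the hinge max(0, e_ap - e_an + margin)."""
    return np.maximum(0.0, e_ap - e_an + margin).mean()

def total_loss(logits, labels, e_ap, e_an, margin=0.3, lam=0.5):
    """Cross-entropy plus lam times the triplet term (values are assumed)."""
    return cross_entropy(logits, labels) + lam * triplet_loss(e_ap, e_an, margin)

# Two samples, four classes, uniform logits: cross-entropy equals log(4).
logits = np.zeros((2, 4))
labels = np.array([1, 3])

# Assumed precomputed distances; only the second triplet violates the margin.
e_ap = np.array([0.2, 0.9])
e_an = np.array([1.0, 0.5])

tl = triplet_loss(e_ap, e_an, margin=0.3)  # mean of [0.0, 0.7]
loss = total_loss(logits, labels, e_ap, e_an)
```

The first triplet already satisfies the margin and contributes nothing, so only samples whose positive is not sufficiently closer than the negative produce a gradient from the triplet term.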
