LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning

Xuan Liu, Xiaobin Chang

TL;DR

This work tackles catastrophic forgetting in exemplar-free continual learning caused by feature drift. It introduces Drift-Resistant Space (DRS) and a parameter-efficient method called LoRA Subtraction (LoRA$^-$), which forms a drift-stable initialization by subtracting prior-task LoRA influence from the pre-trained weights, i.e., $\tilde{\boldsymbol{W}}_{t}^{l} = \boldsymbol{W}_0^{l} - \sum_{j=1}^{t-1} \boldsymbol{B}_j^{l} \boldsymbol{A}_j^{l}$. During training, gradients are projected into DRS using a basis $\boldsymbol{P}_t^l$ derived from the SVD of the input covariance $\tilde{\boldsymbol{\mathcal{X}}}_{t}^{l}$, giving updates $\Delta \boldsymbol{w}_{t,s}^l = \boldsymbol{P}_t^l (\boldsymbol{P}_t^l)^{\top} \boldsymbol{g}_{t,s}^l$. Learning is further enhanced by an Augmented Triplet Loss (ATL), $L_{\text{TL}} = \max(0, e_{ap} - e_{an} + \epsilon)$, combined as $L_{\text{total}} = L_{\text{CE}} + \lambda L_{\text{TL}}$. Across long EFCL task sequences on ImageNet-R and CIFAR-100, the method achieves state-of-the-art accuracy and strong stability without storing old data, offering a practical approach for drift-resistant continual learning. The technique also attains favorable backward transfer and reduced memory usage, illustrating its potential for privacy-constrained and resource-limited settings.
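
The subtraction and projection formulas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the choice to keep the top `k` singular vectors, and the use of the uncentered product `X.T @ X` as the "input covariance" are all assumptions made for illustration; the summary does not specify how the basis is selected.

```python
import numpy as np

def lora_subtract(W0, loras):
    """Return W0 - sum_j B_j @ A_j over previously learned (B_j, A_j) pairs."""
    W = W0.astype(float)  # astype returns a copy, so W0 is left untouched
    for B, A in loras:
        W -= B @ A
    return W

def drs_basis(X, k):
    """Assumed basis choice: top-k left singular vectors of X^T X.

    X has shape (n_samples, d): layer inputs computed with the
    subtracted weights. k is a hypothetical hyperparameter.
    """
    cov = X.T @ X
    U, _, _ = np.linalg.svd(cov)
    return U[:, :k]

def project_gradient(P, g):
    """Apply Delta w = P P^T g; g's leading axis must match P's rows."""
    return P @ (P.T @ g)

# Toy usage with random data (shapes are illustrative only).
rng = np.random.default_rng(0)
d, r = 8, 2
W0 = rng.normal(size=(d, d))
old_loras = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(3)]
W_tilde = lora_subtract(W0, old_loras)

X = rng.normal(size=(32, d)) @ W_tilde  # stand-in for layer inputs
P = drs_basis(X, k=4)
g = rng.normal(size=d)
delta_w = project_gradient(P, g)
```

Because the columns of `P` are orthonormal, `P @ P.T` is an orthogonal projector, so projecting an already projected gradient leaves it unchanged.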

Abstract

In continual learning (CL), catastrophic forgetting often arises due to feature drift. This challenge is particularly prominent in the exemplar-free continual learning (EFCL) setting, where samples from previous tasks cannot be retained, making it difficult to preserve prior knowledge. To address this issue, some EFCL methods aim to identify feature spaces that minimize the impact on previous tasks while accommodating new ones. However, they rely on static features or outdated statistics stored from old tasks, which prevents them from capturing the dynamic evolution of the feature space in CL, leading to performance degradation over time. In this paper, we introduce the Drift-Resistant Space (DRS), which effectively handles feature drifts without requiring explicit feature modeling or the storage of previous tasks. A novel parameter-efficient fine-tuning approach called Low-Rank Adaptation Subtraction (LoRA$^-$) is proposed to develop the DRS. This method subtracts the LoRA weights of old tasks from the initial pre-trained weight before processing new task data to establish the DRS for model training. Therefore, LoRA$^-$ enhances stability, improves efficiency, and simplifies implementation. Furthermore, stabilizing feature drifts allows for better plasticity by learning with a triplet loss. Our method consistently achieves state-of-the-art results, especially for long task sequences, across multiple datasets.
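
The triplet objective mentioned in the abstract can be sketched in plain Python using the hinge form $\max(0, e_{ap} - e_{an} + \epsilon)$ and the weighted sum $L_{\text{CE}} + \lambda L_{\text{TL}}$ given in the TL;DR. The sketch assumes that $e_{ap}$ and $e_{an}$ are Euclidean distances from an anchor feature to a positive and a negative feature. It does not model the augmentation step of ATL, which is not described here, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin):
    """Hinge triplet loss: max(0, e_ap - e_an + margin)."""
    e_ap = euclidean(anchor, positive)
    e_an = euclidean(anchor, negative)
    return max(0.0, e_ap - e_an + margin)

def total_loss(ce_loss, tl_loss, lam):
    """Combined objective: cross-entropy term plus lam times the triplet term."""
    return ce_loss + lam * tl_loss
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate the margin contribute to the update.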

Paper Structure

This paper contains 19 sections, 18 equations, 3 figures, 11 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of old-task feature drift under different EFCL methods, using the ImageNet-R dataset with 25 incremental tasks.
  • Figure 2: The training pipeline of the proposed LoRA Subtraction for Drift-Resistant Space. Before training on the $t$-th task, LoRA subtraction is applied to construct the drift-resistant space (DRS). During training, the pre-trained weights and previously learned LoRAs are frozen. $A_t$ and $B_t$ of the current task are learned by projecting gradients into the DRS, with the augmented triplet loss (ATL) enhancing plasticity.
  • Figure 3: Performance comparison of different space designs on ImageNet-R across 50 incremental tasks.