InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

Yan-Shuo Liang, Wu-Jun Li

TL;DR

InfLoRA addresses interference between sequential tasks in continual learning by restricting updates of the pre-trained weights to a subspace that is orthogonal to old-task gradients and aligned with the new-task gradient. It adds one low-rank branch per task to the backbone, so that updating the injected parameters $\bm{A}_t$ is equivalent to fine-tuning $\bm{W}$ within $\text{span}\{\bm{b}_1^t,...,\bm{b}_r^t\}$. The subspace is built from $\hat{\bm{H}}_t$, which is obtained from the new-task inputs and the gradient-memory projections maintained by DualGPM, and $\bm{B}_t$ is then selected by SVD on these projected inputs. Together, the two requirements (orthogonality to old gradients, alignment with the new-task gradient) limit interference with prior tasks. Empirically, InfLoRA and its 5-block variant outperform state-of-the-art PEFT-based continual-learning methods on ImageNet-R, CIFAR100, and DomainNet while remaining parameter-efficient and fast at inference. The method targets continual learning with large pre-trained transformers in vision applications.
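The selection of $\bm{B}_t$ summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `design_reduction_matrix`, the matrix shapes, and the use of a random orthonormal matrix `M` as a stand-in for the old-task gradient basis are all hypothetical, and the way DualGPM maintains its memory is not reproduced here.

```python
import numpy as np

def design_reduction_matrix(H, M, r):
    """Sketch: pick B_t from an SVD of projected inputs.

    H: (d_in, n) matrix of new-task layer inputs (assumed layout).
    M: (d_in, k) orthonormal basis of the old-task gradient subspace
       (assumed to be available; stand-in for the gradient memory).
    r: rank of the new branch.
    """
    # Remove the components of the inputs that lie in the old-task subspace.
    H_hat = H - M @ (M.T @ H)
    # Leading left singular vectors capture the dominant input directions.
    U, _, _ = np.linalg.svd(H_hat, full_matrices=False)
    # Rows of the returned matrix are orthonormal and orthogonal to M.
    return U[:, :r].T

rng = np.random.default_rng(0)
d_in, n, k, r = 16, 40, 4, 3                      # illustrative sizes only
H = rng.normal(size=(d_in, n))
M, _ = np.linalg.qr(rng.normal(size=(d_in, k)))   # random orthonormal basis
B = design_reduction_matrix(H, M, r)
```

In this sketch every row of `B` is orthogonal to the columns of `M`, so any update confined to the row space of `B` has no component along the stored old-task directions.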

Abstract

Continual learning requires the model to learn multiple tasks sequentially. In continual learning, the model should possess the ability to maintain its performance on old tasks (stability) and the ability to adapt to new tasks continuously (plasticity). Recently, parameter-efficient fine-tuning (PEFT), which involves freezing a pre-trained model and injecting a small number of learnable parameters to adapt to downstream tasks, has gained increasing popularity in continual learning. Although existing continual learning methods based on PEFT have demonstrated superior performance compared to those not based on PEFT, most of them do not consider how to eliminate the interference of the new task on the old tasks, which inhibits the model from making a good trade-off between stability and plasticity. In this work, we propose a new PEFT method, called interference-free low-rank adaptation (InfLoRA), for continual learning. InfLoRA injects a small number of parameters to reparameterize the pre-trained weights and shows that fine-tuning these injected parameters is equivalent to fine-tuning the pre-trained weights within a subspace. Furthermore, InfLoRA designs this subspace to eliminate the interference of the new task on the old tasks, making a good trade-off between stability and plasticity. Experimental results show that InfLoRA outperforms existing state-of-the-art continual learning methods on multiple datasets.

Paper Structure (29 sections, 1 theorem, 17 equations, 7 figures, 10 tables, 1 algorithm)

Key Result

Proposition 1

When learning the $t$-th task with forward propagation represented by (eq:taha), fine-tuning $\bm{A}_{t}$ is equivalent to fine-tuning the pre-trained weight $\bm{W}$ within the subspace ${\rm span}\{\bm{b}_{1}^{t},...,\bm{b}_{r}^{t}\}$. Here, $\bm{b}_{i}^{t}$ ($1\leq i\leq r$) denotes the $i$-th ro
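A small numerical check can make the claim of Proposition 1 concrete. The sketch below assumes a single linear layer with output $\bm{W}\bm{h} + \bm{A}_t\bm{B}_t\bm{h}$, a squared-error loss, one gradient step, and that the $\bm{b}_i^t$ are the rows of $\bm{B}_t$; the sizes, the loss, and all variable names are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 3                 # illustrative sizes only

W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
B = rng.normal(size=(r, d_in))           # frozen reduction matrix B_t
A = np.zeros((d_out, r))                 # trainable matrix A_t, zero-initialized

h = rng.normal(size=d_in)                # one layer input
target = rng.normal(size=d_out)          # hypothetical regression target

# Forward pass and gradients for the loss 0.5 * ||e - target||^2.
e = W @ h + A @ (B @ h)
grad_e = e - target
grad_W = np.outer(grad_e, h)             # gradient if W were trained directly
grad_A = np.outer(grad_e, B @ h)         # gradient with respect to A_t

# One gradient step on A_t and the induced change of the effective weight.
lr = 0.1
delta_A = -lr * grad_A
delta_W_eff = delta_A @ B                # change of W + A_t B_t

# Orthogonal projector onto the row space of B_t.
P = np.linalg.pinv(B) @ B

# The gradient of A_t is the gradient of W multiplied by B_t^T.
assert np.allclose(grad_A, grad_W @ B.T)
# Every row of the effective update already lies in span{b_1, ..., b_r}.
assert np.allclose(delta_W_eff @ P, delta_W_eff)
```

The second assertion holds for any choice of `B`, which is why the design of $\bm{B}_t$ alone decides which subspace the effective update of $\bm{W}$ can reach.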

Figures (7)

  • Figure 1: (a) The architecture of our InfLoRA in a certain linear layer of a Transformer. During the learning of the $t$-th task, the pre-trained weight and all the old branches are frozen, and only $\bm{A}_{t}$ is fine-tuned. (b) The pipeline of designing dimensionality reduction matrix $\bm{B}_{t}$.
  • Figure 2: Variation of the performance of different methods during the learning of ImageNet-R and CIFAR100.
  • Figure 3: Variation of the performance of different methods during the learning of ImageNet-R and CIFAR100.
  • Figure 4: Relative accuracy of different tasks. Relative accuracy is the accuracy of different variants minus the accuracy of InfLoRA.
  • Figure 5: Change of the dimension of subspace $\mathcal{M}_{t}^{\bot}$ throughout the whole learning process.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof