Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning

Lingfeng He, De Cheng, Huaijie Wang, Xi Yang, Nannan Wang, Xinbo Gao

TL;DR

Low-rank Decomposition and Adaptation (LoDA) performs a task-driven decomposition to build general and truly task-specific LoRA subspaces by solving two energy-based objectives, decoupling directions for knowledge sharing and isolation.

Abstract

Continual Learning (CL) requires models to sequentially adapt to new tasks without forgetting old knowledge. Recently, Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, has gained increasing attention in CL. Several LoRA-based CL methods reduce interference across tasks by separating their update spaces, typically building the new space from the estimated null space of past tasks. However, they (i) overlook task-shared directions, which suppresses knowledge transfer, and (ii) fail to capture truly effective task-specific directions, since these "null bases" of old tasks can remain nearly inactive for a new task when tasks are correlated. To address this, we study LoRA learning capability from a projection energy perspective and propose Low-rank Decomposition and Adaptation (LoDA). It performs a task-driven decomposition to build general and truly task-specific LoRA subspaces by solving two energy-based objectives, decoupling directions for knowledge sharing and isolation. LoDA fixes LoRA down-projections on two subspaces and learns robust up-projections via a Gradient-Aligned Optimization (GAO) approach. After each task, before integrating the LoRA updates into the backbone, LoDA derives a closed-form recalibration for the general update, approximating a feature-level joint optimum along this task-shared direction. Experiments indicate that LoDA outperforms existing CL methods.

Paper Structure

This paper contains 11 sections, 3 theorems, 16 equations, 5 figures, and 6 tables.

Key Result

Theorem 2.1

Let $\mathcal{L}(\mathbf{Y})$ be a differentiable loss of $\mathbf{Y}$. Fix $\mathbf A$ and update only $\mathbf B$ by one gradient descent step $\mathbf B'=\mathbf B-\eta\,\frac{\partial \mathcal{L}}{\partial \mathbf B}$. Then the first-order update $\Delta \mathbf{Y}$ of $\mathbf{Y}$ and the loss ... If the rows of $\mathbf A$ are orthonormal ($\mathbf A\mathbf A^\top=\mathbf I_r$ and $\mathbf I_r$ ...
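
The setting of Theorem 2.1 can be checked numerically with a small sketch. Everything concrete here is an assumption made for illustration: the layer form $\mathbf Y=(\mathbf W+\mathbf B\mathbf A)\mathbf X$, the squared-error loss, the dimensions, and the variable names. The closed forms being checked follow from the chain rule under these assumptions and are not quoted from the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n = 6, 8, 2, 16

# Assumed setup (not from the source): Y = (W + B A) X, squared-error loss.
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(d_in, n))
T = rng.normal(size=(d_out, n))          # hypothetical regression targets

# Fixed down-projection A with orthonormal rows (A A^T = I_r), built via QR.
Q, _ = np.linalg.qr(rng.normal(size=(d_in, r)))
A = Q.T
B = 0.1 * rng.normal(size=(d_out, r))    # trainable up-projection


def forward(B_mat):
    return (W + B_mat @ A) @ X


def loss(Y):
    return 0.5 * np.sum((Y - T) ** 2)


eta = 1e-4
Y = forward(B)
G = Y - T                                # dL/dY for the squared-error loss
grad_B = G @ (A @ X).T                   # chain rule: dL/dB = G X^T A^T
B_new = B - eta * grad_B                 # one gradient step on B only

# Change in Y (exact here, because Y is linear in B).
delta_Y = forward(B_new) - Y
delta_Y_pred = -eta * G @ X.T @ A.T @ A @ X

# First-order change in the loss: -eta * ||G X^T A^T||_F^2.
delta_L = loss(forward(B_new)) - loss(Y)
delta_L_pred = -eta * np.sum(grad_B ** 2)

# With orthonormal rows, P = A^T A is an orthogonal projector, so the
# squared gradient norm equals the energy of G X^T projected onto row(A).
P = A.T @ A
projected_energy = np.sum((G @ X.T @ P) ** 2)
```

Under these assumptions, the loss decrease from one step on $\mathbf B$ is governed by how much of $\mathbf G\mathbf X^\top$ lies in the row space of $\mathbf A$, which is one way to read the projection energy perspective mentioned in the abstract.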

Figures (5)

  • Figure 1: Overall framework. At Task-$t$, we freeze the backbone weight $\mathbf W^{t-1}$ and insert a dual-branch LoRA module. (A) We decompose the update space into general and isolated subspaces for the LoRA down-projections. (B) We then freeze the down-projections and train the up-projections on $\mathcal{D}^t$ via GAO. (C) After training, the LoRA updates are integrated into the backbone, with a recalibration on the general branch.
  • Figure 2: Averaged $r^t(\mathbf{U}_{\text{null}})$ in different layers on ImageNetA.
  • Figure 3: Loss values along the general direction (ImageNetA).
  • Figure 4: Relative accuracy curves for old tasks, new tasks, and overall performance on 10S-ImageNetR.
  • Figure 5: Projection magnitude and relative energy across different transformer layers on 10S-ImageNetA.
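
The three stages in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two subspaces are random orthonormal stand-ins rather than the energy-based decomposition, plain gradient descent replaces GAO, the recalibration of the general branch is omitted, and all names (`A_gen`, `A_iso`, `B_gen`, `B_iso`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, n = 6, 8, 2, 32

W_prev = rng.normal(size=(d_out, d_in))      # frozen backbone weight W^{t-1}

# (A) Stand-in subspaces: two mutually orthogonal orthonormal blocks.
# The paper derives these from energy-based objectives; random here.
Q, _ = np.linalg.qr(rng.normal(size=(d_in, 2 * r)))
A_gen, A_iso = Q[:, :r].T, Q[:, r:].T        # frozen down-projections

B_gen = np.zeros((d_out, r))                 # trainable up-projections
B_iso = np.zeros((d_out, r))


def forward(X, B_g, B_i):
    # Backbone output plus the two LoRA branches.
    return W_prev @ X + B_g @ (A_gen @ X) + B_i @ (A_iso @ X)


def loss(Y, T):
    return 0.5 * np.sum((Y - T) ** 2)


X = rng.normal(size=(d_in, n))               # hypothetical task data
T = rng.normal(size=(d_out, n))              # hypothetical targets
loss_before = loss(forward(X, B_gen, B_iso), T)

# (B) Train only the up-projections; plain gradient descent stands in for GAO.
lr = 1e-3
for _ in range(50):
    G = forward(X, B_gen, B_iso) - T         # dL/dY for squared error
    B_gen -= lr * G @ (A_gen @ X).T
    B_iso -= lr * G @ (A_iso @ X).T

loss_after = loss(forward(X, B_gen, B_iso), T)

# (C) Merge both branches into the backbone (recalibration step omitted).
W_new = W_prev + B_gen @ A_gen + B_iso @ A_iso
```

After merging, `W_new @ X` reproduces the dual-branch output, so the LoRA module can be discarded before the next task; in the method described above, the general-branch update would be recalibrated before this merge.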

Theorems & Definitions (4)

  • Theorem 2.1
  • Theorem 2.2: Computation of $\mathbf{U}_I$
  • proof
  • Theorem 2.3: Closed-form solution for Eq. \ref{eq:merge_obj}