
C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models

Xin Zhang, Liang Bai, Xian Yang, Jiye Liang

TL;DR

The paper tackles catastrophic forgetting and adapter growth in continual learning for pre-trained models. It introduces Continual Low-Rank Adaptation (C-LoRA), a single shared low-rank update modulated by a learnable routing matrix $\boldsymbol{\mathcal{R}}$. This matrix is decomposed into $\boldsymbol{\mathcal{R}}_{\text{old}}$ and $\boldsymbol{\mathcal{R}}_{\delta}$, with orthogonality constraints to minimize interference. Theorem 3.1 provides theoretical support, showing that training with this decomposition reduces gradient changes in the shared subspace. Extensive experiments with ViT backbones on four datasets (CIFAR-100, ImageNet-A, CUB-200, and CAR196) report state-of-the-art Last-Acc and Inc-Acc across 5, 10, and 20 incremental sessions, and ablations confirm the importance of the orthogonal regularization. By reusing subspaces within a single adapter, the approach achieves scalable, parameter-efficient continual learning, which matters in dynamic, multi-task environments where prior knowledge must be preserved.
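A minimal NumPy sketch of the shared update, assuming it takes the form $\Delta W = B\,\boldsymbol{\mathcal{R}}A$ with an $r \times r$ routing matrix $\boldsymbol{\mathcal{R}} = \boldsymbol{\mathcal{R}}_{\text{old}} + \boldsymbol{\mathcal{R}}_{\delta}$ placed between a down-projection $A$ and an up-projection $B$. The shapes, the variable names, and this exact factorization are assumptions for illustration, not the authors' code. The sketch checks two things: the adapter can be merged into a single weight matrix for inference, and a diagonal routing matrix reduces to a gated sum of rank-1 components, which is one way to read the MoE connection drawn in Figure 1.

```python
import numpy as np

# Hypothetical sizes: frozen d_out x d_in weight, rank-r shared adapter.
rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 4

W0 = rng.normal(size=(d_out, d_in))    # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.1   # down-projection (assumed shared across sessions)
B = rng.normal(size=(d_out, r)) * 0.1  # up-projection (assumed shared across sessions)
R_old = rng.normal(size=(r, r))        # routing from earlier sessions (frozen)
R_delta = np.zeros((r, r))             # routing update for the current session


def c_lora_forward(x, W0, B, R_old, R_delta, A):
    """h = W0 x + B (R_old + R_delta) A x  (assumed form of the C-LoRA update)."""
    R = R_old + R_delta
    return W0 @ x + B @ (R @ (A @ x))


def merged_weight(W0, B, R_old, R_delta, A):
    """Fold the low-rank update into one matrix so inference costs a single matmul."""
    return W0 + B @ (R_old + R_delta) @ A


x = rng.normal(size=d_in)
h = c_lora_forward(x, W0, B, R_old, R_delta, A)
h_merged = merged_weight(W0, B, R_old, R_delta, A) @ x
print(np.allclose(h, h_merged))  # True

# With a diagonal routing matrix diag(g), B diag(g) A is a gated sum of
# rank-1 "experts" B[:, i] A[i, :], mirroring a mixture-of-experts layer.
g = rng.uniform(size=r)
moe_sum = sum(g[i] * np.outer(B[:, i], A[i, :]) for i in range(r))
print(np.allclose(B @ np.diag(g) @ A, moe_sum))  # True
```

A full (non-diagonal) $\boldsymbol{\mathcal{R}}$ additionally lets the rank-1 components mix, which is what allows one adapter to be reused across sessions in this reading.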

Abstract

Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that has been extensively applied in areas such as natural language processing and computer vision. Existing LoRA fine-tuning approaches excel in static environments but struggle in dynamic learning due to reliance on multiple adapter modules, increasing overhead and complicating inference. We propose Continual Low-Rank Adaptation (C-LoRA), a novel extension of LoRA for continual learning. C-LoRA uses a learnable routing matrix to dynamically manage parameter updates across tasks, ensuring efficient reuse of learned subspaces while enforcing orthogonality to minimize interference and forgetting. Unlike existing approaches that require separate adapters for each task, C-LoRA enables an integrated approach for task adaptation, achieving both scalability and parameter efficiency in sequential learning scenarios. C-LoRA achieves state-of-the-art accuracy and parameter efficiency on benchmarks while providing theoretical insights into its routing matrix's role in retaining and transferring knowledge, establishing a scalable framework for continual learning.
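To make the adapter-growth argument concrete, the following sketch counts adapter parameters for one adapted $d_{\text{out}} \times d_{\text{in}}$ weight. It assumes that per-task baselines add an independent rank-$r$ LoRA pair in every session, while C-LoRA keeps one shared pair of fixed rank $r$ plus a single merged $r \times r$ routing matrix at inference. The dimensions (768, the ViT-B hidden size) and $r = 8$ are illustrative choices, not settings reported by the paper.

```python
def per_task_lora_params(d_in, d_out, r, num_tasks):
    """One independent rank-r LoRA pair (A: r x d_in, B: d_out x r) per task."""
    return num_tasks * r * (d_in + d_out)


def shared_c_lora_params(d_in, d_out, r):
    """One shared (A, B) pair plus one r x r routing matrix (assumed fixed rank)."""
    return r * (d_in + d_out) + r * r


d_in = d_out = 768  # illustrative ViT-B hidden size
r = 8               # illustrative rank

for num_tasks in (5, 10, 20):
    print(num_tasks,
          per_task_lora_params(d_in, d_out, r, num_tasks),
          shared_c_lora_params(d_in, d_out, r))
# 5 61440 12352
# 10 122880 12352
# 20 245760 12352
```

Under these assumptions the per-task cost grows linearly with the number of sessions, while the shared adapter's cost stays constant.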

Paper Structure

This paper contains 12 sections, 26 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of MoE and proposed C-LoRA architecture. We extend the equivalent form of MoE and decompose the routing matrix $\boldsymbol{\mathcal{R}}$ into two parts: $\boldsymbol{\mathcal{R}}_\text{old}$ and $\boldsymbol{\mathcal{R}}_{\delta }$, where $\boldsymbol{\mathcal{R}}_\text{old}$ does not participate in gradient computation during training.
  • Figure 2: Illustration of the Proposed Model. Left: Vision Transformer (ViT) integrated with the C-LoRA module, where the adapter and local classifier are incrementally trained in each session. Right: the proposed architecture mitigates catastrophic forgetting by decoupling $\boldsymbol{\mathcal{R}}$ to constrain updates within the parameter space of previous tasks and by enforcing orthogonality between $\boldsymbol{\mathcal{R}}$ updates and the low-rank parameter space of past tasks.
  • Figure 3: Accuracy performance in each of the 5 incremental sessions on CIFAR-100, ImageNet-A, CUB-200, and CAR196.
  • Figure 4: Accuracy performance in each of the 10 incremental sessions on CIFAR-100, ImageNet-A, CUB-200, and CAR196.
  • Figure 5: Accuracy performance in each of the 20 incremental sessions on CIFAR-100, ImageNet-A, CUB-200, and CAR196.
  • ...and 1 more figure
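The training behaviour described in Figures 1 and 2 can be sketched as follows: gradients flow only into $\boldsymbol{\mathcal{R}}_{\delta}$, $\boldsymbol{\mathcal{R}}_\text{old}$ is held fixed, and a penalty discourages $\boldsymbol{\mathcal{R}}_{\delta}$ from overlapping a protected subspace. This is a hypothetical NumPy illustration on a single regression target. The choice of that subspace (top-$k$ left singular vectors of $\boldsymbol{\mathcal{R}}_\text{old}$), the penalty $\lambda\lVert U^\top \boldsymbol{\mathcal{R}}_{\delta}\rVert_F^2$, keeping $A$ and $B$ fixed, and folding $\boldsymbol{\mathcal{R}}_{\delta}$ into $\boldsymbol{\mathcal{R}}_\text{old}$ at the end of a session are all assumptions, not the paper's exact regularizer or schedule.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r, k = 8, 6, 4, 2  # k: hypothetical size of the protected subspace
lam = 1.0                       # hypothetical orthogonality weight

W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
R_old = rng.normal(size=(r, r))  # frozen: carries earlier sessions
R_delta = np.zeros((r, r))       # trainable in the current session

# Hypothetical "past subspace": top-k left singular vectors of R_old (r x k).
U = np.linalg.svd(R_old)[0][:, :k]

x = rng.normal(size=d_in)
t = rng.normal(size=d_out)  # target for the current session
Ax = A @ x


def loss_and_grad(R_delta):
    """Squared error plus orthogonality penalty; gradient w.r.t. R_delta only."""
    e = W0 @ x + B @ ((R_old + R_delta) @ Ax) - t
    overlap = U.T @ R_delta  # k x r projection onto the protected subspace
    loss = 0.5 * e @ e + lam * np.sum(overlap ** 2)
    # R_old is treated as a constant (stop-gradient), so it gets no update.
    grad = np.outer(B.T @ e, Ax) + 2 * lam * U @ overlap
    return loss, grad


# Step size 1/L, where L upper-bounds the curvature of this quadratic loss,
# so no gradient step can increase it.
L = (Ax @ Ax) * np.linalg.norm(B, 2) ** 2 + 2 * lam
R_old_before = R_old.copy()
loss0, _ = loss_and_grad(R_delta)
for _ in range(200):
    _, grad = loss_and_grad(R_delta)
    R_delta -= grad / L
loss1, _ = loss_and_grad(R_delta)
print(loss1 < loss0, np.array_equal(R_old, R_old_before))  # True True

# End of session (assumed): fold the update into the frozen part and reset it.
R_old = R_old + R_delta
R_delta = np.zeros_like(R_delta)
```

In this sketch, where $A$ and $B$ are also held fixed, any drift on earlier sessions comes only through $\boldsymbol{\mathcal{R}}_{\delta}$, which is the quantity the orthogonality term is meant to constrain.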