LoRA-Based Continual Learning with Constraints on Critical Parameter Changes

Shimou Ling, Liang Zhang, Jiangwei Zhao, Lili Pan, Hongliang Li

TL;DR

The paper tackles catastrophic forgetting in continual learning with pre-trained vision transformers by showing that orthogonal LoRA tuning alone does not fully stabilize the parameters that matter most for earlier tasks. It introduces LoRAC-IPC, which combines orthogonal LoRA composition (based on QR decomposition) with Important Parameter Constraints (IPC) that freeze critical parameter matrices, plus task-adaptive prediction. The approach reports state-of-the-art results on multiple benchmarks (including Split CIFAR-100, ImageNet-R, and DomainNet) and demonstrates strong multi-modal performance, with ablations confirming the contributions of LoRA composition, orthogonality, IPC, and task-ID inference. The work offers a scalable, parameter-efficient path to continual learning that limits forgetting while maintaining plasticity for new knowledge.

Abstract

LoRA-based continual learning represents a promising avenue for leveraging pre-trained models in downstream continual learning tasks. Recent studies have shown that orthogonal LoRA tuning effectively mitigates forgetting. However, this work unveils that under orthogonal LoRA tuning, the critical parameters for pre-tasks still change notably after learning post-tasks. To address this problem, we directly propose freezing the most critical parameter matrices in the Vision Transformer (ViT) for pre-tasks before learning post-tasks. In addition, building on orthogonal LoRA tuning, we propose orthogonal LoRA composition (LoRAC) based on QR decomposition, which may further enhance the plasticity of our method. Elaborate ablation studies and extensive comparisons demonstrate the effectiveness of our proposed method. Our results indicate that our method achieves state-of-the-art (SOTA) performance on several well-known continual learning benchmarks. For instance, on the Split CIFAR-100 dataset, our method shows a 6.35% improvement in accuracy and a 3.24% reduction in forgetting compared to previous methods. Our code is available at https://github.com/learninginvision/LoRAC-IPC.
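
The abstract does not spell out how QR decomposition enters LoRAC, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a LoRA factor is stored as a list of column vectors, factors it as Q R with modified Gram-Schmidt, and measures how much the new task's Q overlaps with an earlier task's Q. The names `qr_columns` and `overlap_penalty`, the squared-Frobenius form of the penalty, and all toy values are hypothetical; in practice a library routine such as `numpy.linalg.qr` would replace the hand-written factorization.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def qr_columns(cols):
    """Modified Gram-Schmidt QR of a matrix given as a list of columns.

    Assumes the columns are linearly independent. Returns (q_cols, r),
    where q_cols are orthonormal columns and r is upper triangular
    (list of rows) such that column j of A equals sum_i r[i][j] * q_cols[i].
    """
    n = len(cols)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, col in enumerate(cols):
        v = list(col)
        # Remove the components along the directions already found.
        for i, q in enumerate(q_cols):
            r[i][j] = dot(q, v)
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q)]
        r[j][j] = math.sqrt(dot(v, v))
        q_cols.append([vk / r[j][j] for vk in v])
    return q_cols, r


def overlap_penalty(q_new, q_prev):
    """Squared Frobenius norm of Q_prev^T Q_new.

    It is zero exactly when the two column spaces are orthogonal, so it
    can serve as an orthogonality regularizer (an assumed form).
    """
    return sum(dot(p, q) ** 2 for p in q_prev for q in q_new)


# Toy rank-2 factors in R^4 (hypothetical values).
a_prev = [[1.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
a_overlap = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
a_disjoint = [[0.0, 0.0, 2.0, 1.0], [0.0, 0.0, 1.0, 3.0]]

q_prev, _ = qr_columns(a_prev)
q_overlap, r_overlap = qr_columns(a_overlap)
q_disjoint, _ = qr_columns(a_disjoint)

print("penalty, overlapping subspaces:", overlap_penalty(q_overlap, q_prev))
print("penalty, orthogonal subspaces: ", overlap_penalty(q_disjoint, q_prev))
```

The penalty is large when the new factor reuses directions already occupied by an earlier task and vanishes when the subspaces are orthogonal, which is the property an orthogonality regularizer on Q is meant to encourage.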

Paper Structure

This paper contains 24 sections, 12 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: The degree of variation in important parameters with orthogonal constraints. The calculation of parameter importance is based on the sensitivity of the parameter to training losses and is discussed in the section on Important Parameter Constraints (IPC). We use the L2-norm to measure the degree of variation in the parameters after completing task $t+1$ ($\|\mathbf{W}_{t+1} - \mathbf{W}_{t}\|_{2}$) and after completing all tasks ($\|\mathbf{W}_{T} - \mathbf{W}_{t}\|_{2}$), respectively. Important parameters for each task are highlighted by yellow boxes.
  • Figure 2: Continual Learning with Orthogonal LoRA Composition and Important Parameter Constraints. The upper part illustrates the workflow of Important Parameter Constraints (IPC): once training on the current task is complete, the parameter matrices important to that task are constrained to remain unchanged during subsequent continual learning. The lower part shows the framework for Orthogonal LoRA Composition, which consists of three components: LoRA composition, the QR decomposition of the matrix $\mathbf{A}_{t}$, and the orthogonality regularization on the matrix $\tilde{\mathbf{Q}}_t$.
  • Figure 3: Parameter Adjustment for Task Adaptive Prediction. We use the feature extractor $f\left(\cdot,\mathbf{\Theta}_t\right)$, trained on task $t$, to extract the prototypes of each class in that task. Then, we perform Gaussian sampling from the class prototypes to obtain the pseudo features $\mathbf{f}'$ for adjusting the classifier $h(\cdot, \boldsymbol{\Phi})$. After completing Task Adaptive Prediction, the classifier can distinguish classes from different tasks.
  • Figure 4: Absolute values of the model's delta parameters on each task. The Sup-21k* pre-trained model is trained sequentially on Split CIFAR-100 using LoRA-FT and LoRAC w/o TII, respectively. The variations of the parameters of the model with LoRA on tasks 2, 3, 5, 7, and 10 are shown, along with the average accuracy.
  • Figure 5: The bar graphs on the left depict the variation in accuracy of LoRAC and LoRAC-IPC across various tasks on Split CIFAR-100. The right half shows important parameters for the current tasks of LoRAC and LoRAC-IPC. Here we select the parameter matrices in the top 10% of importance for the current task. Important parameters for each task are highlighted by yellow boxes.
  • ...and 5 more figures
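
The captions of Figures 1, 2, and 5 describe three steps: scoring parameter matrices by importance, keeping the top 10% fixed after a task, and measuring drift as the L2 norm of the weight change. A minimal sketch of that bookkeeping follows. It is not the authors' implementation: the importance scores are made-up numbers (the paper's sensitivity-based computation is not reproduced here), the L2 norm is assumed to be taken over the flattened matrix, and the helper names `l2_change`, `top_fraction`, and `apply_update` are hypothetical.

```python
import math


def l2_change(w_old, w_new):
    """L2 norm of the flattened difference between two same-shaped matrices."""
    return math.sqrt(sum((b - a) ** 2
                         for row_old, row_new in zip(w_old, w_new)
                         for a, b in zip(row_old, row_new)))


def top_fraction(importance, fraction=0.10):
    """Names of the matrices whose score falls in the top `fraction`."""
    k = max(1, round(fraction * len(importance)))
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def apply_update(weights, deltas, frozen):
    """Add each delta to its matrix, leaving the frozen matrices unchanged."""
    updated = {}
    for name, w in weights.items():
        if name in frozen:
            updated[name] = [row[:] for row in w]
        else:
            updated[name] = [[a + d for a, d in zip(row_w, row_d)]
                             for row_w, row_d in zip(w, deltas[name])]
    return updated


# Ten toy 2x2 parameter matrices after task t (hypothetical values).
names = [f"layer{i}" for i in range(10)]
w_t = {n: [[1.0, 0.0], [0.0, 1.0]] for n in names}

# Made-up importance scores standing in for the sensitivity-based ones.
importance = {n: float(i) for i, n in enumerate(names)}
frozen = top_fraction(importance)  # top 10% of ten matrices: one matrix

# The same update is proposed for every matrix while learning task t+1.
delta = {n: [[0.3, 0.0], [0.0, 0.4]] for n in names}
w_next = apply_update(w_t, delta, frozen)

drift = {n: l2_change(w_t[n], w_next[n]) for n in names}
for n in names:
    tag = "frozen" if n in frozen else "free"
    print(f"{n}: drift={drift[n]:.3f} ({tag})")
```

In this toy setup the frozen matrix shows zero drift while the others move, which is the contrast the yellow boxes in Figures 1 and 5 are meant to make visible.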