C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models
Xin Zhang, Liang Bai, Xian Yang, Jiye Liang
TL;DR
The paper tackles catastrophic forgetting and adapter growth in continual learning for pre-trained models. It introduces Continual Low-Rank Adaptation (C-LoRA), a single shared low-rank update modulated by a learnable routing matrix $\boldsymbol{\mathcal{R}}$. This matrix is decomposed into $\boldsymbol{\mathcal{R}}_{\text{old}}$ and $\boldsymbol{\mathcal{R}}_{\delta}$, with orthogonality constraints that minimize interference between tasks. Theorem 3.1 provides theoretical support, showing that training with this decomposition reduces gradient changes in the shared subspace. Extensive experiments on four datasets with ViT backbones demonstrate state-of-the-art Last-Acc and Inc-Acc across 5, 10, and 20 incremental sessions, and ablations confirm the importance of the orthogonal regularization. By reusing subspaces within a single adapter, the approach achieves scalable, parameter-efficient continual learning, with practical value for dynamic, multi-task environments where preserving prior knowledge is critical.
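To make the decomposition concrete, here is a minimal sketch in pure Python. It assumes the routing matrix sits between the two LoRA factors, so the update is $\Delta W = B(\boldsymbol{\mathcal{R}}_{\text{old}} + \boldsymbol{\mathcal{R}}_{\delta})A$ with $B \in \mathbb{R}^{d_{\text{out}}\times r}$, $A \in \mathbb{R}^{r\times d_{\text{in}}}$, and $\boldsymbol{\mathcal{R}} \in \mathbb{R}^{r\times r}$. The penalty $\lVert \boldsymbol{\mathcal{R}}_{\text{old}}^{\top}\boldsymbol{\mathcal{R}}_{\delta}\rVert_F^2$ is one plausible form of the orthogonality constraint, not necessarily the paper's exact regularizer, and every function name below is hypothetical.

```python
# Sketch of a C-LoRA-style update, ASSUMING dW = B @ (R_old + R_delta) @ A.
# The orthogonality penalty is an assumed form, not the paper's exact loss.
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def frob_sq(X):
    """Squared Frobenius norm."""
    return sum(v * v for row in X for v in row)


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def clora_delta(B, R_old, R_delta, A):
    """Low-rank update dW = B (R_old + R_delta) A, shape (d_out, d_in)."""
    R = add(R_old, R_delta)
    return matmul(matmul(B, R), A)


def orthogonality_penalty(R_old, R_delta):
    """Assumed regularizer ||R_old^T R_delta||_F^2.

    It is zero exactly when every column of R_delta is orthogonal
    to every column of R_old.
    """
    return frob_sq(matmul(transpose(R_old), R_delta))


def forward(W0, B, R_old, R_delta, A, x):
    """h = (W0 + dW) x, with x a column vector stored as [[x1], [x2], ...]."""
    W = add(W0, clora_delta(B, R_old, R_delta, A))
    return matmul(W, x)


if __name__ == "__main__":
    rng = random.Random(0)
    A = rand_matrix(2, 6, rng)  # r x d_in
    B = rand_matrix(4, 2, rng)  # d_out x r
    R_old = [[1.0, 0.0], [0.0, 0.0]]    # direction "owned" by earlier tasks
    R_delta = [[0.0, 0.0], [0.0, 1.0]]  # new task uses an orthogonal direction
    print(orthogonality_penalty(R_old, R_delta))                   # 0.0
    print(orthogonality_penalty(R_old, [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

In this toy case the two routing components are orthogonal and sum to the identity, so $\Delta W$ reduces to plain LoRA's $BA$. In a training loop, the penalty would presumably be added to the task loss with a weighting coefficient (an assumed hyperparameter).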
Abstract
Low-Rank Adaptation (LoRA) is an efficient fine-tuning method that has been widely applied in areas such as natural language processing and computer vision. Existing LoRA fine-tuning approaches excel in static environments but struggle in dynamic (continual) learning settings because they rely on multiple adapter modules, which increases overhead and complicates inference. We propose Continual Low-Rank Adaptation (C-LoRA), a novel extension of LoRA for continual learning. C-LoRA uses a learnable routing matrix to dynamically manage parameter updates across tasks, enabling efficient reuse of learned subspaces while enforcing orthogonality to minimize interference and forgetting. Unlike existing approaches that require a separate adapter for each task, C-LoRA adapts to all tasks through a single integrated adapter, achieving both scalability and parameter efficiency in sequential learning scenarios. C-LoRA achieves state-of-the-art accuracy and parameter efficiency on benchmarks, and we provide theoretical insights into how its routing matrix retains and transfers knowledge, establishing a scalable framework for continual learning.
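The parameter-efficiency argument can be illustrated with rough arithmetic. The sketch below compares, for a single $d_{\text{out}}\times d_{\text{in}}$ weight matrix, one independent rank-$r$ LoRA adapter per task against one shared adapter whose size does not depend on the number of tasks. It assumes the shared adapter stores one $B$, one $A$, and the two $r\times r$ routing components; the function names and sizes are hypothetical and illustrative, not the paper's configuration or reported parameter counts.

```python
# Illustrative parameter counts for ONE weight matrix.
# ASSUMPTION: the shared adapter holds one B, one A, plus R_old and R_delta;
# this is not the paper's reported layout or numbers.


def per_task_lora_params(num_tasks: int, d_out: int, d_in: int, rank: int) -> int:
    """Every task gets its own LoRA pair B (d_out x r) and A (r x d_in)."""
    return num_tasks * rank * (d_out + d_in)


def shared_routed_params(d_out: int, d_in: int, rank: int) -> int:
    """One shared B and A plus two r x r routing components.

    Under the stated assumption, this count does not grow with the number of tasks.
    """
    return rank * (d_out + d_in) + 2 * rank * rank


if __name__ == "__main__":
    # Compare the two schemes for 5, 10, and 20 incremental sessions.
    for tasks in (5, 10, 20):
        print(tasks, per_task_lora_params(tasks, 768, 768, 8), shared_routed_params(768, 768, 8))
```

With $d_{\text{out}} = d_{\text{in}} = 768$, $r = 8$, and 10 tasks, the per-task scheme needs 122,880 adapter parameters for that matrix versus 12,416 for the shared one, and the gap grows linearly with the number of tasks.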
