FM-LoRA: Factorized Low-Rank Meta-Prompting for Continual Learning

Xiaobing Yu, Jin Yang, Xiao Wu, Peijie Qiu, Xiaofeng Liu

TL;DR

FM-LoRA addresses continual learning over sequential tasks by integrating Factorized Low-Rank Adaptation (F-LoRA), a Dynamic Rank Selector (DRS), and Dynamic Meta-Prompting (DMP) to achieve rehearsal-free, parameter-efficient adaptation. F-LoRA confines updates to a shared low-rank subspace spanned by global bases $A_{shared}, B_{shared}$ and learns only task-specific matrices $M_t, N_t \in \mathbb{R}^{r \times r}$, which reduces the per-task parameter count to $2r^2$ and limits interference between tasks. DRS selects an effective rank $r_t$ for each task from a complexity measure $H(\mathcal{T}_t)$ via a Gumbel-Softmax, matching capacity to task difficulty and similarity. DMP adds a learnable prompt matrix $P$ to stabilize representations across tasks. The combination yields a robust balance of stability and plasticity, with reported state-of-the-art performance on ImageNet-R, CIFAR100, CUB200, and DomainNet under class- and domain-incremental settings, especially as the number of tasks grows. The approach limits memory growth, avoids data rehearsal, and generalizes well across diverse tasks and domains, making it practical for continual learning with large pre-trained Transformers.
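The F-LoRA update and the DRS rank choice summarized above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the shapes ($A_{shared} \in \mathbb{R}^{d_{out} \times r_{max}}$, $B_{shared} \in \mathbb{R}^{d_{in} \times r_{max}}$), the rule of keeping the first $r_t$ basis directions, the candidate rank set, and the fixed logits standing in for a score derived from $H(\mathcal{T}_t)$ are all hypothetical, as are the helper names `gumbel_softmax` and `f_lora_delta`.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Temperature-scaled softmax over logits perturbed with Gumbel(0, 1) noise."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    z = (logits - np.log(-np.log(u))) / tau  # logits + Gumbel noise, scaled by 1/tau
    z = z - z.max()                          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def f_lora_delta(A_shared, B_shared, M_t, N_t, r_t):
    """Delta W_t = A_shared M_t N_t^T B_shared^T restricted to the first r_t directions.

    Truncating to the leading r_t columns/rows is a hypothetical choice for illustration.
    """
    A = A_shared[:, :r_t]  # (d_out, r_t)
    B = B_shared[:, :r_t]  # (d_in, r_t)
    return A @ M_t[:r_t, :r_t] @ N_t[:r_t, :r_t].T @ B.T  # (d_out, d_in)


rng = np.random.default_rng(0)
d_in, d_out, r_max = 64, 64, 8
candidate_ranks = np.array([2, 4, 8])  # hypothetical rank options

# Shared bases (learned once, e.g. on task 1) and task-specific r x r cores.
A_shared = rng.standard_normal((d_out, r_max))
B_shared = rng.standard_normal((d_in, r_max))
M_t = rng.standard_normal((r_max, r_max))
N_t = rng.standard_normal((r_max, r_max))

# Hypothetical logits standing in for a score derived from H(T_t).
logits = np.array([0.5, 2.0, 0.1])
probs = gumbel_softmax(logits, tau=0.5, rng=rng)
r_t = int(candidate_ranks[np.argmax(probs)])

delta_W = f_lora_delta(A_shared, B_shared, M_t, N_t, r_t)
# Selected rank, update shape, actual rank of the update, per-task parameters 2 r_t^2.
print(r_t, delta_W.shape, np.linalg.matrix_rank(delta_W), 2 * r_t ** 2)
```

Taking the argmax of a Gumbel-Softmax sample amounts to Gumbel-max sampling; during training, a differentiable variant (for example a straight-through estimator or a soft weighting over candidate ranks) would be needed for gradients to reach the logits.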

Abstract

How to continuously adapt a pre-trained model to sequential tasks with different prediction class labels and domains, and ultimately learn a model that generalizes across diverse tasks, is a long-standing challenge. Continual learning (CL) has emerged as a promising approach to leverage pre-trained models (e.g., Transformers) for sequential tasks. However, many existing CL methods incrementally store additional learned structures, such as Low-Rank Adaptation (LoRA) adapters or prompts, and sometimes even preserve features from previous samples to maintain performance. This leads to unsustainable parameter growth and escalating storage costs as the number of tasks increases. Moreover, current approaches often lack task-similarity awareness, which further hinders the model's ability to adapt to new tasks without interfering with previously acquired knowledge. To address these challenges, we propose FM-LoRA, a novel and efficient low-rank adaptation method that integrates a dynamic rank selector (DRS) and dynamic meta-prompting (DMP). This framework allocates model capacity more effectively across tasks by leveraging a shared low-rank subspace that is critical for preserving knowledge, thereby avoiding continual parameter expansion. Extensive experiments on CL benchmarks, including ImageNet-R, CIFAR100, and CUB200 for class-incremental learning (CIL) and DomainNet for domain-incremental learning (DIL), with a Transformer backbone demonstrate that FM-LoRA effectively mitigates catastrophic forgetting while delivering robust performance across a diverse range of tasks and domains.
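To make the parameter-growth argument concrete, the following arithmetic compares stored adapter parameters for a single adapted weight matrix. It assumes, hypothetically, $d_{in} = d_{out} = 768$ (a ViT-B hidden size), the same rank $r = 8$ for every task, and that vanilla LoRA keeps one independent adapter per task; these numbers are illustrative and are not results from the paper.

```python
def lora_params(num_tasks, d_in, d_out, r):
    # One independent pair of LoRA factors (d_out x r and r x d_in) stored per task.
    return num_tasks * r * (d_in + d_out)


def f_lora_params(num_tasks, d_in, d_out, r):
    # Shared bases stored once, plus two r x r task-specific matrices per task.
    return r * (d_in + d_out) + num_tasks * 2 * r * r


for n in (5, 10, 20):
    print(n, lora_params(n, 768, 768, 8), f_lora_params(n, 768, 768, 8))
```

Under these assumptions, vanilla LoRA grows by $r(d_{in} + d_{out}) = 12{,}288$ parameters per task, whereas F-LoRA grows by only $2r^2 = 128$ once the shared bases exist.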

Paper Structure

This paper contains 14 sections, 13 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustration of incremental weight updates in standard LoRA and the proposed FM-LoRA in CL scenarios. (A) Vanilla low-rank adaptation with $r_{fixed} \ll \min\{d_{input},d_{output}\}$. (B) FM-LoRA's learning process across two sequential tasks: task 1 learns maximum-rank shared bases $A_{\text{shared}}$ and $B_{\text{shared}}$, while task 2 dynamically selects a lower rank $r_2$ via DRS, producing updates $\Delta W_t = A_{\text{shared}} M_t N_t^\top B_{\text{shared}}^\top$.
  • Figure 2: Illustration of the proposed dynamic meta-prompting (DMP), in which a learnable prompt matrix is prepended to each input sequence. This design stabilizes representations across incremental tasks by providing a shared context throughout training.
  • Figure 3: Performance trajectory comparisons (Acc and AAA, the higher the better) across varying numbers of sequential tasks ($N$=5, 10, 20) on ImageNet-R for different CL methods.
  • Figure 4: Detailed ablation study of the proposed DMP with varying initial numbers of shared tokens. (A-C) Results on ImageNet-R with $N=5, 10, 20$ across different $m$ values, demonstrating the importance of adapting DMP complexity based on the number of tasks.
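The prompt prepending shown in Figure 2 can be sketched as follows. This assumes, hypothetically, that the prompt matrix $P \in \mathbb{R}^{m \times d}$ is prepended once at the token-embedding level of a ViT-B/16-style encoder (197 tokens, width 768) with $m = 10$ shared tokens; where DMP actually inserts $P$, and how $m$ is initialized, may differ in the paper. The helper name `prepend_prompt` is illustrative.

```python
import numpy as np


def prepend_prompt(tokens, P):
    """Prepend the same m x d prompt matrix P to every sequence in a batch."""
    batch = tokens.shape[0]
    prompt = np.broadcast_to(P, (batch,) + P.shape)  # (batch, m, d), read-only view, no copy
    return np.concatenate([prompt, tokens], axis=1)  # (batch, m + n, d)


rng = np.random.default_rng(0)
batch, n_tokens, d, m = 4, 197, 768, 10
P = 0.02 * rng.standard_normal((m, d))  # one prompt matrix shared across all tasks
x = rng.standard_normal((batch, n_tokens, d))

out = prepend_prompt(x, P)
print(out.shape)  # (4, 207, 768)
```

Because a single $P$ is reused for every input and every task, it adds a fixed $m \times d$ parameters rather than growing with the number of tasks; Figure 4 varies $m$ to study this trade-off.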