Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models

Yan-Shuo Liang, Jia-Rui Chen, Wu-Jun Li

TL;DR

GainLoRA tackles catastrophic forgetting in continual learning for large language models by expanding a new LoRA branch for each task and adding a per-task gating module that controls how the new and old branches are integrated. The method imposes orthogonal initialization and updating constraints on the gating modules to minimize the influence of the new branch on previously learned tasks, without requiring old-task data. Empirical results across diverse task sequences and model sizes show that GainLoRA consistently outperforms state-of-the-art LoRA-based CL methods with only modest overhead, and ablations confirm the necessity of the proposed constraints. The approach supports learning over task sequences even when task identifiers are unavailable at inference time, with practical implications for scalable continual adaptation of large models.
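
The gated integration described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the single-vector gate weights `G`, the gate form `|tanh(z / 2)|`, and the helper names `gate` and `forward` are all assumptions chosen for illustration. The sketch only shows the general pattern of one LoRA branch per task, each scaled by an input-dependent coefficient $a_i = g_i(\bm{x})$, so no task identifier is needed at inference.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, num_tasks = 8, 6, 2, 3  # hypothetical sizes

W = rng.normal(size=(d_out, d_in))                              # frozen pretrained weight
A = [rng.normal(size=(rank, d_in)) for _ in range(num_tasks)]   # LoRA down-projections
B = [rng.normal(size=(d_out, rank)) for _ in range(num_tasks)]  # LoRA up-projections
G = [rng.normal(size=d_in) for _ in range(num_tasks)]           # toy gate weights (assumption)


def gate(g_weights, x):
    """Toy gate (assumption): maps a zero pre-activation to a zero coefficient."""
    z = g_weights @ x
    return float(abs(np.tanh(z / 2.0)))


def forward(x, W, A, B, G):
    """Frozen layer output plus every LoRA branch, each scaled by its own gate."""
    out = W @ x
    coeffs = []
    for A_i, B_i, g_i in zip(A, B, G):
        a_i = gate(g_i, x)                   # integration coefficient for branch i
        coeffs.append(a_i)
        out = out + a_i * (B_i @ (A_i @ x))  # gated low-rank update
    return out, coeffs


x = rng.normal(size=d_in)
out, coeffs = forward(x, W, A, B, G)
```

In this sketch, a branch whose gate outputs zero for a given input contributes nothing to the output. That is the property the constraints on the gating modules are meant to approximate for inputs from old tasks.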

Abstract

Continual learning (CL), which requires the model to learn multiple tasks sequentially, is crucial for large language models (LLMs). Recently, low-rank adaptation (LoRA), one of the most representative parameter-efficient fine-tuning (PEFT) methods, has gained increasing attention in CL of LLMs. However, most existing CL methods based on LoRA typically expand a new LoRA branch to learn each new task and force the new and old LoRA branches to influence old tasks equally, potentially leading to forgetting. In this work, we propose a new method, called gated integration of low-rank adaptation (GainLoRA), for CL of LLMs. GainLoRA expands a new LoRA branch for each new task and introduces gating modules to integrate the new and old LoRA branches. Furthermore, GainLoRA leverages the new gating module to minimize the influence from the new LoRA branch to old tasks, effectively mitigating forgetting and improving the model's overall performance. Experimental results on CL benchmarks demonstrate that GainLoRA outperforms existing state-of-the-art methods.
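
One generic way to keep a new gate from responding to old-task inputs is to initialize and update its weights orthogonally to a subspace spanned by those inputs. The sketch below shows that idea under strong simplifying assumptions: a single linear gate weight `w`, a hypothetical 3-dimensional old-input subspace, and random vectors standing in for gradients. It illustrates orthogonal projection in general and should not be read as the paper's exact constraints; `project_out` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10  # hypothetical feature dimension

# Hypothetical old-task inputs spanning a 3-dimensional subspace of R^d.
old_inputs = rng.normal(size=(3, d))
# Orthonormal basis of that subspace, stored as the columns of Q (shape (d, 3)).
Q, _ = np.linalg.qr(old_inputs.T)


def project_out(v, Q):
    """Remove the component of v that lies in the column span of Q."""
    return v - Q @ (Q.T @ v)


# Initialization constraint: start orthogonal to the old-input subspace.
w = project_out(rng.normal(size=d), Q)

# Updating constraint: project every step (random stand-in gradients here).
for _ in range(100):
    grad = rng.normal(size=d)
    w = w - 0.1 * project_out(grad, Q)

# Any input inside the old subspace still yields a (numerically) zero pre-activation.
x_old = old_inputs.T @ rng.normal(size=3)
pre_activation = float(w @ x_old)
```

Because both the initial weight and every update lie in the orthogonal complement of the old subspace, `w @ x_old` stays at zero up to floating-point error throughout training, while `w` remains free to change along all other directions.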

Paper Structure

This paper contains 44 sections, 2 theorems, 24 equations, 6 figures, 14 tables, 1 algorithm.

Key Result

Proposition 3.1

If the constraints in (eq:althernitive-objective) are satisfied, subspaces $\{\mathcal{M}_{t,l}\}_{l=1}^{L+1}$ remain unchanged during the learning of the $t$-th task. Furthermore, for any input $\bm{x}$ from the previous $t-1$ tasks, $g_{t}(\bm{x})$ remains unchanged during the learning of the $t$-

Figures (6)

  • Figure 1: (a) shows the expandable LoRA architecture of our GainLoRA for learning the $t$-th new task. (b) shows that for each task $\mathcal{T}_{i}$, GainLoRA uses an independent gating module $g_{i}(\cdot)$ to generate integration coefficient $a_{i}$.
  • Figure 2: The variation of performance across different CL methods during training on different task sequences.
  • Figure 3: (a), (b) and (c) show the number of trainable parameters for different CL methods and model backbones under task sequences Order 1 and Order 2.
  • Figure 4: (a) and (b) show outputs of new gating module in our GainLoRA on different task sequences with T5-Large. (c) and (d) show outputs of new gating module in our GainLoRA on different task sequences with Llama-2-7B.
  • Figure 5: (a) and (b) show how our method's performance varies with the shapes of the weights in the gating module. (c) and (d) show how it varies with the number of layers in the gating module.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Proposition 3.1
  • Proposition A.1
  • proof