Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning

Zhiyi Wan, Wanrou Du, Liang Li, Miao Pan, Xiaoqi Qin

TL;DR

The paper tackles catastrophic forgetting in continual learning for large language models by addressing the inefficiencies of fixed-budget orthogonal subspace methods and the overhead of multi-stage budget-tuning. It introduces OA-Adapter, a parameter-efficient framework that jointly learns dynamic budget adaptation and orthogonal task subspaces in an end-to-end training process, using a trainable diagonal mask to modulate bottleneck dimensions. Orthogonality constraints ensure current task updates remain independent from past tasks, preserving knowledge while allowing task-specific optimization. Empirical results on multiple CL benchmarks show OA-Adapter delivers higher average accuracy with substantially fewer parameters than baselines, and analyses reveal heterogeneous budget needs across tasks and layers. The work establishes a new paradigm for budget-aware continual learning in LLMs and demonstrates compatibility with replay strategies, offering scalable improvements for real-world deployment.

Abstract

Large language models (LLMs) often suffer from catastrophic forgetting in continual learning (CL) scenarios, where performance on previously learned tasks degrades severely while training on sequentially arriving tasks. Although pioneering CL approaches using orthogonal subspaces can mitigate task interference, they typically employ fixed budget allocation, neglecting the varying complexity across tasks and layers. Moreover, recent budget-adaptive tuning methods for LLMs often adopt multi-stage paradigms that decouple optimization and budget allocation. Such decoupling can lead to misalignment between the two, which hinders the practical application of these approaches in CL scenarios. To address these limitations, we propose OA-Adapter, a novel parameter-efficient approach for continual learning in LLMs that unifies dynamic budget adaptation with orthogonal subspace learning in a single end-to-end training stage. Specifically, OA-Adapter introduces a dynamic bottleneck dimension adaptation mechanism that simultaneously allocates an efficient parameter budget and optimizes task objectives without misalignment. To effectively preserve previously acquired knowledge while coordinating with the dynamic budget allocation, orthogonal constraints are applied specifically between the parameter subspace of the current task and the dynamically allocated parameter subspaces of historical tasks. Experimental results on continual learning benchmarks demonstrate that OA-Adapter outperforms state-of-the-art methods in both accuracy and parameter efficiency. OA-Adapter achieves higher average accuracy while using 58.5% fewer parameters on the standard CL benchmark, and maintains its advantages on two larger benchmarks comprising 15 tasks.
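To make the orthogonal-constraint idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each task's up-projection is a matrix of shape (d, r) whose columns span that task's parameter subspace, and it measures overlap with a squared-Frobenius penalty. The function names `orthogonality_penalty` and `project_out`, the penalty form, and the projection step are all illustrative assumptions; this summary does not state how the paper enforces the constraint.

```python
import numpy as np

def orthogonality_penalty(w2_current, past_active):
    """Sum of squared inner products between current and past columns.

    The value is zero exactly when every column of w2_current is
    orthogonal to every column of each matrix in past_active.
    """
    total = 0.0
    for w_past in past_active:
        overlap = w_past.T @ w2_current  # shape (r_past, r_current)
        total += float(np.sum(overlap ** 2))
    return total

def project_out(w2_current, past_active):
    """Remove from w2_current its component inside the past column spaces.

    Hypothetical helper: a hard projection is one way to satisfy the
    constraint exactly; a soft penalty in the loss is another.
    """
    if not past_active:
        return w2_current
    basis = np.concatenate(past_active, axis=1)  # shape (d, total past dims)
    q, _ = np.linalg.qr(basis)                   # orthonormal basis of the span
    return w2_current - q @ (q.T @ w2_current)

# Toy example: hidden size 16, two past tasks with 3 and 2 active dimensions.
rng = np.random.default_rng(0)
d = 16
past = [rng.normal(size=(d, 3)), rng.normal(size=(d, 2))]
current = rng.normal(size=(d, 4))

before = orthogonality_penalty(current, past)
after = orthogonality_penalty(project_out(current, past), past)
```

In this toy setup `before` is clearly positive while `after` is zero up to floating-point error, which is the property the constraint is meant to guarantee: updates for the current task have no component along directions already used by earlier tasks.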

Paper Structure

This paper contains 41 sections, 11 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: The OA-Adapter framework for LLM continual learning. Each task-specific OA-Adapter module (task $t$) comprises three core components: (1) a down-projection layer $\mathcal{W}_1^{(t)}$, (2) a diagonal mask $\Gamma^{(t)}$ with trainable threshold $\tau^{(t)}$, and (3) an up-projection layer $\mathcal{W}_2^{(t)}$. The dynamic masking mechanism enables bidirectional dimension adaptation through activation/deactivation of latent dimensions. Orthogonal subspace constraints are enforced between the column space of the $t$-th task parameters $\mathrm{Col}(\mathcal{W}_2^{(t)})$ and the dynamically allocated parameter subspaces of historical tasks $\mathrm{Col}(\widetilde{\mathcal{W}}_2^{(s)})$ (for $s < t$). Here, $\widetilde{\mathcal{W}}_2^{(s)}$ incorporates only the activated dimensions from the $s$-th task.
  • Figure 2: Final dimensions after sequential training following Order-1 with OA-Adapter across four text classification datasets (i.e., DBpedia, Amazon, Yahoo, AG News). The X-axis is the index of T5-large layers, and the Y-axis indicates different layers OA-Adapter applies to.
  • Figure 3: Occurrence and mitigation of catastrophic forgetting during sequential training following Order-1 across multiple tasks.
  • Figure 4: Final dimensions after sequential training following Order-2 with OA-Adapter.
  • Figure 5: Final dimensions after sequential training following Order-3 with OA-Adapter.
  • ...and 2 more figures
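
The three components named in the Figure 1 caption (down-projection $\mathcal{W}_1^{(t)}$, diagonal mask $\Gamma^{(t)}$ with threshold $\tau^{(t)}$, up-projection $\mathcal{W}_2^{(t)}$) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's code: the gating rule (keep a gate value only when its magnitude exceeds the threshold) is a guess at how a thresholded diagonal mask might work, any nonlinearity or residual connection is omitted because the caption does not mention one, and the names `diagonal_mask`, `adapter_update`, and `active_up_projection` are hypothetical.

```python
import numpy as np

def diagonal_mask(gate, tau):
    """Diagonal entries of the mask (assumed rule).

    A latent dimension is active when |gate[i]| > tau; inactive
    dimensions get a zero entry and so contribute nothing.
    """
    return np.where(np.abs(gate) > tau, gate, 0.0)

def adapter_update(h, w1, gate, tau, w2):
    """Compute W2 @ diag(mask) @ W1 @ h for one hidden vector h.

    Shapes: h (d,), w1 (r, d), gate (r,), w2 (d, r).
    """
    mask = diagonal_mask(gate, tau)
    return w2 @ (mask * (w1 @ h))

def active_up_projection(w2, gate, tau):
    """Columns of W2 for activated dimensions only (the W-tilde of Figure 1)."""
    keep = np.abs(gate) > tau
    return w2[:, keep]

# Toy example: hidden size 8, bottleneck of 4 latent dimensions.
rng = np.random.default_rng(0)
d, r = 8, 4
w1 = rng.normal(size=(r, d))
w2 = rng.normal(size=(d, r))
h = rng.normal(size=d)
gate = np.array([0.9, 0.05, -0.6, 0.01])
tau = 0.1

mask = diagonal_mask(gate, tau)            # two of four dimensions stay active
update = adapter_update(h, w1, gate, tau, w2)

# The same update computed from the active dimensions alone.
keep = np.abs(gate) > tau
compact = active_up_projection(w2, gate, tau) @ (gate[keep] * (w1[keep] @ h))
```

Here `update` and `compact` agree, which illustrates why only the activated columns need to be kept when forming the subspace of a finished task: deactivated dimensions have no effect on the adapter's output. If the gate and threshold are trained, dimensions can cross the threshold in either direction, which is one way to read the caption's "bidirectional dimension adaptation".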