Adaptive Budget Allocation for Orthogonal-Subspace Adapter Tuning in LLMs Continual Learning
Zhiyi Wan, Wanrou Du, Liang Li, Miao Pan, Xiaoqi Qin
TL;DR
The paper tackles catastrophic forgetting in continual learning for large language models by addressing the inefficiencies of fixed-budget orthogonal subspace methods and the overhead of multi-stage budget-tuning. It introduces OA-Adapter, a parameter-efficient framework that jointly learns dynamic budget adaptation and orthogonal task subspaces in an end-to-end training process, using a trainable diagonal mask to modulate bottleneck dimensions. Orthogonality constraints ensure current task updates remain independent from past tasks, preserving knowledge while allowing task-specific optimization. Empirical results on multiple CL benchmarks show OA-Adapter delivers higher average accuracy with substantially fewer parameters than baselines, and analyses reveal heterogeneous budget needs across tasks and layers. The work establishes a new paradigm for budget-aware continual learning in LLMs and demonstrates compatibility with replay strategies, offering scalable improvements for real-world deployment.
Abstract
Large language models (LLMs) often suffer from catastrophic forgetting in continual learning (CL) scenarios, where performance on previously learned tasks degrades severely while training on sequentially arriving tasks. Although pioneering CL approaches using orthogonal subspaces can mitigate task interference, they typically employ fixed budget allocation, neglecting the varying complexity across tasks and layers. Moreover, recent budget-adaptive tuning methods for LLMs often adopt multi-stage paradigms that decouple optimization from budget allocation. Such decoupling can cause misalignment between the two, which hinders the practical application of these approaches in CL scenarios. To address these limitations, we propose OA-Adapter, a novel parameter-efficient approach for continual learning in LLMs that unifies dynamic budget adaptation with orthogonal subspace learning in a single end-to-end training stage. Specifically, OA-Adapter introduces a dynamic bottleneck dimension adaptation mechanism that simultaneously allocates an efficient parameter budget and optimizes task objectives without misalignment. To effectively preserve previously acquired knowledge while coordinating with the dynamic budget allocation, orthogonal constraints are applied specifically between the parameter subspace of the current task and the dynamically allocated parameter subspaces of historical tasks. Experimental results on continual learning benchmarks demonstrate that OA-Adapter outperforms state-of-the-art methods in both accuracy and parameter efficiency. OA-Adapter achieves higher average accuracy while using 58.5% fewer parameters on the standard CL benchmark, and maintains its advantages on two larger benchmarks comprising 15 tasks.
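The two mechanisms named in the abstract, a trainable diagonal mask over the adapter bottleneck and an orthogonality constraint against the subspaces of earlier tasks, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the soft-threshold form of the mask, the threshold `tau`, the choice to penalize overlap between up-projection matrices, and all function and variable names (`soft_threshold`, `adapter_forward`, `active_budget`, `orthogonality_penalty`) are hypothetical.

```python
import numpy as np


def soft_threshold(g, tau):
    # Assumed mask form: entries of g with |g| <= tau become exactly zero,
    # which switches off the corresponding bottleneck dimension.
    return np.sign(g) * np.maximum(np.abs(g) - tau, 0.0)


def adapter_forward(x, w_down, w_up, g, tau):
    # x: (batch, d), w_down: (r, d), w_up: (d, r), g: (r,)
    # Residual adapter whose bottleneck is scaled by a diagonal mask.
    mask = soft_threshold(g, tau)
    z = x @ w_down.T              # project down to r dimensions
    return x + (z * mask) @ w_up.T  # mask, project up, add residual


def active_budget(g, tau):
    # Number of bottleneck dimensions that survive the mask.
    return int(np.count_nonzero(soft_threshold(g, tau)))


def orthogonality_penalty(w_up_cur, w_up_prev, prev_mask):
    # Squared Frobenius norm of the overlap between the current task's
    # up-projection columns and the *active* columns of a previous task.
    # Which matrix to constrain is an assumption of this sketch.
    active_prev = w_up_prev[:, prev_mask != 0]
    return float(np.sum((active_prev.T @ w_up_cur) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r = 8, 4
    x = rng.normal(size=(2, d))
    w_down = rng.normal(size=(r, d))
    w_up = rng.normal(size=(d, r))
    g = np.array([0.9, 0.05, -0.6, 0.0])  # hypothetical trained mask logits

    y = adapter_forward(x, w_down, w_up, g, tau=0.1)
    print("output shape:", y.shape)
    print("active dimensions:", active_budget(g, tau=0.1))
```

In this sketch the budget is not chosen in a separate stage: the mask parameters `g` would be trained together with `w_down` and `w_up`, and the number of nonzero mask entries is the budget the layer ends up using. A penalty such as `orthogonality_penalty`, summed over stored previous tasks, would be added to the task loss so that only the dimensions earlier tasks actually kept are protected.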
