Lifelong Learning with Task-Specific Adaptation: Addressing the Stability-Plasticity Dilemma
Ruiyu Wang, Sen Wang, Xinxin Zuo, Qiang Sun
TL;DR
AdaLL is a simple, universal adapter-based framework for lifelong learning. It co-trains a backbone with task-specific adapters under regularization, so that the backbone learns task-invariant features while the adapters handle task-specific adaptation. Because it regularizes the backbone rather than freezing it and routes task-specific changes through bottleneck adapters, AdaLL addresses the stability-plasticity dilemma while still letting the backbone improve across tasks, which yields better retention and adaptation in incremental learning. The approach integrates with existing incremental learning (IL) methods (e.g., EWC, LwF, DualPrompt) and shows consistent gains on CIFAR-100 and ImageNet-subset across diverse task orders and architectures, while remaining memory-efficient relative to gradient-subspace methods. These results suggest that AdaLL is a practical, architecture-agnostic route to scalable continual learning that combines standard regularization techniques with adapters to improve both stability and plasticity.
Abstract
Lifelong learning (LL) aims to continuously acquire new knowledge while retaining previously learned knowledge. A central challenge in LL is the stability-plasticity dilemma, which requires models to balance the preservation of previous knowledge (stability) with the ability to learn new tasks (plasticity). While parameter-efficient fine-tuning (PEFT) has been widely adopted in large language models, its application to lifelong learning remains underexplored. To bridge this gap, this paper proposes AdaLL, an adapter-based framework designed to address the dilemma through a simple, universal, and effective strategy. AdaLL co-trains the backbone network and adapters under regularization constraints, enabling the backbone to capture task-invariant features while allowing the adapters to specialize in task-specific information. Unlike methods that freeze the backbone network, AdaLL incrementally enhances the backbone's capabilities across tasks while minimizing interference through backbone regularization. This architectural design significantly improves both stability and plasticity, effectively eliminating the stability-plasticity dilemma. Extensive experiments demonstrate that AdaLL consistently outperforms existing methods across various configurations, including dataset choices, task sequences, and task scales.
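The co-training idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the adapter form or the regularizer, so the code assumes a residual bottleneck adapter and an EWC-style quadratic penalty on backbone drift. All names (`adapter_forward`, `backbone_penalty`, `total_loss`, `lam`, `importance`) are hypothetical and chosen for illustration only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def adapter_forward(h: Vector, down: Matrix, up: Matrix) -> Vector:
    """Assumed adapter form: h + up(relu(down(h))), with a residual connection."""
    z = [max(0.0, a) for a in matvec(down, h)]  # project into the narrow bottleneck
    delta = matvec(up, z)                       # project back to the feature width
    return [x + d for x, d in zip(h, delta)]    # residual keeps backbone features


def backbone_penalty(theta: Vector, theta_prev: Vector,
                     importance: Vector, lam: float) -> float:
    """Assumed regularizer: EWC-style quadratic penalty on backbone drift.

    theta_prev holds the backbone parameters after the previous task, and
    importance holds per-parameter weights (e.g., a Fisher estimate).
    """
    drift = sum(f * (t - p) ** 2
                for t, p, f in zip(theta, theta_prev, importance))
    return 0.5 * lam * drift


def total_loss(task_loss: float, theta: Vector, theta_prev: Vector,
               importance: Vector, lam: float) -> float:
    """Co-training objective: new-task loss plus the backbone penalty.

    Adapter parameters are left unregularized here so they can specialize.
    """
    return task_loss + backbone_penalty(theta, theta_prev, importance, lam)
```

In this sketch the backbone stays trainable but is pulled toward its previous values, while each task's adapter is free to change. Setting `lam` to zero recovers plain fine-tuning, and a very large `lam` approaches a frozen backbone.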
