Linked Adapters: Linking Past and Future to Present for Effective Continual Learning
Dupati Srikar Chandra, P. K. Srijith, Dana Rezazadegan, Chris McCarthy
TL;DR
This work tackles catastrophic forgetting in continual learning with pre-trained vision transformers by introducing Linked Adapters, which connect task-specific adapters through attention weights predicted by an MLP. The MLP $f_h(\mathbf{e}^p,\mathbf{e}^t;\Theta_h)$ generates the forward attention weights $\beta^{pt}$ during training and, at inference, also generates the weights $\beta^{ts}$ that link subsequent tasks to the current task, enabling both forward and backward knowledge transfer without retraining the backbone. Empirically, AdaLink variants outperform strong baselines on Split-CIFAR-100, CUB200, and ImageNet-R, with measurable improvements in knowledge transfer and minimal overhead from the small MLP. The approach offers a scalable path to cross-task knowledge sharing in transformer-based continual learning, with potential extensions to NLP and multimodal settings. Overall, Linked Adapters demonstrate that structured, learned lateral connections can significantly mitigate forgetting while leveraging information across tasks.
Abstract
Continual learning allows a system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks. However, deep learning models suffer from catastrophic forgetting: knowledge learned from earlier tasks is lost while learning a new task. Moreover, retraining large models such as transformers from scratch for every new task is costly. An effective approach to continual learning is to use a large pre-trained model with task-specific adapters that adapt it to new tasks. Though this approach can mitigate catastrophic forgetting, it fails to transfer knowledge across tasks, because each task learns its adapters separately. To address this, we propose a novel approach, Linked Adapters, that allows knowledge transfer to other task-specific adapters through a weighted attention mechanism. Linked Adapters use a multi-layer perceptron (MLP) to model the attention weights, which overcomes the challenge of backward knowledge transfer in continual learning in addition to modeling forward knowledge transfer. During inference, our approach leverages knowledge transfer through MLP-based attention weights across all the lateral task adapters. Through numerous experiments on diverse image classification datasets, we demonstrate that Linked Adapters improve performance on continual learning tasks.
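The linking mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_mlp`, `attention_weight`, `linked_output`), the single hidden layer, the sigmoid squashing of the MLP output, the additive combination of adapter outputs, and the (source, target) argument order are all assumptions made for illustration. Only the idea that one small MLP maps a pair of task embeddings to a scalar attention weight comes from the text.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, seed=0):
    """Randomly initialise a one-hidden-layer MLP (hypothetical stand-in for f_h)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def attention_weight(params, e_src, e_dst):
    """Scalar weight for the link from the source task to the target task.

    The two task embeddings are concatenated and passed through the MLP.
    The sigmoid that maps the output into (0, 1) is an assumption.
    """
    x = list(e_src) + list(e_dst)
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)  # ReLU hidden units
        for row, b in zip(params["w1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def linked_output(t, adapter_outputs, embeddings, params):
    """Output of task t's adapter plus weighted outputs of all other available adapters.

    While training task t, only adapters 0..t exist, so only earlier tasks
    contribute (forward transfer). At inference, adapters of later tasks are
    also in the list, and the same MLP supplies their weights (backward transfer).
    """
    combined = list(adapter_outputs[t])
    for k, out in enumerate(adapter_outputs):
        if k == t:
            continue
        beta = attention_weight(params, embeddings[k], embeddings[t])
        combined = [c + beta * o for c, o in zip(combined, out)]
    return combined


if __name__ == "__main__":
    # Three tasks with 2-dimensional task embeddings and toy adapter outputs.
    params = make_mlp(in_dim=4, hidden_dim=3)
    embeddings = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
    outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Training-time view for task 1: only tasks 0 and 1 are available.
    print(linked_output(1, outputs[:2], embeddings[:2], params))
    # Inference-time view for task 1: task 2's adapter also contributes.
    print(linked_output(1, outputs, embeddings, params))
```

Because the weights come from a function of task embeddings rather than from per-pair parameters, a weight for a link from a later task can be produced at inference even though that task did not exist when the current task was trained. In this sketch that is simply a matter of passing the longer adapter list.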
