Linked Adapters: Linking Past and Future to Present for Effective Continual Learning

Dupati Srikar Chandra, P. K. Srijith, Dana Rezazadegan, Chris McCarthy

TL;DR

This work tackles catastrophic forgetting in continual learning with pre-trained vision transformers by introducing Linked Adapters, which connect task-specific adapters through attention weights predicted by an MLP. The MLP $f_h(\mathbf{e}^p,\mathbf{e}^t;\Theta_h)$ generates the forward attention weights $\beta^{pt}$ during training and, at inference, also generates the weights $\beta^{ts}$ that link subsequent tasks to the current task, enabling both forward and backward knowledge transfer without retraining the backbone. Empirically, AdaLink variants outperform strong baselines on Split-CIFAR-100, CUB200, and ImageNet-R, with measurable improvements in knowledge transfer and minimal overhead from the small MLP. The approach offers a scalable path to cross-task knowledge sharing in transformer-based continual learning, with potential extensions to NLP and multimodal settings. Overall, Linked Adapters demonstrate that structured, learned lateral connections can significantly mitigate forgetting while leveraging information across tasks.

Abstract

Continual learning allows a system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks. However, deep learning models suffer from catastrophic forgetting of knowledge learned from earlier tasks while learning a new task. Moreover, retraining large models like transformers from scratch for every new task is costly. An effective approach to continual learning is to use a large pre-trained model with task-specific adapters that adapt it to the new tasks. Though this approach can mitigate catastrophic forgetting, it fails to transfer knowledge across tasks because each task learns its adapters separately. To address this, we propose a novel approach, Linked Adapters, that allows knowledge transfer through a weighted attention mechanism to other task-specific adapters. Linked Adapters use a multi-layer perceptron (MLP) to model the attention weights, which overcomes the challenge of backward knowledge transfer in continual learning in addition to modeling forward knowledge transfer. During inference, our proposed approach leverages knowledge transfer through MLP-based attention weights across all the lateral task adapters. Through numerous experiments conducted on diverse image classification datasets, we demonstrate improved performance on continual learning tasks using Linked Adapters.
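The linking mechanism described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`make_mlp`, `attention_weight`, `linked_output`), the single hidden layer, the ReLU and sigmoid activations, the order in which embeddings are passed to the MLP, and the additive combination of adapter outputs are all assumptions made for illustration. The sketch treats $\mathbf{e}^p$ and $\mathbf{e}^t$ as per-task embedding vectors, which is also an assumption.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, seed=0):
    """Create parameters for a small one-hidden-layer MLP (hypothetical stand-in for f_h)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def attention_weight(params, e_other, e_current):
    """Map a pair of task embeddings to a scalar attention weight in (0, 1).

    Assumption: the two embeddings are concatenated, passed through a ReLU
    hidden layer, and squashed with a sigmoid.
    """
    x = list(e_other) + list(e_current)
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def linked_output(t, adapter_outputs, task_embeddings, params, bidirectional):
    """Combine the adapter output of task t with weighted outputs of other tasks.

    bidirectional=False: only previous tasks (index < t) contribute, as in training.
    bidirectional=True: previous and subsequent tasks contribute, as at inference.
    Assumption: lateral contributions are added to the current adapter output.
    """
    combined = list(adapter_outputs[t])
    for j, h_j in enumerate(adapter_outputs):
        if j == t or (j > t and not bidirectional):
            continue
        beta = attention_weight(params, task_embeddings[j], task_embeddings[t])
        combined = [c + beta * v for c, v in zip(combined, h_j)]
    return combined


if __name__ == "__main__":
    params = make_mlp(in_dim=4, hidden_dim=8)
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy task embeddings
    outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]      # toy adapter outputs
    print(linked_output(1, outputs, embeddings, params, bidirectional=False))
    print(linked_output(1, outputs, embeddings, params, bidirectional=True))
```

Because the weights come from a function of task embeddings rather than from a per-pair learned table, the same MLP can be queried for tasks that did not exist when task `t` was trained, which is what makes the backward direction possible without further training in this sketch.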

Paper Structure

This paper contains 13 sections, 7 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: (a) Linked Adapters during training: lateral connections from previous task adapter representations are connected to the current task $t$ to enable knowledge transfer, and their corresponding attention weights are generated by the MLP. (b) Linked Adapters during testing: lateral connections from the adapter representations of previous and subsequent tasks are added to the current task $t$ to enable knowledge transfer in both directions, with the MLP generating the attention weights of subsequent tasks without any extra training.
  • Figure 2: Comparison between baselines and AdaLink on the average test accuracy of individual tasks. The X-axis shows the task number and the Y-axis shows the average test accuracy of individual tasks.
  • Figure 3: Comparison between AdaLink with MLP and AdaLink with constant attention weights (AdaLink-Forward-k and AdaLink-Bidirectional-k) on the average test accuracy of individual tasks on the CUB200 dataset with 10 tasks.