ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy

Hong Li, Zhiquan Tan, Xingyu Li, Weiran Huang

TL;DR

The paper proposes an adapter-based two-stage learning paradigm for multi-modal continual learning. It consists of experience-based learning and novel knowledge expansion, which help the model make full use of previously learned knowledge and compensate for knowledge it does not yet have.

Abstract

While vision-and-language models have advanced significantly in many fields, continual learning remains an unsolved challenge. Parameter-efficient modules such as adapters and prompts offer a promising way to alleviate catastrophic forgetting. However, existing works usually learn an individual adapter for each task, which may result in redundant knowledge among adapters. Moreover, they continue to use the original pre-trained model to initialize the downstream model, so the model's generalization changes only negligibly relative to the original model. In addition, little research has investigated what happens when a multi-modal model is updated on both uni-modal and multi-modal tasks, or how such updating affects downstream tasks. In this paper, we propose an adapter-based two-stage learning paradigm, a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion, which helps the model fully use experience knowledge and compensate for novel knowledge. Extensive experiments demonstrate that our method is proficient for continual learning. It expands the distribution of representation upstream while also minimizing the negative impact of forgetting previous tasks. Additionally, it enhances the generalization capability for downstream tasks. Furthermore, we incorporate both multi-modal and uni-modal tasks into upstream continual learning. We observe that learning from upstream tasks can help with downstream tasks. Our code will be available at: https://github.com/lihong2303/ATLAS.

Paper Structure

This paper contains 27 sections, 10 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Prior prompt-based work involves a pool of key-value pairs used to select learnable prompts inserted in the model for learning instruction. Adapter-based work entails the individual learning of adapter modules for each task. Our method proposes a two-stage learning paradigm, a knowledge-incremental approach that constantly expands knowledge when learning sequential tasks. We train stage 2 in the subsequent epochs after stage 1 is completed.
  • Figure 2: Schematic of the two-stage learning paradigm. For the $t$-th task, experience-based learning optimizes the parameters of Query to control the degree of involvement of previously saved knowledge in the new task. With Novel Knowledge Expansion, we compensate for knowledge that does not exist in the previously saved knowledge, to ensure the full exploration of new task knowledge. The module colors have a similar meaning to those in Figure 1.
  • Figure 3: Learning process of upstream continual learning. Linear Probing is the lower bound of continual learning while Fine-tuning serves as the upper bound. We estimate experience-based learning before novel knowledge expansion in each task, and estimate forgetting afterward.
  • Figure 4: Accuracy and knowledge coefficient comparison of experience-based learning at each step of upstream continual learning on the VCR [zellers2019recognition] and Places365 [lopez2020semantic] datasets. The length of each bar is the knowledge coefficient corresponding to each adapter.
  • Figure 5: Learning process of upstream continual learning with Uni-First order (PIQA$\rightarrow$SNLI-VE$\rightarrow$VQAv2$\rightarrow$iNaturalist2019$\rightarrow$SST-2). Linear Probing is the lower bound of continual learning while Fine-tuning serves as the upper bound. We estimate experience-based learning before novel knowledge expansion in each task, and estimate forgetting afterward.
  • ...and 1 more figures
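
The captions for Figures 2 and 4 describe a learnable Query that sets a knowledge coefficient for each previously saved adapter, followed by a second stage that adds capacity for new knowledge. The NumPy sketch below is one possible reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the per-adapter `keys`, the softmax over query-key scores, the bottleneck adapter shape, the zero-initialized new adapter, and all function names.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, down, up):
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project."""
    return np.maximum(x @ down, 0.0) @ up

def knowledge_coefficients(query, keys):
    """Assumed form: softmax over query-key scores, one weight per saved adapter."""
    scores = keys @ query
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def forward(x, saved, keys, query, new_adapter=None):
    """Input plus coefficient-weighted saved adapters, plus an optional new adapter."""
    coeffs = knowledge_coefficients(query, keys)
    out = x + sum(c * adapter(x, d, u) for c, (d, u) in zip(coeffs, saved))
    if new_adapter is not None:
        out = out + adapter(x, *new_adapter)
    return out, coeffs

dim, bottleneck, n_saved = 8, 2, 3
# Adapters saved from earlier tasks; treated as frozen in this sketch.
saved = [
    (rng.normal(size=(dim, bottleneck)), rng.normal(size=(bottleneck, dim)))
    for _ in range(n_saved)
]
keys = rng.normal(size=(n_saved, dim))  # assumed: one key per saved adapter
query = np.zeros(dim)                   # stage 1 would train only this vector
x = rng.normal(size=(4, dim))           # a batch of 4 feature vectors

# Stage 1 (experience-based learning): reuse saved adapters via the coefficients.
stage1_out, coeffs = forward(x, saved, keys, query)

# Stage 2 (novel knowledge expansion): add a new adapter. Its up-projection is
# zero here, so it starts as a no-op and would only contribute after training.
new_adapter = (rng.normal(size=(dim, bottleneck)), np.zeros((bottleneck, dim)))
stage2_out, _ = forward(x, saved, keys, query, new_adapter)
```

With an all-zero query the coefficients are uniform, so every saved adapter contributes equally. Training the query in stage 1 would shift weight toward the adapters most useful for the new task, which is the kind of per-adapter bar length that Figure 4 reports.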