
SwitchCIT: Switching for Continual Instruction Tuning

Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R. Varshney

TL;DR

SwitchCIT tackles catastrophic forgetting in continual instruction tuning with a switch network that classifies each instruction into a task and routes the query to a per-task, parameter-efficient model tuned with LoRA. Keeping each task's parameters architecturally separate reduces interference between tasks, so the model can learn continually without heavy regularization and with little retained data. Experiments on five natural language generation tasks and two vision-language tasks show better retention of earlier tasks and performance competitive with or superior to baselines, including rehearsal, even when only a small fraction of data is retained and when larger base models are used. The approach offers practical advantages in efficiency, scalability, portability, and privacy, suggesting a viable path toward scalable continual instruction learning in large multimodal systems.
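Why separate per-task parameters avoid interference can be illustrated with a minimal NumPy sketch. It assumes standard LoRA (a frozen weight W plus a scaled low-rank update B A) on a single linear layer; the class `MultiTaskLoRALinear`, its `train_step` method, the rank and scaling values, and the toy task names are hypothetical and not taken from the SwitchCIT implementation. The point is only that tuning a new task's adapter leaves the outputs of earlier adapters unchanged.

```python
import numpy as np


class MultiTaskLoRALinear:
    """Frozen linear layer W plus one low-rank update (B @ A) per task.

    Hypothetical sketch: names, shapes, and hyperparameters are illustrative.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        self.rank = rank
        self.scale = alpha / rank  # LoRA scaling factor alpha / r
        self.rng = np.random.default_rng(seed)
        self.adapters = {}  # task name -> (A, B)

    def add_task(self, task):
        d_out, d_in = self.weight.shape
        # LoRA-style init: A random, B zero, so a new adapter starts as a no-op.
        a = self.rng.normal(0.0, 0.01, size=(self.rank, d_in))
        b = np.zeros((d_out, self.rank))
        self.adapters[task] = (a, b)

    def forward(self, x, task):
        a, b = self.adapters[task]
        return self.weight @ x + self.scale * (b @ (a @ x))

    def train_step(self, x, y, task, lr=0.01):
        # One gradient step on 0.5 * ||forward(x) - y||^2 that updates only
        # this task's (A, B); W and other tasks' adapters are never touched.
        a, b = self.adapters[task]
        err = self.forward(x, task) - y
        h = a @ x
        grad_b = self.scale * np.outer(err, h)
        grad_a = self.scale * np.outer(b.T @ err, x)
        self.adapters[task] = (a - lr * grad_a, b - lr * grad_b)


rng = np.random.default_rng(1)
layer = MultiTaskLoRALinear(rng.normal(size=(3, 5)))
x = rng.normal(size=5)

layer.add_task("summarization")
for _ in range(200):
    layer.train_step(x, np.ones(3), "summarization")
before = layer.forward(x, "summarization").copy()

# Learning a second task adds a new adapter instead of overwriting the first.
layer.add_task("translation")
for _ in range(200):
    layer.train_step(x, -np.ones(3), "translation")
after = layer.forward(x, "summarization")
print(np.allclose(before, after))  # True: the first task's adapter is untouched
```

Because the shared weight stays frozen and each task owns its adapter, forgetting of earlier tasks can only come from choosing the wrong adapter at inference time, which is the job of the switch network.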

Abstract

Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial for adapting a large model to evolving tasks and domains, keeping it effective and relevant across a wide range of applications. In continual instruction tuning, where a model is trained sequentially on different tasks, catastrophic forgetting can occur, degrading performance on previously learned tasks. This work addresses catastrophic forgetting in continual instruction learning through a switching mechanism that routes computation to parameter-efficiently tuned models. We demonstrate the effectiveness of our method through experiments on continual instruction tuning over different natural language generation tasks and vision-language tasks. We also show the advantages of our method in terms of efficiency, scalability, portability, and privacy preservation.

Paper Structure

This paper contains 20 sections, 2 equations, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: The inference procedure of SwitchCIT. The instruction is first fed into a switch network consisting of a lightweight LLM and a task classifier. The final-layer representation of the last token from the LLM is the input to the task classifier. The switch network thus identifies the task of a given instruction and then routes computation to the associated set of parameters.
  • Figure 2: Progressive relative gain of various methods. The horizontal axis shows the learning stages, each labeled by the name of the task learned at that stage, and the vertical axis shows the relative gain. Each task's performance is plotted from the stage at which it is learned onward.
  • Figure 3: Progressive relative gain of various methods. The horizontal axis shows the learning stages, each labeled by the name of the task learned at that stage, and the vertical axis shows the relative gain. Each task's performance is plotted from the stage at which it is learned onward.
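The inference path in Figure 1 can be sketched in Python as follows. Every component is a hypothetical stand-in: `last_token_representation` replaces the lightweight LLM with deterministic pseudo-embeddings and a running-mean "causal encoder"; the task classifier is assumed to be softmax regression; `learn_task` assumes the switch is refit on a small retained fraction of each task's instructions; and `adapters` maps task names to placeholder callables rather than LoRA-tuned models.

```python
import zlib

import numpy as np

D = 64  # hidden size of the hypothetical lightweight encoder


def token_embedding(token):
    # Deterministic pseudo-embedding per token; a stand-in for a real LLM.
    rng = np.random.default_rng(zlib.crc32(token.encode("utf-8")))
    return rng.normal(size=D)


def last_token_representation(instruction):
    """Stand-in for the final-layer hidden state of the last token.

    Each position's state is the running mean of embeddings seen so far, so
    the last state summarizes the instruction; it is L2-normalized.
    """
    embs = np.stack([token_embedding(t) for t in instruction.lower().split()])
    states = np.cumsum(embs, axis=0) / np.arange(1, len(embs) + 1)[:, None]
    return states[-1] / np.linalg.norm(states[-1])


class TaskClassifier:
    """Softmax-regression head over last-token features (assumed design)."""

    def __init__(self):
        self.tasks, self.W, self.b = [], np.zeros((0, D)), np.zeros(0)

    def fit(self, instructions, labels, epochs=1000, lr=1.0):
        self.tasks = sorted(set(labels))
        X = np.stack([last_token_representation(s) for s in instructions])
        onehot = np.eye(len(self.tasks))[[self.tasks.index(t) for t in labels]]
        self.W = np.zeros((len(self.tasks), D))
        self.b = np.zeros(len(self.tasks))
        for _ in range(epochs):
            logits = X @ self.W.T + self.b
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)
            grad = (p - onehot) / len(X)  # d(mean cross-entropy)/d(logits)
            self.W -= lr * grad.T @ X
            self.b -= lr * grad.sum(axis=0)

    def predict(self, instruction):
        logits = self.W @ last_token_representation(instruction) + self.b
        return self.tasks[int(np.argmax(logits))]


def learn_task(clf, memory, task, instructions, retain_fraction=0.01, seed=0):
    # Retain a small random subset of the new task's instructions, then refit
    # the switch on everything retained so far (hypothetical policy).
    rng = np.random.default_rng(seed)
    k = max(1, int(len(instructions) * retain_fraction))
    for i in rng.choice(len(instructions), size=k, replace=False):
        memory.append((instructions[i], task))
    clf.fit([s for s, _ in memory], [t for _, t in memory])


def route(clf, adapters, instruction):
    # Switch: classify the instruction, then call that task's tuned model.
    return adapters[clf.predict(instruction)](instruction)


clf, memory = TaskClassifier(), []
learn_task(clf, memory, "summarization", [
    "summarize this news article", "write a brief summary of the report",
    "condense the following paragraph", "give a short summary"])
learn_task(clf, memory, "translation", [
    "translate this sentence into french", "render the text in german",
    "translate to spanish please", "give the english translation"])
# Placeholder per-task models; in SwitchCIT these would be LoRA-tuned models.
adapters = {t: (lambda q, t=t: f"<{t} model> {q}") for t in clf.tasks}
print(route(clf, adapters, "translate this paragraph into italian"))
```

Only the small classifier is refit as tasks arrive; adding a task never modifies earlier adapters, which mirrors the separation described in the TL;DR.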