Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models

Vladimir Araujo; Marie-Francine Moens; Tinne Tuytelaars

TL;DR

L2R isolates the training of each new PEFT module to ensure its task specialization, then learns to compose the learned modules by training a network of routers that leverages a small memory of examples from previously seen tasks.

Abstract

Parameter-efficient fine-tuning (PEFT) methods are increasingly used with pre-trained language models (PLMs) for continual learning (CL). These methods typically involve training a PEFT module for each new task and employing similarity-based selection to route modules during inference. However, they face two major limitations: 1) interference during module training with already learned modules and 2) suboptimal routing when composing modules. In this paper, we present L2R, a method that isolates the training of new PEFT modules to ensure their task specialization. L2R then learns to compose the learned modules by training a network of routers that leverages a small memory containing examples of previously seen tasks. We evaluate our method in two CL setups using various benchmarks. Our results demonstrate that L2R provides an effective composition of PEFT modules, leading to improved generalization and performance compared to other methods.

Paper Structure (25 sections, 3 figures, 5 tables)

Introduction
Related Work
Continual Learning with PLMs
Local Adaptation
Learning to Route
Method
Task-specific Adapters
Memory
Memory-based Router Learning
Adapter Composition
Experimental Setup
Benchmarks
Baselines
Implementation Details
Results
...and 10 more sections

Figures (3)

Figure 1: Overview of the L2R method. (a) Adapters $A$ attached to the backbone are sequentially trained on a series of tasks $D$. Each adapter undergoes isolated training to prevent interference. (b) Before performing inference, our method utilizes a memory $M$ to learn a routing function $R$, facilitating composition either by 1) computing a weighted average of the adapters' outputs or 2) merging the parameters of the adapters.
Figure 2: L2R-wavg performance across tasks and CL setups. Results for order 3 are shown for MTL5 and AfriSenti, and for order 1 for WOS.
Figure 3: Average router scores for task 2 of MTL5 (order 4), task 3 of WOS (order 1), and task 4 of AfriSenti (order 1) using Gumbel-sigmoid (top) and Softmax (bottom). Scores were computed on the test sets.
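
The two composition modes named in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each adapter is a single linear map (a plain matrix), the router scores are hypothetical fixed numbers rather than learned values, and softmax (one of the two scoring functions mentioned in the Figure 3 caption) turns them into weights. The function names are made up for illustration.

```python
import math

def softmax(scores):
    """Normalize raw router scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Apply a toy linear adapter W to an input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def compose_outputs(adapters, weights, x):
    """Mode 1: weighted average of the adapters' outputs."""
    outs = [matvec(W, x) for W in adapters]
    dim = len(outs[0])
    return [sum(w * o[j] for w, o in zip(weights, outs)) for j in range(dim)]

def merge_parameters(adapters, weights):
    """Mode 2: weighted merge of the adapters' parameters into one adapter."""
    rows, cols = len(adapters[0]), len(adapters[0][0])
    return [
        [sum(w * W[i][j] for w, W in zip(weights, adapters)) for j in range(cols)]
        for i in range(rows)
    ]

# Two hypothetical task-specific adapters (assumption: plain 2x2 matrices).
adapters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
router_scores = [0.3, 1.2]  # hypothetical; in L2R these come from trained routers
weights = softmax(router_scores)
x = [0.5, -1.0]

y_avg = compose_outputs(adapters, weights, x)
y_merged = matvec(merge_parameters(adapters, weights), x)
```

In this purely linear toy case the two modes give the same output, because averaging outputs and averaging weights commute for a linear map. With nonlinear or factorized adapters the two generally differ, which is why they are treated as distinct options.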
