Composing Parameter-Efficient Modules with Arithmetic Operations

Jinghan Zhang, Shiqi Chen, Junteng Liu, Junxian He

TL;DR

The paper presents a training-free framework for composing parameter-efficient modules (PEMs) in the weight space using simple arithmetic operators, specifically addition and negation, to merge, unlearn, and transfer skills across distributions, tasks, and domains. By applying these operators to LoRA and (IA)^3 PEMs, the approach yields new PEMs that can outperform individual modules and enable efficient, modular adaptation of pretrained models, including detoxification of instruction-tuned LLMs. Extensive experiments across distribution generalization, multi-tasking, unlearning, and domain transfer demonstrate the efficacy and flexibility of arithmetic PEM composition, with extensions to LLM instruction tuning. The work highlights the potential of training-free PEM composition for scalable, modular NLP systems, while noting limitations related to initialization sensitivity and the need for hyperparameter tuning.

Abstract

As an efficient alternative to conventional full finetuning, parameter-efficient finetuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. Specifically, we first define addition and negation operators for the module, and then further compose these two basic operators to perform flexible arithmetic. Our approach requires no additional training and enables highly flexible module composition. We apply different arithmetic operations to compose the parameter-efficient modules for (1) distribution generalization, (2) multi-tasking, (3) unlearning, and (4) domain transfer. Additionally, we extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA. Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings.
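
The addition and negation operators mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes a LoRA module is stored as two low-rank matrices `A` and `B` whose weight update is `B @ A`, that addition is an element-wise sum of matching parameters, and that negation flips the sign of `B` only. All function and variable names here are hypothetical.

```python
# Illustrative sketch only. A LoRA module is modelled as a dict {"A": ..., "B": ...}
# of nested lists, and its weight update is assumed to be delta_W = B @ A.

def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def scale(M, c):
    """Multiply every entry of M by the scalar c."""
    return [[c * v for v in row] for row in M]

def add(M, N):
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

def pem_add(p, q):
    """Addition operator (assumed form): element-wise sum of matching parameters."""
    return {name: add(p[name], q[name]) for name in p}

def lora_negate(p):
    """Negation operator (assumed form): negate B only, so that B @ A changes sign."""
    return {"A": p["A"], "B": scale(p["B"], -1.0)}

def delta_w(p):
    """Weight update contributed by a LoRA module."""
    return matmul(p["B"], p["A"])

# Two rank-1 modules acting on a 2x2 weight matrix (toy values).
skill = {"A": [[1.0, 2.0]], "B": [[0.5], [-1.0]]}
other = {"A": [[3.0, 4.0]], "B": [[1.0], [2.0]]}

merged = pem_add(skill, other)    # composed module, no training involved
unlearned = lora_negate(skill)    # module whose update is the opposite of skill's
```

Negating only one factor matters in this sketch: flipping the sign of both `A` and `B` would leave the product `B @ A` unchanged, so the module's effect would not be reversed.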

Paper Structure (43 sections, 8 equations, 17 figures, 12 tables)


Figures (17)

  • Figure 1: An overview of parameter-efficient modules (PEMs) and the PEM combinations available in our study. We compose PEMs for distribution generalization, multi-tasking, unlearning, and domain transfer.
  • Figure 2: MNLI and RTE validation accuracy of the merged LoRA as the coefficient $\lambda$ varies. Setting $\lambda=0$ / $\lambda=1$ recovers the original RTE / MNLI LoRA.
  • Figure 3: Performance of T5-base and T5-small LoRA combinations with the same and different initializations on Yelp and Amazon, in the domain transfer setting. The subfigures from left to right are T5-base on Yelp, T5-small on Yelp, T5-base on Amazon, and T5-small on Amazon.
  • Figure 4: An example of inserting a prompt into MNLI and RTE samples.
  • Figure 5: Performance of FFT, LoRA, and (IA)$^3$ with RoBERTa-base tuned on different distributions (as described in the paper's dataset split section) when varying $\lambda$. The subfigures from left to right and from top to bottom are CoLA, MNLI, MRPC, QNLI, QQP, RTE, SST-2, STS-B.
  • ...and 12 more figures
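
The role of the coefficient $\lambda$ in the Figure 2 caption can be sketched as a weighted merge. The formula below, $\lambda \cdot \theta_{\text{MNLI}} + (1-\lambda) \cdot \theta_{\text{RTE}}$, is an assumption chosen to match the endpoints stated in the caption ($\lambda=0$ gives the RTE LoRA, $\lambda=1$ gives the MNLI LoRA). The parameter names and toy values are hypothetical, and each module is flattened to a dict of float lists for brevity.

```python
# Illustrative sketch only: a weighted merge of two modules that reproduces
# the endpoints described in the Figure 2 caption.

def merge(theta_mnli, theta_rte, lam):
    """Return lam * theta_mnli + (1 - lam) * theta_rte, parameter by parameter."""
    return {
        name: [lam * m + (1.0 - lam) * r
               for m, r in zip(theta_mnli[name], theta_rte[name])]
        for name in theta_mnli
    }

# Toy flattened parameters for two LoRA modules (hypothetical values).
theta_mnli = {"lora_A": [1.0, -2.0], "lora_B": [4.0, 0.0]}
theta_rte = {"lora_A": [3.0, 4.0], "lora_B": [0.0, 8.0]}

# Sweep lambda as a validation-set hyperparameter; no training is involved.
sweep = {lam: merge(theta_mnli, theta_rte, lam) for lam in (0.0, 0.5, 1.0)}
```

In such a sweep, each candidate $\lambda$ would be scored on validation data and the best-performing merged module kept.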