Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning Methods

Mohammed Sabry, Anya Belz

TL;DR

This work addresses whether task-specific knowledge encoded in parameter-efficient finetuning (PEFT) modules can be ported across different pretrained hosts. It empirically evaluates four PEFT techniques across multiple origin/destination model pairs and sentiment-analysis datasets, using a large grid of experimental conditions (1,440 training/testing runs) and comparing ported modules against two baselines: modules trained from scratch and modules initialised with parameters sampled from the same distribution as the ported module. The results show that ported PEFT modules generally outperform both baselines and exhibit substantial zero-shot portability, with portability strength varying by PEFT type and porting direction; Adapters tend to offer the strongest cross-model transfer. These findings indicate functional modularity of PEFT components and point toward design principles for portable PEFT methods with potential efficiency gains in real-world deployment.

Abstract

As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that the degree of success depends on the type of PEFT and on differences between the originating and receiving pretrained models.
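To make the three comparison conditions in the abstract concrete, the following is a minimal sketch of how the starting parameters for a receiving module might be prepared. It is an illustration under stated assumptions, not the authors' implementation: the function name `make_initialisations` is hypothetical, the "sampled" condition is assumed here to be a Gaussian matched to the mean and standard deviation of the ported matrix, and the "scratch" condition is assumed to be a generic small-variance Gaussian initialisation. The abstract does not specify either distribution.

```python
import numpy as np


def make_initialisations(ported: np.ndarray, seed: int = 0) -> dict:
    """Build three candidate initialisations for one PEFT parameter matrix.

    `ported` stands in for a matrix trained on the originating model.
    All distributional choices below are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)

    # Ported: reuse the trained parameters unchanged.
    ported_init = ported.copy()

    # Sampled (assumption): same shape, drawn from a Gaussian whose mean and
    # standard deviation match those of the ported matrix. This keeps the
    # overall statistics but discards the learnt structure.
    sampled_init = rng.normal(
        loc=ported.mean(), scale=ported.std(), size=ported.shape
    )

    # Scratch (assumption): a generic small random initialisation that does
    # not depend on the ported matrix at all.
    scratch_init = rng.normal(loc=0.0, scale=0.02, size=ported.shape)

    return {
        "ported": ported_init,
        "sampled": sampled_init,
        "scratch": scratch_init,
    }


if __name__ == "__main__":
    # Placeholder "trained" matrix; real values would come from finetuning.
    trained = np.random.default_rng(1).normal(0.3, 0.5, size=(64, 64))
    for name, matrix in make_initialisations(trained).items():
        print(name, matrix.shape, round(float(matrix.std()), 3))
```

Comparing the ported and sampled conditions separates the effect of the learnt structure from the effect of merely starting with parameters of a similar scale, which is the motivation for including the second baseline.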

Paper Structure

This paper contains 9 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Each bar chart shows average accuracy over three random seeds and two pairs of originating and receiving models for one PEFT technique (e.g. Adapter), one porting direction (e.g. raw $\rightarrow$ instruction-tuned), and one number of pre-porting training steps (e.g. 5K). The y-axis in each chart is accuracy and the x-axis is the number of post-porting adaptation steps (500, 1.5K, and 3K). Colours: blue = ported, orange = sampled, green = random parameters.
  • Figure 2: Each bar chart shows average accuracy over three random seeds and two pairs of originating and receiving models for one PEFT technique (e.g. Adapter), one porting direction (e.g. raw $\rightarrow$ instruction-tuned), and one number of pre-porting training steps (e.g. 5K). The y-axis in each chart is accuracy and the x-axis is the number of post-porting adaptation steps (500, 1.5K, and 3K). Colours: blue = ported, orange = sampled, green = random parameters.