X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme Multi-Profile Scenarios

Namju Kwak, Taesup Kim

TL;DR

X-PEFT tackles extreme multi-profile NLP by dramatically reducing per-profile parameters and memory through learnable mask tensors that selectively compose a large pool of pre-trained adapters. It introduces soft-masked and hard-masked variants to fuse adapters without training new ones, and demonstrates strong performance on LaMP, GLUE, and SuperGLUE using both trained and random adapters, with memory reductions up to $10^4\times$ and parameter reductions around $10^2\times$. By framing the adapter selection as an adapter-level supermask problem, the approach aligns with the Lottery Ticket Hypothesis, showing that even random adapters can yield competitive results when masked appropriately. The proposed framework enables scalable, multi-profile NLP deployments with minimal per-profile storage, facilitating practical service at scale while maintaining high task performance.
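The claimed orders of magnitude can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the layer count, hidden size, adapter bottleneck size, and pool size are assumed values and are not taken from this summary. It also assumes one adapter per layer with a down-projection and an up-projection (biases ignored), two mask tensors per profile with one entry per layer and adapter, 32-bit floats for adapter weights, and one bit per entry for hard masks.

```python
# Back-of-the-envelope comparison of per-profile storage.
# All dimensions below are ASSUMED for illustration, not taken from the paper.

def adapter_params(num_layers, hidden_dim, bottleneck_dim):
    """Parameters of one adapter per layer: down- and up-projection, no biases."""
    return num_layers * 2 * hidden_dim * bottleneck_dim

def mask_params(num_layers, num_adapters, num_mask_tensors=2):
    """Entries in the per-profile mask tensors: one per (layer, adapter)."""
    return num_mask_tensors * num_layers * num_adapters

L, d, r, N = 12, 768, 48, 100          # hypothetical layers, hidden size, bottleneck, pool size

adapter_p = adapter_params(L, d, r)     # trained per profile in adapter tuning
mask_p = mask_params(L, N)              # trained per profile with masks

adapter_bytes = adapter_p * 4           # 32-bit floats
mask_bytes = (mask_p + 7) // 8          # hard masks stored as one bit per entry

param_ratio = adapter_p / mask_p
memory_ratio = adapter_bytes / mask_bytes

print(f"parameters: {adapter_p} vs {mask_p} (ratio {param_ratio:.0f}x)")
print(f"memory:     {adapter_bytes} B vs {mask_bytes} B (ratio {memory_ratio:.0f}x)")
```

Under these assumed dimensions the parameter ratio lands in the hundreds and the memory ratio slightly above ten thousand, which is consistent with the orders of magnitude quoted above. Different dimensions would shift the exact figures.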

Abstract

Parameter-efficient fine-tuning (PEFT) techniques, such as adapter tuning, aim to fine-tune a pre-trained language model (PLM) using a minimal number of parameters for a specific task or profile. Although adapter tuning is more parameter-efficient than full-model fine-tuning, it still attaches a small set of additional parameters to the PLM for each profile. This becomes problematic in practical applications with many profiles, because the total number of additional parameters grows linearly with the number of profiles. To mitigate this issue, we introduce X-PEFT, a novel PEFT method that leverages a multitude of given adapters by fine-tuning an extremely small set of compact tensors for a new profile, which serve as binary masks to adaptively select the given adapters. To efficiently validate our proposed method, we implement it using a large number of trained or untrained (random) adapters. We evaluate X-PEFT on LaMP and GLUE tasks and demonstrate that it matches or surpasses the effectiveness of conventional adapter tuning while reducing the memory requirements per profile by a factor of 10,000.
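The idea of selecting from a fixed pool of adapters with a small per-profile mask can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a soft mask is a softmax over per-adapter scores, that a hard mask keeps the top-k scores as a binary vector, and that the selected adapter weights are combined by a weighted sum. The function names and the toy pool are hypothetical, and training of the scores is omitted.

```python
import math

def soft_mask(scores):
    """Softmax over per-adapter scores (assumed form of a soft mask)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_mask(scores, k):
    """Binary mask with ones at the k highest scores (assumed form of a hard mask)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    chosen = set(top)
    return [1 if i in chosen else 0 for i in range(len(scores))]

def compose(mask, pool):
    """Weighted sum of equally shaped weight matrices from the frozen pool."""
    rows, cols = len(pool[0]), len(pool[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for weight, matrix in zip(mask, pool):
        if weight == 0:
            continue  # unselected adapters contribute nothing
        for i in range(rows):
            for j in range(cols):
                out[i][j] += weight * matrix[i][j]
    return out

# Toy pool of N = 4 frozen 2x2 "adapter" matrices; matrix i is filled with the value i.
pool = [[[float(i)] * 2 for _ in range(2)] for i in range(4)]

# Per-profile scores for one layer: the only values a new profile would learn.
scores = [0.1, 2.0, -1.0, 1.5]

soft = soft_mask(scores)
hard = hard_mask(scores, k=2)
fused_soft = compose(soft, pool)
fused_hard = compose(hard, pool)

print(hard)        # [0, 1, 0, 1]
print(fused_hard)  # [[4.0, 4.0], [4.0, 4.0]]
```

Only `scores` is specific to a profile; the pool is shared and never updated. In a hard-masked setting the stored mask is binary, so each entry needs a single bit, which is where the per-profile memory saving comes from.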

Paper Structure (28 sections, 1 equation, 7 figures, 9 tables, 1 algorithm)

Figures (7)

  • Figure 1: Demonstrating the remarkable parameter efficiency in terms of memory requirements of X-PEFT in extreme multi-profile scenarios. Additional details can be found in the section on the LaMP experiments.
  • Figure 2: Illustration of our proposed method, X-PEFT. Additional details can be found in the method section.
  • Figure 3: Visualization of mask tensors with t-SNE. Each point represents an author/profile; its color indicates the majority category assigned by that author, and its size indicates the majority ratio. This shows how the mask tensors effectively capture the categorization diversity among authors.
  • Figure 4: Evaluation of the Modified LaMP 'Personalized News Categorization' Dataset. Averaged evaluation accuracy and F1 score over 323 authors are presented (on 30% holdout sets).
  • Figure 5: Training curves for sst2 with various settings. (a) Varying the number of adapters and comparing soft and hard masks: more adapters lead to lower loss, and soft masks generally show lower loss than hard ones. (b) Effectiveness of separate mask tensors: the impact of having both $M_A$ and $M_B$ is evident. (c) Varying $k$ for hard masks: $k = 50$ consistently shows the best performance irrespective of the specific value of $N$.
  • ...and 2 more figures