AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen

TL;DR

AutoPEFT tackles the costly regime of full-model fine-tuning by automatically configuring parameter-efficient fine-tuning (PEFT) through a unified, expressive search space that includes Serial Adapters, Parallel Adapters, and Prefix-Tuning with layer-level insertion decisions. It employs a multi-objective Bayesian optimization framework using a sparse Gaussian process (GP) surrogate and the noisy expected hypervolume improvement (NEHVI) acquisition function to produce a Pareto front of configurations that balance task performance and parameter efficiency, while leveraging low-fidelity evaluations to keep search costs manageable. The approach yields configurations that transfer strongly across GLUE and SuperGLUE tasks and can outperform existing PEFT baselines while remaining competitive with full fine-tuning at a fraction of the parameter updates. This provides a practical, scalable pathway for deploying PEFT on large PLMs with task-specific efficiency trade-offs, backed by publicly available code and extensive ablations.

Abstract

Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating far fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices on the PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, it is highly likely that the current, manually designed configurations are suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection: we first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimisation in a low-cost setup, we then discover a Pareto-optimal set of configurations with strong performance-cost trade-offs across different numbers of parameters that are also highly transferable across different tasks. Empirically, on GLUE and SuperGLUE tasks, we show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par with or better than FFT without incurring substantial training efficiency costs.
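
To make the notion of a Pareto-optimal set concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the configuration names, scores, and parameter fractions are made-up illustrative values, and the helper functions `dominates` and `pareto_front` are hypothetical. A configuration is kept if no other configuration is at least as good on both objectives (higher task score, fewer trainable parameters) and strictly better on one.

```python
from typing import List, Tuple

# (name, task_score, trainable_param_fraction); all values are invented for illustration.
Config = Tuple[str, float, float]


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep the configurations that no other configuration dominates, sorted by cost."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c[2])


candidates = [
    ("cfg_a", 0.80, 0.010),
    ("cfg_b", 0.83, 0.020),
    ("cfg_c", 0.79, 0.015),  # dominated by cfg_a: lower score, more parameters
    ("cfg_d", 0.85, 0.060),
    ("cfg_e", 0.84, 0.080),  # dominated by cfg_d: lower score, more parameters
]

front = pareto_front(candidates)
print([name for name, _, _ in front])  # ['cfg_a', 'cfg_b', 'cfg_d']
```

Each point on the resulting front corresponds to a different parameter budget, which is why a single search can serve several efficiency targets.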

Paper Structure

This paper contains 8 sections, 4 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Performance of AutoPEFT-discovered configurations (AutoPEFT & AutoPEFT(per-task); see details in Table \ref{tab:keyresults}) compared to other baseline PEFT methods (markers) and full model FT that updates 100% of parameters (dashed horizontal bar), averaged across 8 GLUE tasks. Our approach achieves the best trade-off between task performance and parameter efficiency.
  • Figure 2: Illustration of the AutoPEFT search space, which combines both layer-level (Layers) and within-layer (Serial, Parallel, Prefix) search, and the connections within a layer (Left). We further show two possible configurations in the search space (Right): note that some PEFT layers can be inactive altogether and the searchable module sizes (shaded in green), i.e. the bottleneck sizes in Serial and Parallel ($D_{\text{SA}}$ and $D_{\text{PA}}$ respectively) and sizes of $P_K, P_V$ in Prefix ($L_{\text{PT}}$), are dynamic.
  • Figure 3: Illustration of the Pareto-optimal search with multi-objective Bayesian optimisation (BO; §\ref{sec:bo_method}): The BO agent trains on the vector representations of the evaluated configurations as inputs and their performance under a low-fidelity setup (e.g. accuracy -- obtained by fine-tuning the language model with the PEFT configuration for a small number of iterations) and cost (e.g. number of parameters) as targets. The BO agent then iteratively suggests new configurations until convergence.
  • Figure 4: Pareto fronts of AutoPEFT on four tasks compared to baselines on BERT$_{\text{base}}$, over varying parameter budgets. We report the single-seed task score but otherwise follow the settings in Table \ref{tab:keyresults}.
  • Figure 5: Pairwise transferability study of AutoPEFT-discovered configurations: each row (Ours[task]) shows the performance of the AutoPEFT configuration searched on [task] (e.g. RTE) when applied to that task itself and to 3 other GLUE tasks. The results suggest that AutoPEFT performance is largely robust to the choice of which task to search on.
  • ...and 4 more figures
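
The captions of Figures 2 and 3 describe a configuration as a set of per-layer on/off decisions plus the searchable sizes $D_{\text{SA}}$, $D_{\text{PA}}$ and $L_{\text{PT}}$, which is turned into a vector for the BO agent and scored by a cost such as the number of parameters. The sketch below shows one way such an encoding could look. Everything in it is an assumption made for illustration: the `PEFTConfig` class, its field names, the vector layout, and the parameter-counting formula (each adapter counted as a down and an up projection with biases, the prefix counted as $L_{\text{PT}}$ key and value vectors per layer) are hypothetical and may differ from the paper's actual search space and counting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PEFTConfig:
    """Hypothetical encoding of one point in a Figure 2 style search space."""
    layer_active: List[bool]  # one on/off flag per Transformer layer
    d_sa: int  # Serial adapter bottleneck size (D_SA); 0 disables it
    d_pa: int  # Parallel adapter bottleneck size (D_PA); 0 disables it
    l_pt: int  # prefix length (L_PT); 0 disables it

    def to_vector(self) -> List[float]:
        """Flatten the configuration into a numeric vector (assumed layout)."""
        flags = [float(a) for a in self.layer_active]
        return flags + [float(self.d_sa), float(self.d_pa), float(self.l_pt)]

    def approx_params(self, hidden: int = 768) -> int:
        """Rough trainable-parameter count under the assumptions stated above."""
        def adapter(bottleneck: int) -> int:
            if bottleneck == 0:
                return 0
            # down projection + up projection weights, plus both bias vectors
            return 2 * hidden * bottleneck + bottleneck + hidden

        prefix = 2 * self.l_pt * hidden  # P_K and P_V
        per_layer = adapter(self.d_sa) + adapter(self.d_pa) + prefix
        return per_layer * sum(self.layer_active)


# Example: 12 layers, PEFT modules active in every other layer.
cfg = PEFTConfig(
    layer_active=[i % 2 == 0 for i in range(12)],
    d_sa=24,
    d_pa=0,
    l_pt=3,
)
vec = cfg.to_vector()
print(len(vec))             # 15
print(cfg.approx_params())  # 253584
```

In a search loop of the kind Figure 3 describes, a vector like `vec` would be the surrogate model's input, and a count like `approx_params()` would serve as the cost objective alongside the low-fidelity task score.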