AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning
Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen
TL;DR
AutoPEFT addresses the cost of full-model fine-tuning by automatically configuring parameter-efficient fine-tuning (PEFT) over a unified, expressive search space that includes Serial Adapters, Parallel Adapters, and Prefix-Tuning, together with layer-level insertion decisions. It uses multi-objective Bayesian optimization with a sparse GP surrogate and the noisy expected hypervolume improvement (NEHVI) acquisition function to produce a Pareto front of configurations that balance task performance against parameter efficiency, and it relies on low-fidelity evaluations to keep search costs manageable. The discovered configurations transfer well across GLUE and SuperGLUE tasks, can outperform existing PEFT baselines, and remain competitive with full fine-tuning while updating only a fraction of the parameters. The result is a practical, scalable way to deploy PEFT on large PLMs with task-specific efficiency trade-offs, supported by publicly available code and extensive ablations.
Abstract
Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating far fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices on the PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, it is highly likely that the current, manually designed configurations are suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection: we first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimisation in a low-cost setup, we then discover a Pareto-optimal set of configurations with strong performance-cost trade-offs across different numbers of parameters that are also highly transferable across different tasks. Empirically, on GLUE and SuperGLUE tasks, we show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par with or better than FFT without incurring substantial training efficiency costs.
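
To make the notion of a Pareto-optimal set concrete, the sketch below filters a list of candidate configurations down to those that are not dominated in the two objectives named above: task performance (higher is better) and number of tunable parameters (lower is better). It is only an illustration of the selection criterion, not the Bayesian optimisation procedure itself; the function name `pareto_front` and the candidate values are hypothetical and are not results from the paper.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of non-dominated points.

    Each point is (score, params): a higher score is better and
    fewer params is better. A point is dominated if some other point
    is at least as good in both objectives and strictly better in one.
    """
    front = []
    for i, (s_i, p_i) in enumerate(points):
        dominated = any(
            (s_j >= s_i and p_j <= p_i) and (s_j > s_i or p_j < p_i)
            for j, (s_j, p_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (score, tunable parameters in % of the full model) pairs,
# made up purely for illustration.
candidates = [
    (0.80, 0.5),
    (0.85, 1.0),
    (0.84, 2.0),  # dominated by (0.85, 1.0)
    (0.88, 3.0),
    (0.80, 0.8),  # dominated by (0.80, 0.5)
]

front = pareto_front(candidates)
print([candidates[i] for i in front])
```

Each surviving configuration represents a different point on the performance-cost trade-off, so a practitioner can pick the one that fits a given parameter budget.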
