Look Within or Look Beyond? A Theoretical Comparison Between Parameter-Efficient and Full Fine-Tuning

Yongkang Liu, Xingle Xu, Ercong Nie, Zijing Wang, Shi Feng, Daling Wang, Qian Li, Hinrich Schütze

TL;DR

This work theoretically and empirically analyzes the trade-offs between Parameter-Efficient Fine-Tuning (PEFT) and Full Fine-Tuning (FFT). It proves PEFT occupies a strict, low-dimensional subspace of the FFT parameter space and derives an upper bound on its representational capacity, along with a diminishing return on additional parameters, heightened sensitivity to perturbations, and smaller data-driven gains. Across 15 datasets and 11 adversarial test sets, FFT consistently outperforms PEFT on complex tasks and exhibits stronger robustness, while PEFT can match or exceed FFT on simpler, data-scarce scenarios. The findings suggest FFT as the generally more reliable choice when resources permit, with PEFT offering benefits primarily in low-data or resource-constrained settings, and they highlight theoretical avenues for improving PEFT reliability and performance.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) methods achieve performance comparable to Full Fine-Tuning (FFT) while requiring significantly fewer computing resources, making them the go-to choice for researchers. We find that although PEFT can achieve competitive results on some benchmarks, its performance falls short of FFT on complex tasks such as reasoning and instruction-based fine-tuning. In this paper, we compare the characteristics of PEFT and FFT in terms of representational capacity and robustness, based on optimization theory. We theoretically demonstrate that PEFT is a strict subset of FFT. By providing theoretical upper bounds for PEFT, we show that the limited parameter space constrains the model's representational ability, making it more susceptible to perturbations. Experiments on 15 datasets covering classification, generation, reasoning, and instruction fine-tuning tasks, together with 11 adversarial test sets, validate our theories. We hope that these results spark further research beyond the realm of well-established PEFT. The source code is available in the anonymous GitHub repository: https://github.com/misonsky/PEFTEval.

Paper Structure

This paper contains 28 sections, 5 theorems, 67 equations, 3 figures, 9 tables.

Key Result

Theorem 1

(PEFT as a subset of FFT) Following Equation 4, define $\theta_{\Phi} := \theta_0 + g(\Phi) \in \mathbb{R}^d$, where $g: \mathbb{R}^k \to \mathbb{R}^d$ is a non-surjective function (see the proof of Theorem 1). Then $\forall \Phi \in \mathbb{R}^k, \; \exists \theta_{\Phi} \in \mathbb{R}^d \; \text{such that} \; f(x; \theta_0; \Phi) = f(x; \theta_{\Phi}) \in \mathcal{F}_{full}$.
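As a toy illustration of Theorem 1, the sketch below assumes a rank-1 LoRA-style increment $g(\Phi) = b a^{\top}$ on a single $3 \times 3$ weight matrix, so $k = 6$ and $d = 9$. The matrix values and helper names (`lora_delta`, `has_rank_at_most_one`) are hypothetical and not taken from the paper. It checks two things: the PEFT forward pass equals a full-parameter forward pass at $\theta_{\Phi} = \theta_0 + g(\Phi)$, and some full-parameter increments (here the identity matrix) cannot be produced by $g$, which is what non-surjectivity means in this setting.

```python
from itertools import combinations

def lora_delta(b, a):
    """Rank-1 increment g(Phi) = b a^T for Phi = (b, a). Assumed form of g."""
    return [[bi * aj for aj in a] for bi in b]

def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def add(W, D):
    """Elementwise sum of two matrices of the same shape."""
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, D)]

def has_rank_at_most_one(D):
    """A matrix has rank <= 1 exactly when every 2x2 minor is zero."""
    rows, cols = len(D), len(D[0])
    return all(
        D[i][j] * D[k][l] - D[i][l] * D[k][j] == 0
        for i, k in combinations(range(rows), 2)
        for j, l in combinations(range(cols), 2)
    )

# Toy pretrained weights theta_0 and PEFT parameters Phi = (b, a).
W0 = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
b, a = [1, -2, 3], [2, 0, 5]
x = [1, 1, 2]

# Full-parameter view: theta_Phi = theta_0 + g(Phi).
delta = lora_delta(b, a)
W_phi = add(W0, delta)
full_out = matvec(W_phi, x)

# PEFT view: frozen W0 plus the low-rank path b * (a . x).
a_dot_x = sum(ai * xi for ai, xi in zip(a, x))
peft_out = [w + bi * a_dot_x for w, bi in zip(matvec(W0, x), b)]

# Every PEFT model is also a full-parameter model.
same_function_value = peft_out == full_out

# The identity increment has rank 3, so no (b, a) can produce it.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
identity_reachable = has_rank_at_most_one(identity)

print(same_function_value, has_rank_at_most_one(delta), identity_reachable)
```

Integer values keep the minor test exact. With floating-point weights, a tolerance-based rank check would be needed instead.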

Figures (3)

  • Figure 1: Incremental parameter distributions for different fine-tuning methods. For Prefix, the incremental parameters are the prompt embeddings. For LoRA, BitFit, and FFT, they are the query parameters of the last layer; the same phenomenon can be observed in other layers for these three methods. The base model is LLaMA2-7B. The task is instruction tuning on Alpaca.
  • Figure 2: The impact of the number of training samples on the performance of different fine-tuning methods. $k$ denotes the number of training examples per class. The score is the average performance of each fine-tuning method across all test sets. The base model is LLaMA2-7B.
  • Figure 3: Performance trends of FFT and LoRA fine-tuning with different amounts of fine-tuning parameters on AdvGLUE and Adversarial SQuAD.

Theorems & Definitions (9)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Proof of Theorem 1
  • Proof of Theorem 2
  • Proof of Theorem 3
  • Proof of Theorem 4