GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs

Maxim Zhelnin, Viktor Moskvoretskii, Egor Shvetsov, Egor Venediktov, Mariya Krylova, Aleksandr Zuev, Evgeny Burnaev

TL;DR

GIFT-SW tackles the efficiency gap in fine-tuning large language models by updating a small set of salient columns while injecting Gaussian noise into non-salient weights, effectively regularizing training under a fixed compute budget. It unifies saliency metrics into a general perturbation-based criterion $s_j = \big\| \mathbf{D}_j \big\|_{\tau} \big\| \mathbf{X}_j \big\|_{\rho}^{\gamma}$ and selects 128 salient columns, enabling robust fine-tuning with noise injected in non-salient columns and quantization-aware training. Empirically, GIFT-SW outperforms full fine-tuning and modern PEFT methods across most zero-shot tasks on LLaMA models, including competitive results with TÜLU2 under reduced compute, and remains more stable across data budgets. The work validates the utility of combining structured saliency with quantization noise as a practical PEFT strategy and highlights areas for further refinement in saliency criteria and quantization settings.
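
The saliency criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes $\mathbf{D}_j$ is column $j$ of a weight-perturbation matrix (for example, the difference between the weights and their quantized version) and $\mathbf{X}_j$ is input feature $j$ collected over calibration samples. The function names and the default values of `tau`, `rho`, and `gamma` are hypothetical.

```python
import math


def p_norm(values, p):
    """Vector p-norm; p = math.inf gives the max-abs norm."""
    if p == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def saliency_scores(D, X, tau=2, rho=2, gamma=1.0):
    """Compute s_j = ||D_j||_tau * ||X_j||_rho ** gamma for every column j.

    D: perturbation matrix as a list of rows, shape (out_features, in_features)
    X: calibration inputs as a list of rows, shape (n_samples, in_features)
    Both layouts are assumptions made for this sketch.
    """
    n_cols = len(D[0])
    scores = []
    for j in range(n_cols):
        d_col = [row[j] for row in D]
        x_col = [row[j] for row in X]
        scores.append(p_norm(d_col, tau) * p_norm(x_col, rho) ** gamma)
    return scores


def top_k_columns(scores, k):
    """Return the indices of the k highest-scoring columns, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy example: 2 output rows, 3 input columns, 2 calibration samples.
D = [[0.1, 0.5, 0.0], [0.2, -0.5, 0.1]]
X = [[1.0, 2.0, 0.5], [1.0, -2.0, 0.5]]
scores = saliency_scores(D, X)
salient = top_k_columns(scores, k=1)
```

In this toy example the middle column has both the largest perturbation and the largest input magnitude, so it is the one selected. The summary above reports 128 selected columns, which would correspond to `k=128` on a real layer.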

Abstract

Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized the usage of Large Language Models (LLMs). Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine-Tuning of Salient Weights (GIFT-SW). Our method updates only salient columns, while injecting Gaussian noise into non-salient ones. To identify these columns, we developed a generalized sensitivity metric that extends and unifies metrics from previous studies. Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget. Moreover, GIFT-SW offers practical advantages for recovering the performance of models subjected to mixed-precision quantization in which salient weights are kept in full precision.

Paper Structure (34 sections, 6 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Mean performance of different fine-tuning approaches for LLaMA models as the data budget scales. GIFT-SW shows superior performance at nearly all data budgets while being as stable as full fine-tuning.
  • Figure 2: The GIFT-SW procedure follows Equation \ref{eq:pgd_two}. We first sample noise scaled to the quantization levels, then perform a forward pass, and finally update only the salient weights. In GIFT-SW, quantization, pruning, or tensor decomposition can be applied to the non-salient weights, after which the salient weights can be fine-tuned effectively without changing the structure of the non-salient weights. In our experiments, we select only 128 columns of salient weights unless specified otherwise.
  • Figure 3: Uniform quantization step function with real-valued, one-dimensional $w$ and integer-valued $Q(w)$.
  • Figure 4: Number of examples in each dataset included in the TÜLU-V2-mix subset.
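
The quantizer of Figure 3 and the three-step procedure of Figure 2 can be illustrated with a small sketch. It is written under several assumptions that do not come from the paper: the quantizer is a plain round-and-clamp with a hypothetical step `step` and bit width `bits`, the noise standard deviation is set to half the quantization step, the layer is a single linear map trained with a squared-error loss, and all function and parameter names are made up for illustration.

```python
import math
import random


def uniform_quantize(w, step, bits=4):
    """Map a real weight to an integer level: clamp(round(w / step))."""
    q_max = 2 ** (bits - 1) - 1
    q_min = -(2 ** (bits - 1))
    level = math.floor(w / step + 0.5)  # round half up
    return max(q_min, min(q_max, level))


def gift_sw_step(W, x, y, salient, step, lr, rng):
    """One noise-injected update of a linear layer y_hat = W x.

    Only the columns listed in `salient` are updated. All other columns
    receive Gaussian noise in the forward pass and are returned unchanged.
    """
    salient = set(salient)
    sigma = step / 2.0  # assumed noise scale, not taken from the paper

    # 1) Sample noise for the non-salient columns only.
    W_noisy = [
        [w if j in salient else w + rng.gauss(0.0, sigma)
         for j, w in enumerate(row)]
        for row in W
    ]

    # 2) Forward pass with the noisy weights, then the residual per output.
    residual = [
        sum(w * xj for w, xj in zip(row, x)) - yi
        for row, yi in zip(W_noisy, y)
    ]

    # 3) The gradient of 0.5 * ||W_noisy x - y||^2 w.r.t. W is residual * x^T.
    #    Apply it to the salient columns of the clean weights only.
    return [
        [w - lr * r * x[j] if j in salient else w
         for j, w in enumerate(row)]
        for row, r in zip(W, residual)
    ]


# Toy usage: a 2x3 layer in which only column 1 is treated as salient.
rng = random.Random(0)
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
W_new = gift_sw_step(W, x=[1.0, 2.0, -1.0], y=[0.0, 1.0],
                     salient=[1], step=0.1, lr=0.01, rng=rng)
```

After the step, columns 0 and 2 of `W_new` are identical to those of `W`, so any structure imposed on the non-salient weights (quantization levels, sparsity pattern) is preserved, while column 1 has moved along the gradient computed through the noisy forward pass.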