Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model

Haoyun Xu, Runzhe Zhan, Derek F. Wong, Lidia S. Chao

TL;DR

This paper introduces Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron, enabling more precise and computationally efficient model updates.

Abstract

Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles, which become increasingly diversified as models scale. Recent studies have revealed that not all neurons are active across different datasets, and this sparsity correlates positively with the task-specific ability, leading to advancements in model pruning and training efficiency. Traditional fine-tuning methods engage all parameters of LLMs, which is computationally expensive and may not be necessary. In contrast, Parameter-Efficient Fine-Tuning (PEFT) approaches aim to minimize the number of trainable parameters, yet they still operate at a relatively macro scale (e.g., layer-level). We introduce Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron, enabling more precise and computationally efficient model updates. The experimental results show that NeFT not only exceeded the performance of full-parameter fine-tuning and PEFT but also provided insights into the analysis of neurons.

Paper Structure (37 sections, 1 equation, 6 figures, 6 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of the proposed Neuron-Level Fine-Tuning method. (1) Prepare two models: the original model (${{\mathbf{M}_\mathrm{Org}}}$) and a model (${{\mathbf{M}_\mathrm{FT}}}$) trained with full-parameter fine-tuning. (2) Calculate the cosine similarity for each pair of neurons at corresponding positions in ${{\mathbf{M}_\mathrm{Org}}}$ and ${{\mathbf{M}_\mathrm{FT}}}$, select the $x\%$ of neurons with the lowest scores, and refer to them as sensitive neurons. (3) Mask the gradients of non-sensitive neurons during SFT so that only the selected neurons are updated.
  • Figure 2: Comparison of NeFT and LoRA across different trainable parameter settings. NeFT consistently utilizes fewer parameters than LoRA at each level. The details are presented in Appendix Table \ref{table10}.
  • Figure 3: Average rank differences between NeFT${_{6\%}}$ and NeFT${_{3\%}}$ were calculated for neurons. The ranks were sorted based on their pairwise Pearson scores in descending order.
  • Figure 4: BLEU scores of models trained with different NeFT settings. Using NeFT${_{3\%}}$ as the base setting, neurons with high similarity scores and neurons with low similarity scores were separately incorporated and trained using 20k English-Chinese translation data.
  • Figure 5: The rank difference $\mathrm{Avg}(\Delta\mathbf{Rank})$ is calculated to assess shifts in the utilization of neurons. Overall, the neuron utilization of the original neuron selection strategy NeFT${_{x\%}}$ is more stable than that of the contrasting selection strategy NeFT${_{3\%}}$+Reversed${_{x\%}}$.
  • ...and 1 more figures
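
The three steps in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes that a "neuron" is one row of a weight matrix, and the function names `select_sensitive_neurons` and `mask_gradient`, the `fraction` parameter, and the toy matrices are all hypothetical.

```python
import numpy as np


def select_sensitive_neurons(w_org, w_ft, fraction):
    """Return the row indices of the `fraction` of neurons whose weights
    changed most, i.e. those with the lowest cosine similarity between
    the original and the fully fine-tuned weights.

    Assumption: each row of the weight matrix is treated as one neuron.
    """
    dot = np.sum(w_org * w_ft, axis=1)
    norms = np.linalg.norm(w_org, axis=1) * np.linalg.norm(w_ft, axis=1)
    cos = dot / np.maximum(norms, 1e-12)  # guard against all-zero rows
    k = max(1, int(round(fraction * w_org.shape[0])))
    return np.sort(np.argsort(cos)[:k])


def mask_gradient(grad, sensitive_rows):
    """Zero the gradient rows of non-sensitive neurons so that an
    optimizer step only updates the selected neurons."""
    mask = np.zeros((grad.shape[0], 1), dtype=grad.dtype)
    mask[sensitive_rows] = 1.0
    return grad * mask


# Toy example: 4 neurons with 3 weights each; only neuron 2 moves
# noticeably during (hypothetical) full-parameter fine-tuning.
w_org = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])
w_ft = w_org.copy()
w_ft[2] = [1.0, 0.0, 1.0]

sensitive = select_sensitive_neurons(w_org, w_ft, fraction=0.25)
grad = np.ones_like(w_org)            # stand-in for a real SFT gradient
masked = mask_gradient(grad, sensitive)
w_new = w_org - 0.1 * masked          # one plain SGD step from M_Org
```

In this toy run only the row selected in step (2) differs between `w_new` and `w_org`; every other neuron keeps its original weights. In a deep-learning framework the same masking would typically be applied to each parameter's gradient before the optimizer step.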