Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan

TL;DR

This work investigates how supervised fine-tuning (SFT) reshapes a model's internal knowledge, focusing on CBQA performance across five LLaMA-family models. It introduces a dual analysis: token-level shifts measured by KL divergence and parameter-level changes via selective restoration of highly updated parameters, revealing that SFT often makes unnecessary updates and can degrade knowledge when data scale or mastery is mismatched. The study finds that the optimum CBQA performance occurs at small data sizes (around 240 samples) and that performance fluctuations align with mastery-level categories, with restoration of up to 90% of updated parameters yielding notable gains in several settings. These results offer practical guidance for designing more efficient fine-tuning strategies that preserve and strengthen pre-trained knowledge while avoiding detrimental updates. The work highlights the potential of parameter restoration as a simple, effective tool to enhance knowledge retention during SFT and points to adaptive data-selection methods as a fruitful direction for future research.

Abstract

Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
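The parameter-level idea in the abstract, restoring updated parameters to their pre-trained values, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `restore_top_updates`, the flat-list representation of parameters, and the use of absolute change as the ranking criterion are all assumptions made for illustration.

```python
def restore_top_updates(pretrained, finetuned, fraction):
    """Reset the most-updated entries of `finetuned` to their pre-trained values.

    Hypothetical helper: both inputs are flat lists of floats, and "most
    updated" is assumed to mean largest |finetuned - pretrained|.
    """
    if len(pretrained) != len(finetuned):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")

    # Number of entries to restore.
    k = int(len(finetuned) * fraction)

    # Rank indices by update magnitude, largest first.
    order = sorted(
        range(len(finetuned)),
        key=lambda i: abs(finetuned[i] - pretrained[i]),
        reverse=True,
    )

    # Copy the fine-tuned values, then overwrite the top-k with pre-trained ones.
    restored = list(finetuned)
    for i in order[:k]:
        restored[i] = pretrained[i]
    return restored


pre = [0.0, 1.0, 2.0, 3.0]
fin = [0.5, 1.1, 4.0, 3.0]
half_restored = restore_top_updates(pre, fin, 0.5)
```

With `fraction=0.5`, the two entries that moved the most (indices 2 and 0) are reset, while the small update at index 1 is kept. In a real model the same ranking would be applied across weight tensors rather than a flat list.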

Paper Structure

This paper contains 46 sections, 8 equations, 6 figures, 36 tables.

Figures (6)

  • Figure 1: Illustration of parameter restoration. We find that SFT introduces many unnecessary parameter updates, and model performance can be significantly improved by restoring some of the most updated parameters in the fine-tuned model to their original values in the pre-trained model.
  • Figure 2: In-domain ($\textbf{Acc}_{test}^\mathcal{M}$) and out-of-domain ($\textbf{Acc}_{testood}^\mathcal{M}$) performance of the LLaMA-3 family models fine-tuned with varying data scales, where 'All' indicates the use of the entire dataset listed in Appendix \ref{sec:detail_distribution}.
  • Figure 3: Illustration of logits re-normalization. Since the pre-trained LLM tends to assign high probabilities to common dummy words, we identify the ten highest logits in the fine-tuned LLM and extract the corresponding values from the pre-trained LLM. After re-normalization, we compute the KL divergence to quantify the distributional difference.
  • Figure 4: Performance on $\mathcal{D}_{test-4}^\mathcal{M}$ ($\textbf{Acc}_{test-4}^\mathcal{M}$) of LLMs fine-tuned on LLaMA-3-8B.
  • Figure 5: KL divergence of logits distribution between LLaMA-3-8B fine-tuned with different datasets and the pre-trained one.
  • ...and 1 more figure
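The logits re-normalization described in the Figure 3 caption can be sketched as follows. This is an illustrative sketch rather than the paper's code: the function names are hypothetical, the logits are plain Python lists for a single token position, and the direction of the divergence (fine-tuned relative to pre-trained) is an assumption, since the caption does not state it.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    log_total = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_total for v in values]


def topk_renormalized_kl(finetuned_logits, pretrained_logits, k=10):
    """KL divergence over the k tokens ranked highest by the fine-tuned model.

    The top-k indices are chosen from the fine-tuned logits, the matching
    pre-trained logits are extracted, and both subsets are re-normalized
    with a softmax before computing KL(fine-tuned || pre-trained).
    """
    if len(finetuned_logits) != len(pretrained_logits):
        raise ValueError("logit vectors must have the same length")
    k = min(k, len(finetuned_logits))

    # Indices of the k highest fine-tuned logits.
    top = sorted(
        range(len(finetuned_logits)),
        key=lambda i: finetuned_logits[i],
        reverse=True,
    )[:k]

    # Re-normalize each model's logits over the selected tokens only.
    log_p = log_softmax([finetuned_logits[i] for i in top])
    log_q = log_softmax([pretrained_logits[i] for i in top])

    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


same = topk_renormalized_kl([2.0, 1.0, 0.5], [2.0, 1.0, 0.5], k=2)
shifted = topk_renormalized_kl([2.0, 1.0, -5.0], [1.0, 2.0, 9.0], k=2)
```

In the second call, the third token has a very high pre-trained logit but is excluded because it is not among the fine-tuned model's top two. That exclusion is the point of restricting the comparison to the fine-tuned top-k: tokens the pre-trained model favors by default do not dominate the divergence.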