Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
TL;DR
This work investigates how supervised fine-tuning (SFT) reshapes a model's internal knowledge, measured by closed-book question answering (CBQA) performance across five models from the LLaMA-2 and LLaMA-3 families. It combines two analyses: token-level shifts measured by KL divergence, and parameter-level changes probed by selectively restoring the most heavily updated parameters. Together they show that SFT makes many unnecessary updates and can degrade knowledge when the scale or the knowledge-mastery level of the fine-tuning data is poorly matched to the model. Models fine-tuned on only 240 samples outperform those fine-tuned on 1,920 samples by up to 14%, and varying the mastery level of the fine-tuning data shifts performance by over 12%. Restoring up to 90% of the updated parameters to their pre-trained values yields notable gains in several settings. These results offer practical guidance for designing more efficient fine-tuning strategies that preserve and strengthen pre-trained knowledge while avoiding detrimental updates. They also highlight parameter restoration as a simple, effective tool for retaining knowledge during SFT and point to adaptive data selection as a promising direction for future research.
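A minimal sketch of the token-level comparison, assuming each model produces a vector of vocabulary logits at every scored position and that the shift is measured as KL(P_fine-tuned || P_base) averaged over positions. The direction of the divergence, which positions are scored, and the function names (`softmax`, `kl_divergence`, `token_level_kl`, `mean_token_kl`) are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def token_level_kl(
    base_logits: Sequence[Sequence[float]],
    tuned_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Per-position KL(P_tuned || P_base).

    Assumption: both models scored the same token positions, and the
    fine-tuned distribution is the first argument of the divergence.
    """
    if len(base_logits) != len(tuned_logits):
        raise ValueError("both models must score the same token positions")
    return [
        kl_divergence(softmax(tuned), softmax(base))
        for base, tuned in zip(base_logits, tuned_logits)
    ]


def mean_token_kl(
    base_logits: Sequence[Sequence[float]],
    tuned_logits: Sequence[Sequence[float]],
) -> float:
    """Average the per-position divergences into one shift score."""
    per_token = token_level_kl(base_logits, tuned_logits)
    if not per_token:
        raise ValueError("need at least one token position")
    return sum(per_token) / len(per_token)
```

Because softmax is shift-invariant, logits that differ only by a constant give a divergence of 0; larger values indicate that fine-tuning moved the next-token distribution further from the base model's.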
Abstract
Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control how fine-tuning changes that knowledge. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
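The parameter restoration described above can be sketched as follows, assuming parameters are stored as flat lists of floats keyed by tensor name, that "most heavily updated" means the largest absolute change |θ_SFT - θ_pre| per scalar, and that restoring means resetting those entries to their pre-trained values. The function name `restore_top_updates` and the global, per-scalar ranking are hypothetical choices; the paper's selection criterion and granularity may differ.

```python
from typing import Dict, List

# Hypothetical representation: tensor name -> flattened parameter values.
Params = Dict[str, List[float]]


def restore_top_updates(pretrained: Params, finetuned: Params, fraction: float) -> Params:
    """Return a copy of `finetuned` in which the `fraction` of scalar
    parameters with the largest |finetuned - pretrained| are reset to their
    pre-trained values.

    Assumption: updates are ranked per scalar across all tensors (global
    ranking); the inputs are left unmodified.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")

    # Collect (update magnitude, tensor name, index) for every scalar.
    updates = []
    for name, ft_values in finetuned.items():
        pre_values = pretrained[name]
        if len(pre_values) != len(ft_values):
            raise ValueError(f"shape mismatch for {name}")
        for i, (pre, ft) in enumerate(zip(pre_values, ft_values)):
            updates.append((abs(ft - pre), name, i))

    k = int(round(fraction * len(updates)))
    # Largest updates first; ties broken by name and index for determinism.
    updates.sort(key=lambda u: (-u[0], u[1], u[2]))

    restored = {name: list(values) for name, values in finetuned.items()}
    for _, name, i in updates[:k]:
        restored[name][i] = pretrained[name][i]
    return restored
```

Ranking globally lets tensors with large updates absorb most of the restoration budget; a per-tensor variant would instead restore the same fraction in every layer. Sweeping `fraction` and re-evaluating CBQA accuracy at each setting would show which share of updates can be reverted without loss.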
