Learning Dynamics of LLM Finetuning
Yi Ren, Danica J. Sutherland
TL;DR
The paper develops a unified learning-dynamics framework for analyzing how a finetuning update on one example changes an LLM's predictions on other prompts. It derives a stepwise, NTK-based decomposition of this influence and shows how the influence accumulates, both on MNIST and in LLM finetuning with SFT and DPO, linking training signals to their effect across the input space. It identifies a squeezing effect caused by large negative gradients and uses this mechanism to explain differences between on-policy and off-policy methods, as well as hallucination and repetition phenomena. It also proposes a simple data-augmentation strategy that mitigates squeezing and improves alignment after DPO, offering a practical way to improve RL-free alignment methods.
Abstract
Learning dynamics, which describes how the learning of specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning by analyzing a step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts from the response to question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We also extend our framework and highlight a unique "squeezing effect" to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. This framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only provides a novel perspective for understanding LLM finetuning but also inspires a simple, effective method to improve alignment performance.
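The squeezing effect mentioned above can be illustrated with a toy sketch. This is not the paper's experimental setup: it assumes a single softmax over four hypothetical labels and applies the negative gradient directly to the logits, with no network or tokenizer involved. The names `softmax`, `negative_step`, and `run`, the starting logits, and the learning rate are all illustrative choices. In this toy, repeatedly pushing down an already unlikely label lowers its probability and also moves mass away from the other non-top labels toward the label that was most likely to begin with.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def negative_step(z, y_neg, lr):
    """One gradient-descent step on the loss log p(y_neg), applied to the logits.

    The gradient of log p(y_neg) w.r.t. logit i is 1[i == y_neg] - p_i,
    so the rejected logit goes down and every other logit goes up in
    proportion to its current probability.
    """
    p = softmax(z)
    return [
        zi - lr * ((1.0 if i == y_neg else 0.0) - pi)
        for i, (zi, pi) in enumerate(zip(z, p))
    ]


def run(z, y_neg, lr=1.0, steps=50):
    """Apply the negative update repeatedly and return the final distribution."""
    for _ in range(steps):
        z = negative_step(z, y_neg, lr)
    return softmax(z)


# Hypothetical starting logits: label 0 is most likely, label 3 is very unlikely.
z0 = [2.0, 1.0, 0.5, -3.0]
p0 = softmax(z0)

# Push down the unlikely label 3, as a negative gradient on a rejected response would.
p_final = run(z0, y_neg=3)

print("before:", [round(p, 4) for p in p0])
print("after: ", [round(p, 4) for p in p_final])
```

Comparing `p0` with `p_final` shows that only the initially most likely label gains probability, while labels 1 and 2 lose mass even though they were never the target of the update. Working on logits keeps the example self-contained; a real DPO run updates model parameters, so the coupling between responses is mediated by the model rather than by this simple closed-form gradient.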
