Learning Dynamics of LLM Finetuning

Yi Ren, Danica J. Sutherland

TL;DR

The paper develops a unified learning-dynamics framework to analyze how finetuning updates in LLMs influence predictions on other prompts. It derives a stepwise NTK-based decomposition and shows how influence accumulates, first in an MNIST experiment and then in LLM finetuning with SFT and DPO, linking training signals to input-space influence. It identifies a squeezing effect caused by large negative gradients and uses this mechanism to explain differences between on-policy and off-policy methods, as well as hallucination and repetition phenomena. It also proposes a simple data-augmentation strategy that mitigates squeezing and improves alignment after DPO, offering a practical way to enhance RL-free alignment methods.

Abstract

Learning dynamics, which describes how the learning of specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning, by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We also extend our framework and highlight a unique "squeezing effect" to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. This framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only provides a novel perspective of understanding LLM's finetuning but also inspires a simple, effective method to improve alignment performance.

Paper Structure (43 sections, 3 theorems, 44 equations, 23 figures, 1 table)

Key Result

Proposition 1

Let $\pi=\operatorname{Softmax}(\bm{\mathsf{z}})$ and $\bm{\mathsf{z}}=h_\theta(\bm{\mathsf{x}})$. The one-step learning dynamics decompose as … where $\mathcal{A}^t(\bm{\mathsf{x}}_o) = \nabla_{\bm{\mathsf{z}}}\log\pi_{\theta^t}(\bm{\mathsf{x}}_o) = I - \bm{\mathsf{1}} \pi_{\theta^t}^\top(\bm{\mathsf{x}}_o)$, …
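As a sanity check on the expression for $\mathcal{A}^t$ stated in Proposition 1, the sketch below compares $I - \bm{\mathsf{1}}\pi^\top$ with a finite-difference Jacobian of $\log\operatorname{Softmax}(\bm{\mathsf{z}})$ on a small logit vector. The logit values, the step size `eps`, and all function names are illustrative assumptions and do not come from the paper.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(z):
    return [math.log(p) for p in softmax(z)]


def analytic_jacobian(z):
    """A = I - 1 pi^T, so entry (i, j) is d log(pi_i) / d z_j = delta_ij - pi_j."""
    pi = softmax(z)
    n = len(z)
    return [[(1.0 if i == j else 0.0) - pi[j] for j in range(n)] for i in range(n)]


def numeric_jacobian(z, eps=1e-6):
    """Central finite differences of log_softmax with respect to each logit."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        lp, lm = log_softmax(z_plus), log_softmax(z_minus)
        for i in range(n):
            jac[i][j] = (lp[i] - lm[i]) / (2 * eps)
    return jac


# Hypothetical logits chosen only for illustration.
z = [2.0, 0.5, -1.0, 0.1]
A = analytic_jacobian(z)
J = numeric_jacobian(z)

# Largest entrywise gap between the closed form and the numerical estimate.
max_err = max(abs(A[i][j] - J[i][j]) for i in range(len(z)) for j in range(len(z)))
# Each row of A sums to zero because the probabilities sum to one.
row_sums = [sum(row) for row in A]
print("max abs error:", max_err)
```

Every row of $\mathcal{A}^t$ sums to zero, which reflects the fact that shifting all logits by the same constant leaves the softmax output unchanged.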

Figures (23)

  • Figure 1: The per-step learning dynamics and the accumulated influence in an MNIST experiment.
  • Figure 2: The updating vector provided by the residual term $\mathcal{G}^t$ of different algorithms. The gray $\bm{\mathsf{y}}$ are responses sampled from $\pi$ in an on-policy way. The second panel demonstrates the "squeezing effect" caused by imposing a large negative gradient on a "valley" region of a distribution. For more details about this counter-intuitive effect, see the paper's section on the squeezing effect and the corresponding appendix. Other panels demonstrate on-policy DPO (and IPO), SPIN (Chen et al., 2024), SPPO (Wu et al., 2024), and SLiC (Zhao et al., 2023).
  • Figure 3: First three panels: learning dynamics of SFT on different response types. Fourth panel: SFT for 10 epochs followed by DPO. Last panel: the accumulated influence when running SFT with different $\bm{\mathsf{y}}$ (full results are in the paper's appendices).
  • Figure 4: Learning dynamics of off-policy DPO. The last panel verifies the existence of the squeezing effect.
  • Figure 5: Learning dynamics of the baseline and the proposed method with training-data extension. Key trends to observe: (1) the baseline and the extended method behave similarly on $\bm{\mathsf{y}}^+_u$ during SFT; (2) the extended method considerably increases $\bm{\mathsf{y}}^-_u$ during SFT; (3) the squeezing effect of the extended method is weaker (all other responses decay more slowly, and the confidence on the "greedy-decoding" response increases more slowly).
  • ...and 18 more figures
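
The squeezing effect described in the Figure 2 caption can be illustrated on a single softmax, without any LLM. The sketch below repeatedly applies a negative gradient to a label that already has low probability and tracks where the probability mass goes. It is a toy setting under stated assumptions, not the paper's experiment: the four logits, the learning rate, the step count, and the function names are all hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def negative_gradient_step(z, y, lr):
    """Lower log(pi_y) by one gradient step taken directly on the logits.

    The gradient of log(pi_y) with respect to z is e_y - pi,
    so the update is z <- z - lr * (e_y - pi).
    """
    pi = softmax(z)
    return [v - lr * ((1.0 if i == y else 0.0) - pi[i]) for i, v in enumerate(z)]


# Hypothetical logits: label 0 is the peak, label 3 sits in a low-probability "valley".
z0 = [3.0, 1.0, 0.5, -2.0]
y_neg = 3  # the label that receives the negative gradient

p_before = softmax(z0)
z = list(z0)
for _ in range(5):
    z = negative_gradient_step(z, y_neg, lr=1.0)
p_after = softmax(z)

for label, (b, a) in enumerate(zip(p_before, p_after)):
    print(f"label {label}: {b:.4f} -> {a:.4f}")
```

In this example the penalized label loses probability, as intended, but so do the two middle labels, while the already most likely label gains mass. Each non-penalized logit rises in proportion to its current probability, so the peak pulls ahead of everything else.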

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 1
  • proof
  • Lemma 1
  • proof