Unveiling the Basin-Like Loss Landscape in Large Language Models

Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu

TL;DR

This work argues that benign fine-tuning confined to the basin should preserve prior capabilities. It also provides a theoretical analysis showing that the basin size bounds the performance degradation caused by any fine-tuning, including adversarial fine-tuning, while guaranteeing model robustness w.r.t. input perturbations.

Abstract

We discover the emergence of basins in the loss landscape of large language models. As model scale increases, LLMs become progressively more resilient to random perturbations in the parameter space, giving rise to expansive stability regions where models exhibit nearly identical performance, but outside of which their capabilities collapse. We observe that pre-training creates a basic capability basin, and subsequent alignment fine-tuning forms specific capability basins (e.g., safety, math, coding). Thus, we argue that benign fine-tuning confined to the basin should preserve prior capabilities. We also analyze the loss landscape along worst-case directions, which is consistently sharp and detrimental. We find that adversarial fine-tuning moves along nearly worst-case directions, thus rapidly degrading model capabilities. Finally, we provide a theoretical analysis demonstrating that the basin size bounds the performance degradation of any fine-tuning, including adversarial fine-tuning, while also guaranteeing model robustness w.r.t. input perturbations, suggesting the benefit of enlarging basins.

Paper Structure

This paper contains 33 sections, 7 theorems, 15 equations, 8 figures, 3 tables, 2 algorithms.

Key Result

Theorem 4.2

(Weak Law of Randomized Smoothing; Salman et al., 2019) For any benchmark $J_{f,\mathcal{D}}: \mathbb{R}^d \to [0,1]$, the smoothed function $\bm{\theta} \mapsto \mathbb{E}_{\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \sigma^2\bm{I})}[J_{f,\mathcal{D}}(\bm{\theta}+\bm{\epsilon})]$ is $\frac{1}{\sqrt{2\pi}\sigma}$-Lipschitz. Thus
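To make the Lipschitz constant concrete, here is a minimal numerical sketch. It assumes a one-dimensional toy benchmark (a step function, which is not from the paper) whose Gaussian smoothing has the closed form $\Phi(\theta/\sigma)$, and checks that its steepest slope matches $\frac{1}{\sqrt{2\pi}\sigma}$. The helper `lipschitz_lower_bound` is a hypothetical name; it encodes only the direct consequence of Lipschitz continuity (the smoothed value can drop by at most the constant times the distance moved), which may be looser than the bound the paper plots in Figure 4.

```python
import math


def lipschitz_constant(sigma: float) -> float:
    """Constant from the theorem statement: 1 / (sqrt(2*pi) * sigma)."""
    return 1.0 / (math.sqrt(2.0 * math.pi) * sigma)


def smoothed_step(theta: float, sigma: float) -> float:
    """Gaussian smoothing of the toy benchmark J(t) = 1 if t > 0 else 0.

    E[J(theta + eps)] = P(eps > -theta) = Phi(theta / sigma).
    """
    return 0.5 * (1.0 + math.erf(theta / (sigma * math.sqrt(2.0))))


def max_empirical_slope(sigma: float, half_width: float = 5.0,
                        steps: int = 20001) -> float:
    """Largest finite-difference slope of the smoothed step on a uniform grid
    covering [-half_width * sigma, half_width * sigma]."""
    lo = -half_width * sigma
    h = 2.0 * half_width * sigma / (steps - 1)
    best = 0.0
    for i in range(steps - 1):
        a = lo + i * h
        slope = abs(smoothed_step(a + h, sigma) - smoothed_step(a, sigma)) / h
        best = max(best, slope)
    return best


def lipschitz_lower_bound(p_a: float, distance: float, sigma: float) -> float:
    """Lower bound on the smoothed benchmark after moving `distance` (L2 norm)
    in parameter space, starting from smoothed value p_a. Clipped at 0."""
    return max(0.0, p_a - lipschitz_constant(sigma) * distance)


if __name__ == "__main__":
    sigma = 0.003  # same value as in the Figure 4(a) caption
    print("theoretical constant:", lipschitz_constant(sigma))
    print("steepest observed slope:", max_empirical_slope(sigma))
    print("bound at distance sigma:", lipschitz_lower_bound(0.9, sigma, sigma))
```

The step function is a worst case for this constant, so the observed slope should sit just below the theoretical value. Under this bound, moving a distance of one $\sigma$ from a smoothed score of 0.9 can cost at most about 0.4.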

Figures (8)

  • Figure 1: The most-case loss landscape of different models. Specific benchmarks and visualization details are provided in the paper's section on the most-case landscape. As shown, the loss landscape of LLMs resembles a basin, within which models perform nearly identically and outside of which they lose all capabilities.
  • Figure 2: The worst-case loss landscape of different models. Specific benchmarks and visualization details are provided in the paper's section on the worst-case landscape. As shown, moving even a small distance along the worst-case direction rapidly degrades all capabilities of LLMs. Because all curves reach the maximum loss at the smallest scale, they overlap completely.
  • Figure 3: The SFT-case loss landscapes for three different datasets using Qwen2.5-7B.
  • Figure 4: Lower bound guarantees. (a) The lower bound on the benchmark value of the smoothed fine-tuned model $J_{f,\mathcal{D}}(\bm{\theta}_{sft} + \bm{\epsilon})$ for varying benchmark values on the smoothed original model $\bm{\theta}_0$, i.e., $p_A := \mathbb{E}_{\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \sigma^2\bm{I})}[J_{f,\mathcal{D}}(\bm{\theta}_0+\bm{\epsilon})]$, with $\sigma = 0.003$. (b) The lower bound on the benchmark value of the smoothed fine-tuned model $J_{f,\mathcal{D}}(\bm{\theta}_{sft} + \bm{\epsilon})$ for varying basin sizes $\sigma$, with $p_A = 0.9$. (c) Histogram of L2 distances between token embeddings.
  • Figure 5: Pre-training and fine-tuning using GO and Adam optimizers. (a) The pre-training loss curve on OpenWebText. (b) The loss landscape on OpenWebText after pre-training. (c) Performance changes in old capability (OpenWebText) and new capability (Alpaca).
  • ...and 3 more figures
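
The basin shape described for Figure 1 can be illustrated with a small probe: perturb the parameters with random noise of increasing scale and record the average benchmark value at each scale. The sketch below assumes isotropic Gaussian noise, matching the distribution in Theorem 4.2; the paper's exact visualization procedure is not reproduced here. The function names and the toy benchmark (full score inside a ball around a reference point, zero outside) are hypothetical stand-ins for a real model and evaluation suite.

```python
import math
import random
from typing import Callable, List, Sequence


def most_case_curve(theta: Sequence[float],
                    benchmark: Callable[[List[float]], float],
                    sigmas: Sequence[float],
                    num_samples: int = 32,
                    seed: int = 0) -> List[float]:
    """Average benchmark value under Gaussian parameter noise, per noise scale.

    For each sigma, draws `num_samples` perturbations eps ~ N(0, sigma^2 I),
    evaluates benchmark(theta + eps), and returns the mean.
    """
    rng = random.Random(seed)
    curve = []
    for sigma in sigmas:
        total = 0.0
        for _ in range(num_samples):
            perturbed = [t + rng.gauss(0.0, sigma) for t in theta]
            total += benchmark(perturbed)
        curve.append(total / num_samples)
    return curve


def toy_basin_benchmark(params: Sequence[float],
                        center: Sequence[float],
                        radius: float) -> float:
    """Toy stand-in for a benchmark: 1.0 inside an L2 ball around `center`,
    0.0 outside. Real benchmarks would evaluate a model instead."""
    dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(params, center)))
    return 1.0 if dist < radius else 0.0


if __name__ == "__main__":
    dim = 1000
    theta0 = [0.0] * dim
    scales = [0.001, 0.01, 0.03, 0.1]
    curve = most_case_curve(
        theta0,
        lambda p: toy_basin_benchmark(p, theta0, radius=1.0),
        scales,
    )
    for s, v in zip(scales, curve):
        print(f"sigma={s:<6} mean benchmark={v:.3f}")
```

In this toy setting the perturbation norm concentrates near $\sigma\sqrt{d}$, so the curve stays flat at small scales and drops sharply once that norm exceeds the ball radius, mimicking the flat-then-collapse profile the caption describes.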

Theorems & Definitions (8)

  • Definition 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Lemma 4.6
  • Lemma 4.7
  • Lemma E.1