LLM Probability Concentration: How Alignment Shrinks the Generative Horizon

Chenghao Yang, Ari Holtzman

TL;DR

This work introduces the Branching Factor (BF), a principled, entropy-based measure of generation breadth in autoregressive LLMs. BF is defined as $B(x; \theta) = \exp(\bar{H}(Y_{1:N} | x; \theta))$, where $\bar{H}$ is the length-averaged entropy of the output $Y_{1:N}$ given the prompt $x$; this links local token entropy to a global view of the plausible continuation space. For long sequences, BF is estimated with an estimator based on the Asymptotic Equipartition Property (AEP). Empirically, alignment tuning (e.g., RLHF) reduces BF by nearly an order of magnitude (from ~12 to ~1.2), which helps explain why aligned models show less output variability and less sensitivity to decoding strategies. Chain-of-Thought prompts push reasoning into later, lower-BF regions, stabilizing outputs. Nudging experiments further suggest that alignment surfaces low-entropy trajectories already latent in base models: rather than fundamentally restructuring generation, alignment steers it toward stylistic tokens that open constrained, high-probability pathways. Overall, BF provides a unifying diagnostic for understanding and guiding the effects of alignment on LLM generation, and it motivates training-time strategies that preserve diversity without sacrificing safety or usefulness.
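The definition above can be made concrete with a minimal sketch. It assumes the per-step next-token distributions along one generated sequence are already available as lists of probabilities, and it treats $\bar{H}$ as the average of the per-step entropies (in nats) along that single path. The function names `token_entropy` and `branching_factor` are illustrative, not taken from the paper, and a faithful estimate would also average over many sampled sequences.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy, in nats, of one next-token distribution."""
    # Zero-probability tokens contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branching_factor(step_distributions: Sequence[Sequence[float]]) -> float:
    """exp of the length-averaged per-step entropy along one sequence.

    Assumption: one sampled path stands in for the expectation over
    all continuations; this is a simplification for illustration.
    """
    if not step_distributions:
        raise ValueError("need at least one generation step")
    total = sum(token_entropy(p) for p in step_distributions)
    mean_entropy = total / len(step_distributions)
    return math.exp(mean_entropy)


if __name__ == "__main__":
    uniform_path = [[0.25, 0.25, 0.25, 0.25]] * 3  # four equally likely tokens per step
    peaked_path = [[1.0, 0.0, 0.0, 0.0]] * 3       # one certain token per step
    print(branching_factor(uniform_path))
    print(branching_factor(peaked_path))
```

Read this way, BF behaves like an effective number of choices per step: a uniform distribution over four tokens at every step yields a BF of about 4, and a fully deterministic path yields a BF of 1.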

Abstract

Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this consistency in the generation? We investigate this phenomenon through the lens of probability concentration in the model's output distribution. To quantify this concentration, we introduce the *Branching Factor* (BF), a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate. (2) Alignment tuning substantially sharpens the model's output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. Building on this insight, we find this consistency has surprising implications for complex reasoning. Aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models), for instance, leverage this effect: by generating longer reasoning chains, they push generation into later, more deterministic (lower BF) stages, resulting in more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model's behavior, but instead steers it toward stylistic tokens (e.g., "Sure") that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling LLM outputs: clarifying how alignment reduces variability, how CoT promotes stable generations, and how base models can be steered away from diversity.
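As a worked illustration of the reported drop, and assuming entropy is measured in nats (which the $\exp$ in the definition of BF implies), taking logarithms of the two example BF values gives the corresponding length-averaged entropies:

$$
\bar{H}_{\text{base}} = \ln 12 \approx 2.48, \qquad
\bar{H}_{\text{aligned}} = \ln 1.2 \approx 0.18, \qquad
\bar{H}_{\text{base}} - \bar{H}_{\text{aligned}} = \ln 10 \approx 2.30 .
$$

The subscripts here are labels added for this illustration. A tenfold reduction in BF therefore corresponds to removing about 2.3 nats of entropy per token, leaving the aligned model in this example with under 0.2 nats per token.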

Paper Structure

This paper contains 32 sections, 1 theorem, 12 equations, 17 figures, 2 tables.

Key Result

Theorem 4.1

Given $0 < \epsilon < 1$, we have:

Figures (17)

  • Figure 1: (a): LLM probability concentration connects and explains several disparate yet critical phenomena in aligned LLMs. (b): A conceptual illustration of how alignment and CoT influence the generation space of LLMs. While base models begin with high output diversity, alignment tuning sharply concentrates early probability mass, leading to more stable outputs. CoT extends this effect into later positions, flattening output sample variation and reducing sensitivity to decoding.
  • Figure 2: AEP Empirical verification for Llama-3-8B-Instruct. (a, b): length-averaged NLL closely tracks length-averaged Entropy. (c, d): Standard deviation of length-averaged NLL diminishes with output length.
  • Figure 3: Shrinking BF with output length over various tasks for Llama-3-70B and Llama-3-70B-Instruct. For better visualization, we plot the exponential moving average of BF with the smoothing factor set to $0.1$.
  • Figure 4: Pareto Analysis of BF across various IFs. $AT$ indicates whether the model is aligned. $C$ denotes the prompt complexity. $S$ refers to model size, and $G$ refers to model generation (Llama-2 vs. Llama-3). Across all settings, alignment tuning has the most pronounced impact on BF.
  • Figure 5: Resampling from different output positions to assess the effect of interrupting BF reduction. We resample new continuations at the 25th and 200th output token of DeepSeek-Distilled Llama-8B MMLU outputs. Results show substantial performance drops at both positions.
  • ...and 12 more figures
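The exponential moving average mentioned in the Figure 3 caption can be sketched as follows. The caption gives only the smoothing factor, so the update rule is an assumption: here the factor `alpha` weights the newest value, $s_t = \alpha v_t + (1 - \alpha) s_{t-1}$, and the first value seeds the average. The opposite convention, where the factor weights the running average, is equally common, and the function name `ema` is illustrative.

```python
from typing import List, Sequence


def ema(values: Sequence[float], alpha: float = 0.1) -> List[float]:
    """Exponential moving average of a per-position series.

    Assumption: alpha weights the newest value, and the first value
    is used unchanged to seed the running average.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # seed with the first observation
        else:
            smoothed.append(alpha * v + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-position BF values, for illustration only.
    print(ema([10.0, 0.0, 0.0], alpha=0.1))
```

With a small `alpha` such as 0.1, single-position spikes are heavily damped, which makes a gradual downward trend over output length easier to see.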

Theorems & Definitions (1)

  • Theorem 4.1: AEP for LLMs
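The practical consequence of an AEP-style result, as the Figure 2 caption describes it, is that the length-averaged negative log-likelihood (NLL) of a sampled output tracks the length-averaged entropy. The sketch below uses that observation to estimate BF from sampled sequences alone. It assumes the per-token log-probabilities (natural log) of each sample are already available. The function names, and the choice to average the per-sample NLL across samples before exponentiating, are illustrative assumptions and not the paper's stated procedure.

```python
import math
from typing import Sequence


def length_averaged_nll(token_logprobs: Sequence[float]) -> float:
    """Negative log-likelihood of one sampled sequence, divided by its length."""
    if not token_logprobs:
        raise ValueError("sequence must contain at least one token")
    return -sum(token_logprobs) / len(token_logprobs)


def aep_branching_factor(sampled_logprobs: Sequence[Sequence[float]]) -> float:
    """Estimate BF as exp of the mean length-averaged NLL over samples.

    Assumption: length-averaged NLL is used as a stand-in for
    length-averaged entropy, which is only expected to hold well
    for sufficiently long outputs.
    """
    if not sampled_logprobs:
        raise ValueError("need at least one sampled sequence")
    total = sum(length_averaged_nll(lp) for lp in sampled_logprobs)
    mean_nll = total / len(sampled_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Two hypothetical samples whose tokens each had probability 0.25.
    samples = [[math.log(0.25)] * 5, [math.log(0.25)] * 8]
    print(aep_branching_factor(samples))
```

The appeal of this route is cost: it needs only the log-probabilities of the tokens actually sampled, not the full next-token distribution at every step.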