Beyond Confidence: The Rhythms of Reasoning in Generative Models

Deyuan Liu, Zecheng Wang, Zhanyue Qin, Zhiying Tu, Dianhui Chu, Dianbo Sui

TL;DR

The paper tackles the brittleness of large language models to subtle contextual changes by introducing the Token Constraint Bound $\delta_{\mathrm{TCB}}$, a local stability metric that bounds how far the internal hidden state $\bm{h}$ can be perturbed before the dominant next token changes. It grounds $\delta_{\mathrm{TCB}}$ in the Jacobian of the softmax output with respect to $\bm{h}$ and, crucially, links it to the geometry of the output embedding space via the exact expression $\|\mathbf{J}_{\mathbf{W}}(\bm{h})\|_F^2 = \sum_{i=1}^{\mathcal{V}} o_i^2 \|\bm{w}_i - \boldsymbol{\mu}_{\bm{w}}(\bm{h})\|_2^2$. The authors show that $\delta_{\mathrm{TCB}}$ correlates with prompt quality, reveals instabilities not captured by perplexity, and serves as a diagnostic tool for robust prompt engineering and in-context learning. By combining theoretical derivations with experiments on LLaMA-3.1-8B across MMLU, GSM8K, and prompt variations, the work demonstrates that stability concepts grounded in output-embedding geometry can guide more reliable, context-aware NLP systems and motivate extensions to broader models and settings.

Abstract

Large Language Models (LLMs) exhibit impressive capabilities yet suffer from sensitivity to slight input context variations, hampering reliability. Conventional metrics like accuracy and perplexity fail to assess local prediction robustness, as normalized output probabilities can obscure the underlying resilience of an LLM's internal state to perturbations. We introduce the Token Constraint Bound ($\delta_{\mathrm{TCB}}$), a novel metric that quantifies the maximum internal state perturbation an LLM can withstand before its dominant next-token prediction significantly changes. Intrinsically linked to output embedding space geometry, $\delta_{\mathrm{TCB}}$ provides insights into the stability of the model's internal predictive commitment. Our experiments show $\delta_{\mathrm{TCB}}$ correlates with effective prompt engineering and uncovers critical prediction instabilities missed by perplexity during in-context learning and text generation. $\delta_{\mathrm{TCB}}$ offers a principled, complementary approach to analyze and potentially improve the contextual stability of LLM predictions.

Paper Structure (93 sections, 1 theorem, 58 equations, 5 figures, 5 tables)

This paper contains 93 sections, 1 theorem, 58 equations, 5 figures, 5 tables.

Key Result

Proposition 1

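For a given output weight matrix $\mathbf{W}$ and hidden state $\bm{h}$, let $\bm{o} = \text{softmax}(\mathbf{W}\bm{h})$ be the output probability vector. The squared Frobenius norm of the output Jacobian $\mathbf{J}_{\mathbf{W}}(\bm{h}) = (\mathrm{diag}(\bm{o}) - \bm{o}\bm{o}^\top)\mathbf{W}$ is given by $\|\mathbf{J}_{\mathbf{W}}(\bm{h})\|_F^2 = \sum_{i=1}^{\mathcal{V}} o_i^2 \|\bm{w}_i - \boldsymbol{\mu}_{\bm{w}}(\bm{h})\|_2^2$, where $\bm{w}_i$ is the $i$-th row of $\mathbf{W}$ and $\boldsymbol{\mu}_{\bm{w}}(\bm{h}) = \sum_{j=1}^{\mathcal{V}} o_j \bm{w}_j$ is the probability-weighted mean embedding. This sum represents the squared Euclidean distances between each embedding $\bm{w}_i$ and the mean embedding $\boldsymbol{\mu}_{\bm{w}}(\bm{h})$, each weighted by $o_i^2$.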
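
The identity follows in two short steps from the softmax Jacobian. The sketch below is a reconstruction, not the authors' appendix derivation; it assumes $\mathbf{W}$ has shape $\mathcal{V} \times d$ with rows $\bm{w}_i^\top$, and $\bm{e}_i$ (the $i$-th standard basis vector) is notation introduced here for illustration.

```latex
\begin{align*}
\bm{e}_i^\top \mathbf{J}_{\mathbf{W}}(\bm{h})
  &= \bm{e}_i^\top \bigl(\mathrm{diag}(\bm{o}) - \bm{o}\bm{o}^\top\bigr)\mathbf{W}
   = o_i\,\bm{w}_i^\top - o_i\,\bm{o}^\top\mathbf{W}
   = o_i \bigl(\bm{w}_i - \boldsymbol{\mu}_{\bm{w}}(\bm{h})\bigr)^\top,
  \qquad \boldsymbol{\mu}_{\bm{w}}(\bm{h}) = \mathbf{W}^\top\bm{o} = \sum_{j=1}^{\mathcal{V}} o_j\,\bm{w}_j, \\
\|\mathbf{J}_{\mathbf{W}}(\bm{h})\|_F^2
  &= \sum_{i=1}^{\mathcal{V}} \bigl\|\bm{e}_i^\top \mathbf{J}_{\mathbf{W}}(\bm{h})\bigr\|_2^2
   = \sum_{i=1}^{\mathcal{V}} o_i^2\,\bigl\|\bm{w}_i - \boldsymbol{\mu}_{\bm{w}}(\bm{h})\bigr\|_2^2 .
\end{align*}
```

The first line shows that each row of the Jacobian is the offset of one embedding from the mean embedding, scaled by that token's probability; the second uses the fact that the squared Frobenius norm is the sum of squared row norms.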

Figures (5)

  • Figure 1: The Token Constraint Bound ($\delta_{\mathrm{TCB}}$) mechanism. $\delta_{\mathrm{TCB}}$ quantifies the maximum perturbation a model's internal state can withstand before the next-token prediction changes. (a) Left panel illustrates how a hidden state perturbation $\Delta \bm{h}$ impacts the next token prediction. Small perturbations ($\Delta \bm{h}_1$, implicitly within $\delta_{\mathrm{TCB}}$ radius) may preserve the output, while larger ones ($\Delta \bm{h}_2 > \delta_{\mathrm{TCB}}$) can flip it (from "No" to "Yes"). $\delta_{\mathrm{TCB}}$ bounds the perturbation size for stable output. (b) Right panel shows that the original hidden state $\bm{h}$ and a perturbed state $\bm{h}'$ inside a stability region predict "No". Another perturbation $\bm{h}''$ outside the region flips the prediction to "Yes", demonstrating the practical consequence of exceeding the stability boundary.
  • Figure 2: $\delta_{\mathrm{TCB}}$ reflects context-induced prediction stability. (a) Illustrates how prompts inducing higher prediction confidence (lower $\mathcal{V}_{\mathrm{eff}}$, state $\bm{h}_1$) lead to a significantly larger $\delta_{\mathrm{TCB}}$ compared to prompts yielding lower confidence (higher $\mathcal{V}_{\mathrm{eff}}$, state $\bm{h}_2$). (b) Shows how In-Context Learning examples modify the hidden state and consequently the prediction and its stability. Adding examples can initially decrease stability while flipping the prediction, but consistent examples can increase stability for the target output.
  • Figure 3: Output Distribution Determines Geometric Stability. The Token Constraint Bound ($\delta_{\mathrm{TCB}}$) is a function of the geometric arrangement of embeddings. (a) High Confidence: Peaked distribution concentrates $\boldsymbol{\mu}_{\bm{w}}(\bm{h})$ near the dominant embedding $\bm{w}_k$, minimizing the sum and maximizing $\delta_{\mathrm{TCB}}$. (b) Uncertainty: Flatter distribution spreads $\boldsymbol{\mu}_{\bm{w}}(\bm{h})$ among active embeddings, increasing the sum and reducing $\delta_{\mathrm{TCB}}$. Legend: mean embedding $\boldsymbol{\mu}_{\bm{w}}(\bm{h})$; token embeddings $\bm{w}_i$.
  • Figure 4: $\delta_{\mathrm{TCB}}$ dynamics vs. $P(\text{2nd best})$ during potentially repetitive generation. Plot shows $\delta_{\mathrm{TCB}}$ (blue, left y-axis) and $P(\text{2nd best})$ (green, right y-axis) versus generation step for LLaMA-3.1-8B. Sharp dips in $\delta_{\mathrm{TCB}}$ (e.g., around steps 5-10, 20-25) often correlate with spikes in $P(\text{2nd best})$, indicating transient local instability not captured by average sequence PPL. Later, high, stable $\delta_{\mathrm{TCB}}$ (e.g., steps 30+) can characterize a degenerate loop, showing robust commitment to the repetitive pattern.
  • Figure 5: Relationship between Prefix Semantics, TCB, and Model Internals. Visual analysis based on 16 semantically varied prefixes targeting the same continuation. (a) $\delta_{\mathrm{TCB}}$ versus semantic distance shows a strong positive linear correlation ($R^2=0.91$). Point colors map to semantic distance (see colorbar), and annotations identify prefixes. The empirical fit (gray dashed line) closely matches the data, aligning well with the theoretical prediction (black dashed line, Eq. \ref{eq:linfit}). (b) $\delta_{\mathrm{TCB}}$ versus the Frobenius norm of the final layer Jacobian ($\|\mathbf{J}_{\mathbf{W}}\|_F$) exhibits a moderate negative correlation. (c) $\delta_{\mathrm{TCB}}$ versus the probability of the first token in the continuation shows a less distinct relationship for this set of prefixes.

Theorems & Definitions (2)

  • Definition 1: Token Constraint Bound $\delta_{\mathrm{TCB}}$
  • Proposition 1: Exact Squared Jacobian Norm (\ref{app:cov_deriv})
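
The identity in Proposition 1 can be checked numerically on a toy example. The following is a minimal sketch, not the authors' code: it uses only the Python standard library, a random $6 \times 4$ matrix stands in for the output embedding matrix, and all function names (`output_probs`, `jacobian_frob_sq_direct`, `jacobian_frob_sq_geometric`) are hypothetical. It computes $\|\mathbf{J}_{\mathbf{W}}(\bm{h})\|_F^2$ once by building the Jacobian entry by entry and once from the geometric sum, then compares the two.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_probs(W, h):
    """o = softmax(W h), with W given as a list of V rows of length d."""
    return softmax([sum(w_k * h_k for w_k, h_k in zip(row, h)) for row in W])


def jacobian_frob_sq_direct(W, h):
    """Build J = (diag(o) - o o^T) W entry by entry and sum the squares."""
    o = output_probs(W, h)
    V, d = len(W), len(h)
    total = 0.0
    for i in range(V):
        for k in range(d):
            # (diag(o) - o o^T)[i][j] = o_i * (delta_ij - o_j)
            entry = sum(
                o[i] * ((1.0 if i == j else 0.0) - o[j]) * W[j][k]
                for j in range(V)
            )
            total += entry * entry
    return total


def jacobian_frob_sq_geometric(W, h):
    """Right-hand side of Proposition 1: sum_i o_i^2 * ||w_i - mu||^2."""
    o = output_probs(W, h)
    # mu is the probability-weighted mean of the rows of W.
    mu = [sum(o[j] * W[j][k] for j in range(len(W))) for k in range(len(h))]
    return sum(
        o_i ** 2 * sum((w_ik - mu_k) ** 2 for w_ik, mu_k in zip(row, mu))
        for o_i, row in zip(o, W)
    )


rng = random.Random(0)
V, d = 6, 4  # toy vocabulary size and hidden width (assumed for illustration)
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(V)]
h = [rng.gauss(0.0, 1.0) for _ in range(d)]

direct = jacobian_frob_sq_direct(W, h)
geometric = jacobian_frob_sq_geometric(W, h)
print(f"direct={direct:.12f}  geometric={geometric:.12f}")
```

The two printed values should agree up to floating-point rounding. The geometric form needs only $\bm{o}$ and the embeddings, so it avoids materializing the $\mathcal{V} \times d$ Jacobian, which matters at realistic vocabulary sizes.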