Beyond Confidence: The Rhythms of Reasoning in Generative Models
Deyuan Liu, Zecheng Wang, Zhanyue Qin, Zhiying Tu, Dianhui Chu, Dianbo Sui
TL;DR
The paper tackles the brittleness of large language models to subtle contextual changes by introducing the Token Constraint Bound $δ_{\mathrm{TCB}}$, a local stability metric that bounds how far the internal hidden state $\bm{h}$ can be perturbed before the dominant next-token prediction changes. It grounds $δ_{\mathrm{TCB}}$ in the Jacobian of the softmax output with respect to $\bm{h}$ and, crucially, links it to the geometry of the output embedding space via the exact expression $\|\mathbf{J}_{\mathbf{W}}(\bm{h})\|_F^2 = \sum_{i=1}^{\mathcal{V}} o_i^2 \|\mathbf{w}_i - \boldsymbol{\mu}_{\mathbf{w}}(\bm{h})\|_2^2$, where $o_i$ is the softmax probability of token $i$ and $\mathbf{w}_i$ is its output embedding. The authors show that $δ_{\mathrm{TCB}}$ correlates with prompt quality, reveals instabilities not captured by perplexity, and can serve as a diagnostic tool for prompt engineering and in-context learning. Combining theoretical derivations with experiments on LLaMA-3.1-8B across MMLU, GSM8K, and prompt variations, the work argues that stability concepts grounded in output-embedding geometry can guide more reliable, context-aware NLP systems, and it motivates extensions to broader models and settings.
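The Frobenius-norm identity above can be checked numerically on a toy model. The sketch below is illustrative only and rests on assumptions not stated in this summary: logits are taken to be $\mathbf{W}\bm{h}$ with no bias term, $\mathbf{J}_{\mathbf{W}}(\bm{h})$ is read as the Jacobian of the softmax output with respect to $\bm{h}$, and $\boldsymbol{\mu}_{\mathbf{w}}(\bm{h})$ is assumed to be the probability-weighted mean embedding $\sum_j o_j \mathbf{w}_j$. The function names and toy dimensions are hypothetical. It does not compute $δ_{\mathrm{TCB}}$ itself, since the summary does not give that formula.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_probs(W, h):
    """Softmax of the logits W h (assumption: no bias term)."""
    logits = [sum(w_k * h_k for w_k, h_k in zip(row, h)) for row in W]
    return softmax(logits)


def frobenius_sq_direct(W, h):
    """||J||_F^2 with J = (diag(o) - o o^T) W built entry by entry."""
    V, d = len(W), len(h)
    o = output_probs(W, h)
    total = 0.0
    for i in range(V):
        for k in range(d):
            # J[i][k] = d o_i / d h_k = sum_j o_i * (delta_ij - o_j) * W[j][k]
            j_ik = sum(
                o[i] * ((1.0 if i == j else 0.0) - o[j]) * W[j][k]
                for j in range(V)
            )
            total += j_ik ** 2
    return total


def frobenius_sq_geometric(W, h):
    """sum_i o_i^2 * ||w_i - mu||^2, with mu assumed to be sum_j o_j w_j."""
    V, d = len(W), len(h)
    o = output_probs(W, h)
    mu = [sum(o[j] * W[j][k] for j in range(V)) for k in range(d)]
    return sum(
        o[i] ** 2 * sum((W[i][k] - mu[k]) ** 2 for k in range(d))
        for i in range(V)
    )


# Toy setup: vocabulary of 6 tokens, hidden size 4 (hypothetical values).
random.seed(0)
V, d = 6, 4
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(V)]
h = [random.gauss(0.0, 1.0) for _ in range(d)]

direct = frobenius_sq_direct(W, h)
geometric = frobenius_sq_geometric(W, h)
print("direct    :", direct)
print("geometric :", geometric)
print("match     :", math.isclose(direct, geometric, rel_tol=1e-9))
```

Under these assumptions the two quantities agree up to floating-point error, because row $i$ of the Jacobian reduces to $o_i(\mathbf{w}_i - \boldsymbol{\mu}_{\mathbf{w}})^\top$. The geometric form also avoids building the full $\mathcal{V} \times d$ Jacobian, which matters at realistic vocabulary sizes.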
Abstract
Large Language Models (LLMs) exhibit impressive capabilities yet suffer from sensitivity to slight input context variations, hampering reliability. Conventional metrics like accuracy and perplexity fail to assess local prediction robustness, as normalized output probabilities can obscure the underlying resilience of an LLM's internal state to perturbations. We introduce the Token Constraint Bound ($δ_{\mathrm{TCB}}$), a novel metric that quantifies the maximum internal state perturbation an LLM can withstand before its dominant next-token prediction significantly changes. Intrinsically linked to output embedding space geometry, $δ_{\mathrm{TCB}}$ provides insights into the stability of the model's internal predictive commitment. Our experiments show $δ_{\mathrm{TCB}}$ correlates with effective prompt engineering and uncovers critical prediction instabilities missed by perplexity during in-context learning and text generation. $δ_{\mathrm{TCB}}$ offers a principled, complementary approach to analyze and potentially improve the contextual stability of LLM predictions.
