Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics
Jae Wan Shim
TL;DR
This paper shifts the evaluation of large language models from task performance to information processing by introducing a cognitive profile built from entropy metrics. It defines two key quantities, the conditional entropy $h_k$ and the marginal entropy $H_k$, and derives the Length-Conditional Uncertainty Index $u_k = h_k/H_k$. Plotting $u_k$ against context length $k$ gives the Entropy Decay Curve (EDC), which reveals how a model exploits context. The Information Gain Span (IGS) summarises an EDC as a single scalar. The author shows that the curves vary with model scale and text complexity, and that entropy collapse can serve as a potential signal of memorisation or data contamination. The framework is backed by theoretical results, including monotonicity results and Bayesian interpretations, and offers a practical diagnostic toolkit for comparing the internal dynamics of LLMs and auditing training data across domains.
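To make the ratio $u_k = h_k/H_k$ concrete, here is a minimal Python sketch that estimates it from n-gram counts over a plain symbol sequence. It is an illustration under stated assumptions, not the paper's procedure: it assumes $h_k$ is the entropy of the next symbol given the $k$ preceding symbols and $H_k$ is the entropy of that same symbol with the context ignored, and it uses empirical counts as a stand-in for an LLM's predictive distribution. The function names `entropy` and `uncertainty_index` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def uncertainty_index(symbols, k):
    """Estimate u_k = h_k / H_k from n-gram counts (assumed definitions)."""
    joint = Counter()    # counts of (context, next symbol) pairs
    context = Counter()  # counts of the k-symbol context alone
    target = Counter()   # counts of the next symbol alone
    for i in range(k, len(symbols)):
        ctx, nxt = tuple(symbols[i - k:i]), symbols[i]
        joint[(ctx, nxt)] += 1
        context[ctx] += 1
        target[nxt] += 1
    # Chain rule: H(next | context) = H(context, next) - H(context)
    h_k = entropy(joint) - entropy(context)
    H_k = entropy(target)
    # Guard against a degenerate sequence with no marginal uncertainty
    return h_k / H_k if H_k > 0 else 0.0


# A toy decay curve over context lengths 0..3 for a short sample text
text = "the cat sat on the mat and the cat sat on the hat"
curve = [uncertainty_index(text, k) for k in range(4)]
```

With an empty context ($k = 0$) the two entropies coincide, so the estimate starts at 1. For a perfectly alternating sequence such as `"abababababab"`, one symbol of context removes all uncertainty and the estimate drops to 0. On short texts, count-based estimates at larger $k$ are biased low because most contexts occur only once, so the sketch is suitable only for building intuition.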
Abstract
Large Language Models (LLMs) excel on many task-specific benchmarks, yet the mechanisms that drive this success remain poorly understood. We move from asking what these systems can do to asking how they process information. Our contribution is a task-agnostic method that builds a quantitative Cognitive Profile for any model. The profile is built around the Entropy Decay Curve, a plot of a model's normalised predictive uncertainty as context length grows. Across several state-of-the-art LLMs and diverse texts, the curves expose distinctive, stable profiles that depend on both model scale and text complexity. We also propose the Information Gain Span (IGS) as a single index that summarises the desirability of a decay pattern. Together, these tools offer a principled way to analyse and compare the internal dynamics of modern AI systems.
