Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics

Jae Wan Shim

TL;DR

This paper reframes the evaluation of large language models from task performance to information processing by introducing a cognitive profile built from entropy metrics. It defines two key quantities, the conditional entropy $h_k$ and the marginal entropy $H_k$, and derives the Length-Conditional Uncertainty Index $u_k = h_k/H_k$, plotted as the Entropy Decay Curve (EDC) over context length $k$ to reveal how models exploit context. The Information Gain Span (IGS) provides a single scalar summary of an EDC. The authors show that the curves vary with model scale and text complexity, and that entropy collapse can serve as a potential memorisation or data-contamination signal. The framework is supported by theory, including monotonicity results and Bayesian interpretations, and offers a practical diagnostic toolkit for comparing the internal dynamics of LLMs and auditing training data across domains.
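As a rough illustration of the ratio $u_k = h_k/H_k$, the sketch below computes one point of an Entropy Decay Curve from a set of next-token distributions that share the same context length $k$. It assumes that $h_k$ is the average entropy of the individual predictive distributions and that $H_k$ is the entropy of their average; this summary does not state the paper's exact estimators, so those definitions, the function names (`entropy`, `uncertainty_index`), and the toy distributions are illustrative assumptions rather than the author's implementation.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)


def uncertainty_index(dists):
    """Return (h_k, H_k, u_k) for predictive distributions at one context length k.

    Assumed estimators (not taken from the paper):
      h_k = mean entropy of the per-context predictive distributions
      H_k = entropy of the mean predictive distribution
    """
    n = len(dists)
    vocab_size = len(dists[0])

    # Conditional entropy: average uncertainty once the context is known.
    h_k = sum(entropy(d) for d in dists) / n

    # Marginal entropy: uncertainty of the context-averaged prediction.
    mean_dist = [sum(d[i] for d in dists) / n for i in range(vocab_size)]
    H_k = entropy(mean_dist)

    u_k = h_k / H_k if H_k > 0 else float("nan")
    return h_k, H_k, u_k


# Made-up toy distributions over a two-token vocabulary.
# k = 0: no context, so every prediction is the same flat distribution.
# k = 1: one token of context sharpens each prediction.
toy_predictions = {
    0: [[0.5, 0.5], [0.5, 0.5]],
    1: [[0.9, 0.1], [0.1, 0.9]],
}

edc = {k: uncertainty_index(d)[2] for k, d in toy_predictions.items()}
for k, u_k in sorted(edc.items()):
    print(f"k={k}: u_k={u_k:.3f}")
```

Under these assumed estimators, $h_k \le H_k$ by the concavity of entropy, so $u_k$ lies in $[0, 1]$, and a curve that falls as $k$ grows indicates that the model is extracting information from the added context. In practice the distributions would come from a model's softmax outputs at sampled positions of a corpus rather than from hand-written lists.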

Abstract

Large Language Models (LLMs) excel on many task-specific benchmarks, yet the mechanisms that drive this success remain poorly understood. We move from asking what these systems can do to asking how they process information. Our contribution is a task-agnostic method that builds a quantitative Cognitive Profile for any model. The profile is built around the Entropy Decay Curve, a plot of a model's normalised predictive uncertainty as context length grows. Across several state-of-the-art LLMs and diverse texts, the curves expose distinctive, stable profiles that depend on both model scale and text complexity. We also propose the Information Gain Span (IGS) as a single index that summarises the desirability of a decay pattern. Together, these tools offer a principled way to analyse and compare the internal dynamics of modern AI systems.

Paper Structure

This paper contains 40 sections, 4 theorems, 23 equations, 3 figures, 4 tables.

Key Result

Proposition 1

Let $p$ be an autoregressive model that satisfies the Predictive Attenuation Condition for every $k \ge 0$. Then $h_{k+1} \le h_k$.

Figures (3)

  • Figure 1: Entropy Decay Curves ($u_k$ vs. $k$) for all three models on the "Alice's Adventures in Wonderland" corpus. The horizontal axis is shown on a logarithmic scale.
  • Figure 2: Entropy Decay Curves ($u_k$ vs. $k$) for all three models on the "Ulysses" corpus. The horizontal axis is shown on a logarithmic scale.
  • Figure 3: Entropy Decay Curves ($u_k$ vs. $k$) for all three models on the "Kant’s Critique of Judgement" corpus. The horizontal axis is shown on a logarithmic scale.

Theorems & Definitions (11)

  • Definition 1: Theoretical Entropy Metrics
  • Proposition 1: Monotonicity of $h_k$ under Predictive Attenuation
  • proof
  • Remark 1: Why $H_k$ Depends on $k$
  • Proposition 2: Monotonicity of $H_k$ under Predictive Convergence
  • proof
  • Remark 2: Two Regimes of Uncertainty Reduction
  • Proposition 3: Bayesian Decomposition of Marginal Entropy
  • proof
  • Theorem 1: IGS Peak and Markov Order
  • ...and 1 more