Top-b: Entropic Regulation of Relative Probability Bands in Autoregressive Language Processes

Deepon Halder, Raj Dabre

Abstract

Probabilistic language generators are theoretically modeled as discrete stochastic processes, yet standard decoding strategies (Top-k, Top-p) impose static truncation rules that fail to accommodate the dynamic information density of natural language. This misalignment often forces a suboptimal trade-off: static bounds are either too restrictive for high-entropy creative generation or too permissive for low-entropy logical reasoning. In this work, we formalize the generation process as a trajectory through a relative probability manifold. We introduce Top-b (Adaptive Relative Band Sampling), a decoding strategy that regulates the candidate set via a dynamic bandwidth coefficient coupled strictly to the instantaneous Shannon entropy of the model's distribution. We provide a theoretical framework demonstrating that Top-b acts as a variance-minimizing operator on the tail distribution. Empirical validation on GPQA and GSM8K benchmarks indicates that Top-b significantly reduces generation entropy and inter-decoding variance while maintaining competitive reasoning accuracy, effectively approximating a self-regulating control system for autoregressive generation.

Paper Structure (25 sections, 2 theorems, 9 equations, 5 figures, 3 tables)

Key Result

Proposition 1

The Top-b mechanism exhibits the following asymptotic behaviors:

Figures (5)

  • Figure 1: Structural comparison of static cumulative truncation (Top-$p$) versus entropy-regulated relative thresholding (Top-$b$). (a) In low-entropy reasoning regimes, Top-$p$ admits a long tail of low-probability distractor tokens, increasing the risk of logical incoherence. (b) In high-entropy creative regimes, Top-$p$'s static cumulative threshold arbitrarily truncates viable tokens, artificially restricting diversity. In contrast, Top-$b$ establishes a dynamic probability band anchored to the distribution mode ($p_{\max}$). (c) Under low entropy, the Top-$b$ bandwidth strictly contracts to prune the distractor tail and enforce deterministic reasoning. (d) Under high entropy, the bandwidth expands to safely retain linguistic diversity. This illustrates how Top-$b$ continuously adapts its sampling support to the local information density of the language process.
  • Figure 2: Accuracy variance across random seeds on GPQA. Top-b exhibits the lowest variance, indicating higher deterministic stability compared to Top-p and stochastic sampling methods.
  • Figure 3: Entropy trajectory over generation steps for Top-b and Top-p sampling. Top-b induces a monotonic reduction in entropy, while Top-p maintains higher entropy in later stages due to broader candidate sets.
  • Figure 4: Interaction between bandwidth parameter $b$ and temperature $T$ on GPQA performance. Mid-range $b$ values regularize high-temperature sampling, improving accuracy without inducing mode collapse.
  • Figure 5: Schematic of entropy-induced branching. Top-b acts as a pruning operator, collapsing diffuse branches into a single high-likelihood continuation.
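
The behavior described in Figure 1 (a band anchored at $p_{\max}$ that contracts under low entropy and expands under high entropy) can be illustrated with a minimal sketch. The exact form of the Top-b support set (Definition 1) is not reproduced on this page, so the threshold rule below is an assumption chosen only to show the qualitative behavior: a token is kept when its probability is at least `(1 - b * H_norm) * p_max`, where `H_norm` is the Shannon entropy divided by the log of the vocabulary size. The function names `shannon_entropy`, `top_b_support`, and `renormalize` are hypothetical and do not come from the paper.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_b_support(probs, b=0.5):
    """Indices retained by an entropy-scaled relative band.

    ASSUMPTION: this threshold rule is illustrative only and is not
    the paper's Definition 1. A token is kept when
        p_i >= (1 - b * H_norm) * p_max,
    so the band collapses toward the mode as entropy goes to 0 and
    widens to (1 - b) * p_max as entropy approaches its maximum.
    """
    if not 0.0 < b <= 1.0:
        raise ValueError("b must lie in (0, 1]")
    vocab_size = len(probs)
    if vocab_size > 1:
        h_norm = shannon_entropy(probs) / math.log(vocab_size)
    else:
        h_norm = 0.0
    # Guard against floating-point drift outside [0, 1].
    h_norm = min(max(h_norm, 0.0), 1.0)
    threshold = (1.0 - b * h_norm) * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def renormalize(probs, keep):
    """Renormalize the retained probabilities so they sum to 1."""
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


peaked = [0.97, 0.01, 0.01, 0.01]   # low-entropy, reasoning-like step
flat = [0.25, 0.25, 0.25, 0.25]     # high-entropy, open-ended step

peaked_keep = top_b_support(peaked, b=0.5)
flat_keep = top_b_support(flat, b=0.5)
```

Under these assumptions, the peaked distribution retains only the mode while the flat distribution retains all four tokens, mirroring panels (c) and (d) of Figure 1. Normalizing the entropy by the log of the vocabulary size keeps the relative threshold between `1 - b` and `1` regardless of vocabulary size; this is a design choice of the sketch, not a claim about the authors' formulation.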

Theorems & Definitions (3)

  • Definition 1: Top-b Support Set
  • Proposition 1: Entropy-Scaled Constraints
  • Lemma 1