Benford's Law from Turing Ensembles and Integer Partitions

Alexander Kolpakov, Aidan Rocke

TL;DR

The paper asks why Benford's first-digit law appears across diverse datasets and develops two complementary generative mechanisms to explain it. The first is an information-theoretic framework built on a halting-constrained probabilistic Turing-machine ensemble: entropy maximization yields a uniform distribution on logarithmic scales, which reproduces Benford's law in digit statistics. A key result is a phase transition controlled by the halting probability $p_S$, with a critical threshold linked to a universal constant $\lambda \approx 0.23076$ that governs when Benford statistics emerge. The second mechanism, a constrained partition (Einstein-solid) approach based on non-ergodic integer partitions and entropy-rate constraints, aligns with a non-ergodic, renormalization-flow picture and yields the same logarithmic digit profile. This parallel derivation reinforces the central claim: Benford behavior is a consequence of constrained entropy and the mismatch between spatial and temporal digit averages. Numerical experiments corroborate the theory, illustrate truncation effects, and demonstrate that extending the support restores Benford convergence, offering practical guidance for empirical data with finite support.

Abstract

We develop two complementary generative mechanisms that explain when and why Benford's first-digit law arises. First, a probabilistic Turing machine (PTM) ensemble induces a geometric law for codelength. Maximizing its entropy under a constraint on halting length yields Benford statistics. This model shows a phase transition with respect to the halt probability. Second, a constrained partition model (Einstein-solid combinatorics) recovers the same logarithmic profile as the maximum-entropy solution under a coarse-grained entropy-rate constraint, clarifying the role of non-ergodicity (ensemble vs. trajectory averages). We also perform numerical experiments that corroborate our conclusions.
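
The link between a distribution that is uniform on a logarithmic scale and Benford's law can be checked numerically. The sketch below is only an illustration under simple assumptions: it draws samples whose base-10 logarithm is uniform over a whole number of decades and compares the first-digit frequencies with $\log_{10}(1 + 1/d)$. It does not implement the PTM ensemble or the partition model from the paper, and the function names and the parameters `decades` and `seed` are hypothetical.

```python
import math
import random


def benford_probs():
    """Benford's law for decimal digits: P(d) = log10(1 + 1/d), d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading decimal digit of a positive number."""
    exponent = math.floor(math.log10(x))
    mantissa = x / 10.0 ** exponent
    # Correct for floating-point rounding in log10 near powers of ten.
    if mantissa < 1.0:
        mantissa *= 10.0
    elif mantissa >= 10.0:
        mantissa /= 10.0
    return min(int(mantissa), 9)


def log_uniform_sample(n, decades=6, seed=0):
    """Draw n values whose log10 is uniform on [0, decades] (an assumed toy model)."""
    rng = random.Random(seed)
    return [10.0 ** rng.uniform(0, decades) for _ in range(n)]


def digit_frequencies(values):
    """Empirical distribution of the first significant digit."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    freqs = digit_frequencies(log_uniform_sample(200_000))
    for d, (f, p) in enumerate(zip(freqs, benford_probs()), start=1):
        print(f"digit {d}: empirical {f:.4f}, Benford {p:.4f}")
```

With the logarithm spread uniformly over whole decades, the empirical frequencies should agree with the Benford probabilities up to sampling noise.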

Paper Structure

This paper contains 7 sections, 31 equations, 2 figures.

Figures (2)

  • Figure 1: Total variation distance (TVD) between the model distribution of first significant digits and Benford's law. The horizontal axis shows the model parameter $\lambda$, which controls the halting probability: $\lambda = 10^{-12}, 10^{-6}, 0.025, 0.05, 0.075, 0.1, 0.15, 0.175, 0.2, 0.225, 0.23076 (\approx \lambda_*), 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$. The other parameter, $N$, is fixed throughout. We use $100$ trials for each value of $\lambda$ to generate the empirical distribution. Error bars are set to $1$ standard deviation.
  • Figure 2: A zoom of the same empirically measured TVD around $\lambda_* \approx 0.230759776818$. A visible plateau starts at around $\lambda_*$, where the distribution of the order of magnitude $k = \lfloor \log_2(X) \rfloor$ becomes close to uniform. However, there is a slight downward slope for $N=64$ due to higher-order effects in the fractional part $\{\log_2(X)\}$ within each order of magnitude. No such slope is visible for $N=128$. We use $100$ trials for each value of $\lambda$ to generate the empirical distribution. Error bars are set to $1$ standard error of the mean (SEM).
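
The quantity plotted in both figures is a distance between an empirical first-digit distribution and Benford's law. A minimal sketch of such a computation follows, assuming that TVD denotes the usual total variation distance, $\frac{1}{2}\sum_d |p_d - q_d|$. The helper names are hypothetical, and the two test datasets (powers of two, and the integers from 1 to 999) are illustrative stand-ins, not the paper's model output.

```python
import math


def benford_probs():
    """Benford's law for decimal digits: P(d) = log10(1 + 1/d), d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit_distribution(integers):
    """Empirical first-digit distribution of an iterable of positive integers."""
    counts = [0] * 9
    total = 0
    for n in integers:
        counts[int(str(n)[0]) - 1] += 1
        total += 1
    return [c / total for c in counts]


def total_variation_distance(p, q):
    """TVD between two distributions on the same finite set: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tvd_to_benford(integers):
    """TVD between the empirical first-digit distribution and Benford's law."""
    return total_variation_distance(
        leading_digit_distribution(integers), benford_probs()
    )


if __name__ == "__main__":
    # Powers of two spread evenly on a logarithmic scale, so the TVD is small.
    print("powers of two:", tvd_to_benford(2 ** k for k in range(1, 1001)))
    # Integers 1..999 have equally likely first digits, so the TVD is large.
    print("integers 1..999:", tvd_to_benford(range(1, 1000)))
```

The second dataset has finite, linearly uniform support and gives a TVD of about 0.27, while the powers of two stay close to Benford's law. Repeating such a computation over many trials would give the mean and spread needed for error bars like those in the figures.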