Benford's Law from Turing Ensembles and Integer Partitions
Alexander Kolpakov, Aidan Rocke
TL;DR
The paper asks why Benford's first-digit law appears across diverse datasets and develops two complementary generative mechanisms to explain it. The first is an information-theoretic framework built on a halting-constrained ensemble of probabilistic Turing machines: entropy maximization yields a uniform distribution on logarithmic scales, which reproduces Benford's law in digit statistics. This mechanism exhibits a phase transition controlled by the halting probability $p_S$, with a critical threshold linked to a universal constant $\lambda \approx 0.23075$ that governs when Benford statistics emerge. The second mechanism, based on non-ergodic integer partitions (Einstein-solid combinatorics) under entropy-rate constraints and aligned with a renormalization-flow picture, gives a parallel derivation of the same logarithmic digit profile. Together they support the central claim that Benford behavior is a consequence of constrained entropy and of the mismatch between spatial and temporal digit averages. Numerical experiments corroborate the theory, illustrate truncation effects, and show that extending the support restores convergence to Benford's law, which offers practical guidance for empirical data with finite support.
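The link between a uniform law on a logarithmic scale and Benford's law, and the effect of truncated support, can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes $\log_{10} X$ is uniform on $[0, D)$ for a hypothetical support width `decades`, and the function names are invented here. When `decades` is a whole number the first-digit probabilities equal $\log_{10}(1 + 1/d)$ exactly; a fractional width leaves a gap that shrinks as the support grows.

```python
import math


def benford(d):
    """Benford probability of leading digit d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit_probs(decades):
    """Exact first-digit law when log10(X) is uniform on [0, decades).

    `decades` is a hypothetical support width, not a quantity from the paper.
    """
    full = math.floor(decades)   # number of complete decades covered
    rest = decades - full        # leftover fraction of the last decade
    probs = []
    for d in range(1, 10):
        lo = math.log10(d)       # digit d occupies [lo, lo + width) in each decade
        width = benford(d)
        partial = min(max(rest - lo, 0.0), width)
        probs.append((full * width + partial) / decades)
    return probs


def l1_gap(probs):
    """Sum of absolute deviations from Benford's law over digits 1..9."""
    return sum(abs(p - benford(d)) for d, p in enumerate(probs, start=1))


if __name__ == "__main__":
    for decades in (1.5, 3.0, 10.5):
        print(decades, round(l1_gap(first_digit_probs(decades)), 4))
```

Comparing the gap at 1.5 and 10.5 decades gives a rough sense of how quickly a truncated log-uniform sample approaches the Benford profile as its range is extended.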
Abstract
We develop two complementary generative mechanisms that explain when and why Benford's first-digit law arises. First, a probabilistic Turing machine (PTM) ensemble induces a geometric law for codelength. Maximizing its entropy under a constraint on halting length yields Benford statistics. This model shows a phase transition with respect to the halt probability. Second, a constrained partition model (Einstein-solid combinatorics) recovers the same logarithmic profile as the maximum-entropy solution under a coarse-grained entropy-rate constraint, clarifying the role of non-ergodicity (ensemble vs. trajectory averages). We also perform numerical experiments that corroborate our conclusions.
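As a rough illustration of how a geometric law for codelength can interact with first-digit statistics, the toy model below may help. It is an assumption made for this sketch, not the PTM ensemble of the paper: a length $L$ is drawn with $P(L = k) \propto (1 - p)^{k-1}$ for a hypothetical halt probability `halt_prob`, and the leading decimal digit of $2^L$ is recorded. The toy shows only the qualitative dependence on the halt probability (a small value spreads mass over many lengths and moves the digit law toward Benford's); it does not reproduce the phase transition or the critical threshold.

```python
import math


def benford(d):
    """Benford probability of leading digit d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digits_of_powers_of_two(max_len):
    """Leading decimal digit of 2**k for k = 1..max_len, using exact integers."""
    digits = []
    value = 1
    for _ in range(max_len):
        value *= 2
        digits.append(int(str(value)[0]))
    return digits


def digit_law(halt_prob, digits):
    """First-digit law of 2**L with P(L = k) proportional to (1 - halt_prob)**(k - 1).

    The geometric law is truncated at len(digits) and renormalized.
    """
    mass = [0.0] * 9
    weight = halt_prob
    for d in digits:
        mass[d - 1] += weight
        weight *= 1 - halt_prob
    total = sum(mass)
    return [m / total for m in mass]


def l1_gap(probs):
    """Sum of absolute deviations from Benford's law over digits 1..9."""
    return sum(abs(p - benford(d)) for d, p in enumerate(probs, start=1))


# 2**12000 has fewer than 4300 decimal digits, which keeps str() within
# CPython's default integer-to-string conversion limit.
DIGITS = first_digits_of_powers_of_two(12000)

if __name__ == "__main__":
    for halt_prob in (0.5, 0.1, 0.01, 0.001):
        print(halt_prob, round(l1_gap(digit_law(halt_prob, DIGITS)), 4))
```

The map $L \mapsto 2^L$ is chosen here only because the leading digits of powers of two are a standard example of a Benford-distributed sequence; any other choice would be an equally hypothetical stand-in for the paper's construction.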
