Table of Contents
Fetching ...

Why Eight Percent of Benford Sequences Never Converge

James M. Hyman

Abstract

We study multi-digit correlations in Benford sequences b^n for integer bases 2 <= b <= 1000, measuring dependence via conditional mutual information (CMI). A resonance ratio derived from the continued fraction expansion of log_10(b) classifies bases into convergent and persistent regimes (Theorem 3.13): among 996 bases surveyed, 84 (8.4%) exhibit persistent correlations at sample depth N = 10,000, and extended computation to N = 200,000 confirms 53 (5.3%) as genuinely persistent. We prove that CMI deviation is bounded by the distribution error (Theorem 3.4); exhaustive computation across 2,988 test cases confirms that the effective scaling is quadratic, yielding a two-sided rate beta = 2 for bounded-type bases (conditional on a computationally verified Hessian positivity condition). The observed effective exponent across 774 convergent bases is beta_eff = 1.72 +/- 0.19, consistent with finite-sample corrections to the asymptotic rate. We conjecture that the persistence rate converges to 1/12, a prediction grounded in the Gauss-Kuzmin distribution of partial quotients. For persistent bases, the convergence threshold N_epsilon exceeds 10^6 at standard precision, rendering the asymptotic limit observationally irrelevant within our computational scope.

Why Eight Percent of Benford Sequences Never Converge

Abstract

We study multi-digit correlations in Benford sequences b^n for integer bases 2 <= b <= 1000, measuring dependence via conditional mutual information (CMI). A resonance ratio derived from the continued fraction expansion of log_10(b) classifies bases into convergent and persistent regimes (Theorem 3.13): among 996 bases surveyed, 84 (8.4%) exhibit persistent correlations at sample depth N = 10,000, and extended computation to N = 200,000 confirms 53 (5.3%) as genuinely persistent. We prove that CMI deviation is bounded by the distribution error (Theorem 3.4); exhaustive computation across 2,988 test cases confirms that the effective scaling is quadratic, yielding a two-sided rate beta = 2 for bounded-type bases (conditional on a computationally verified Hessian positivity condition). The observed effective exponent across 774 convergent bases is beta_eff = 1.72 +/- 0.19, consistent with finite-sample corrections to the asymptotic rate. We conjecture that the persistence rate converges to 1/12, a prediction grounded in the Gauss-Kuzmin distribution of partial quotients. For persistent bases, the convergence threshold N_epsilon exceeds 10^6 at standard precision, rendering the asymptotic limit observationally irrelevant within our computational scope.
Paper Structure (46 sections, 17 theorems, 42 equations, 5 figures, 8 tables)

This paper contains 46 sections, 17 theorems, 42 equations, 5 figures, 8 tables.

Key Result

Lemma 2.3

Let $b > 1$ and $\alpha = \log_{10}(b)$. For the sequence $a_n = b^n$, the significand is and the $k$-th significant digit is determined by the interval in $[0,1)$ containing $\{n\alpha\}$.

Figures (5)

  • Figure 1: Classification phase diagram in the $(Q^*, N)$ plane, where $Q^* = \max_k(a_{k+1} q_k)$ is the resonance parameter (Definition \ref{['def:resonance']}). Three regions emerge: convergent (green, $Q^* < N/\log N$, power-law decay), persistent (red, $Q^* > N\log N$, constant CMI), and transitional (yellow). Boundaries follow Theorem \ref{['thm:dichotomy']} and Observation \ref{['obs:transitional']}. For any practical $N \leq 10^5$, bases in the upper-left region never show convergent behavior.
  • Figure 2: Conditional mutual information $I(D_1; D_3 \mid D_2)$ versus sample size $N$ on logarithmic axes. Convergent sequences ($2^n$, $3^n$, $5^n$) show linear decay corresponding to power laws $\beta \approx 1.7$--$1.8$. The $7^n$ sequence (red diamonds) shows no decay, remaining near 0.68 bits across all sample sizes. Dashed lines show power-law fits.
  • Figure 3: Bimodal distribution of decay exponents across 996 integer bases. A dominant peak at $\beta \approx 1.72$ (774 convergent bases, 77.7%) and a secondary concentration below $\beta = 0.3$ (84 persistent bases, 8.4%). The near-absence of bases with $\beta \in [0.5, 1.3]$ supports the convergent-persistent dichotomy. Extended computation to $N = 200{,}000$ resolves most transitional cases (Table \ref{['tab:survey_summary']}).
  • Figure 4: Survey classification overview. (a) Distribution across categories at $N = 10{,}000$: convergent 77.7%, persistent 8.4%, transitional 13.9%; extended to $N = 200{,}000$: 92.4% and 5.3%. (b) Anomaly strength versus maximum partial quotient, confirming the mechanistic role of exceptional rational approximations.
  • Figure 5: The resonance parameter $Q^*$ separates convergent from persistent bases. Cumulative distribution of $Q^*$ across 996 bases. The empirical threshold $Q^*_{\mathrm{emp}} \approx 145{,}000$ achieves $>98\%$ classification accuracy: bases above it are overwhelmingly persistent (96.5%), while bases below are convergent or transitional.

Theorems & Definitions (41)

  • Definition 2.1: Significant digits
  • Definition 2.2: Benford distribution
  • Lemma 2.3: Digit-fractional part correspondence
  • proof
  • Theorem 2.4: Weyl, 1916
  • Theorem 2.5: Best approximation property
  • Definition 2.6: Irrationality type
  • Remark 2.7
  • Definition 2.8: Cylinder sets
  • Lemma 2.9: Cylinder set measures
  • ...and 31 more