
General Distributions of Number Representation Elements

Félix Balado, Guénolé C. M. Silvestre

Abstract

We provide general expressions for the joint distributions of the $k$ most significant $b$-ary digits and of the $k$ leading continued fraction coefficients of outcomes of an arbitrary continuous random variable. Our analysis highlights the connections between the two problems. In particular, we give the general convergence law of the distribution of the $j$-th significant digit, which is the counterpart of the general convergence law of the distribution of the $j$-th continued fraction coefficient (Gauss-Kuz'min law). We also particularise our general results for Benford and Pareto random variables. The former particularisation allows us to show, among other results, the central role played by Benford variables in the asymptotics of the general expressions. The particularisation for Pareto variables -- which include Benford variables as a special case -- is especially relevant in the context of pervasive scale-invariant phenomena, where Pareto variables occur much more frequently than Benford variables. This suggests that the Pareto expressions that we produce have wider applicability than their Benford counterparts in modelling most significant digits and leading continued fraction coefficients of real data. Our results may find practical application in all areas where Benford's law has previously been used.
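To make the two number representation elements concrete, here is a minimal floating-point sketch that extracts the $k$ most significant $b$-ary digits (the $k$-th integer significand) of a positive number and the leading continued fraction (CF) coefficients of a real number. The function names `leading_digits` and `cf_coefficients` are hypothetical, and we assume the CF coefficients are indexed from the fractional part ($a_0=\lfloor y\rfloor$ is dropped), which may not match the paper's indexing.

```python
import math


def leading_digits(x: float, k: int, b: int = 10) -> int:
    """k-th integer significand of x > 0: its k most significant base-b digits as an integer."""
    if x <= 0:
        raise ValueError("x must be positive")
    e = math.floor(math.log(x, b))
    # math.log can be off by one ulp near exact powers of b, so correct the exponent
    if x >= b ** (e + 1):
        e += 1
    elif x < b ** e:
        e -= 1
    # Scale x into [b^(k-1), b^k) and truncate
    a = math.floor(x * float(b) ** (k - 1 - e))
    # Clamp against floating-point rounding at the interval edges
    return min(max(a, b ** (k - 1)), b ** k - 1)


def cf_coefficients(y: float, k: int) -> list[int]:
    """First k coefficients a_1..a_k of the CF expansion of frac(y) (a_0 = floor(y) omitted)."""
    coeffs = []
    r = y - math.floor(y)
    for _ in range(k):
        if r == 0.0:  # expansion terminates (y is rational in floating point)
            break
        x = 1.0 / r
        a = math.floor(x)
        coeffs.append(a)
        r = x - a
    return coeffs


print(leading_digits(31415.9, 2))   # 31
print(leading_digits(0.00472, 1))   # 4
print(cf_coefficients(math.pi, 4))  # [7, 15, 1, 292]
```

Double precision limits how many CF coefficients are reliable, so this is only suitable for the first several coefficients.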

Paper Structure (16 sections, 5 theorems, 54 equations, 12 figures)


Key Result

Theorem 2.1

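If $A_{(k)}$ denotes the discrete r.v. that models the $k$ most significant $b$-ary digits (i.e. the $k$-th integer significand) of a positive continuous r.v. $X$, then

$$\Pr(A_{(k)}=a)=\sum_{i=-\infty}^{+\infty}\left[F_Y\!\left(i+\log_b(a+1)-k+1\right)-F_Y\!\left(i+\log_b a-k+1\right)\right],$$

where $a\in\mathcal{A}_{(k)}=\{b^{k-1},\dots,b^{k}-1\}$, $Y=\log_b X$, and $F_Y$ is the cdf of $Y$.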
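Under the definitions in Theorem 2.1, $A_{(k)}=a$ exactly when the fractional part of $Y=\log_b X$ lies in $[\log_b a-k+1,\ \log_b(a+1)-k+1)$, so the pmf is a sum over $i\in\mathbb{Z}$ of $F_Y(i+\log_b(a+1)-k+1)-F_Y(i+\log_b a-k+1)$. The sketch below evaluates that sum numerically for any cdf $F_Y$. It is an illustration, not the authors' code: `significand_pmf` and `uniform_log_cdf` are hypothetical names, and truncating $i$ to $[-60,60]$ is an assumption that must cover essentially all the mass of $Y$. As a sanity check it uses $Y$ uniform on $[0,m)$ with integer $m$, so the fractional part of $Y$ is uniform (a Benford $X$), and the result should match Benford's law $\log_b(1+1/a)$.

```python
import math
from typing import Callable, Dict


def significand_pmf(F_Y: Callable[[float], float], k: int, b: int = 10,
                    i_min: int = -60, i_max: int = 60) -> Dict[int, float]:
    """pmf of the k-th integer significand A_(k); the sum over i is truncated to [i_min, i_max]."""
    pmf = {}
    for a in range(b ** (k - 1), b ** k):
        lo = math.log(a, b) - k + 1      # log_b a - k + 1
        hi = math.log(a + 1, b) - k + 1  # log_b (a + 1) - k + 1
        pmf[a] = sum(F_Y(i + hi) - F_Y(i + lo) for i in range(i_min, i_max + 1))
    return pmf


def uniform_log_cdf(m: int) -> Callable[[float], float]:
    """cdf of Y ~ Uniform[0, m); for integer m, frac(Y) is uniform, so X = b**Y is Benford."""
    return lambda y: min(max(y / m, 0.0), 1.0)


pmf = significand_pmf(uniform_log_cdf(3), k=1)
for a, p in pmf.items():
    # Theorem 2.1 value next to Benford's law log10(1 + 1/a)
    print(a, round(p, 6), round(math.log10(1 + 1 / a), 6))
```

Swapping in another $F_Y$ (for instance the cdf of $\log_b X$ for a Pareto $X$) gives the corresponding non-Benford digit pmf without changing `significand_pmf`.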

Figures (12)

  • Figure 1: Illustration of the asymptotic sum-invariance property of a Benford variable for $b=10$.
  • Figure 2: Theoretical distribution of the two most significant decimal digits of Pareto $X$ (\ref{eq:paretok_general}) versus the theoretical Benford-based asymptotic approximation (\ref{eq:ak_pmf_asympt}). The lines join probability mass points for clarity.
  • Figure 3: Theoretical joint pmf of the first two CF coefficients of $\log_{10}X$ for Pareto $X$ with $s=1$ and $\rho=0.3$ [solid lines, (\ref{eq:cf_joint_pmf_pareto})] versus the theoretical Benford-based asymptotic approximation [dashed lines, (\ref{eq:cf_joint_pmf_asympt})]. For clarity, the lines join probability mass points with equal $a_2$.
  • Figure 4: Distributions of the most significant decimal digit of $X$. The theoretical pmfs (solid and dashed lines) are (\ref{eq:benfordk}) and (\ref{eq:paretok_general}), and the empirical frequencies ($\square$) correspond to $p=10^7$ pseudorandom outcomes in each case.
  • Figure 5: Distributions of the two most significant decimal digits of $X$. The theoretical pmfs (solid and dashed lines) are (\ref{eq:benfordk}) and (\ref{eq:paretok_general}), and the empirical frequencies ($\square$) correspond to $p=10^7$ pseudorandom outcomes in each case.
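Figures 4 and 5 compare theoretical pmfs with empirical frequencies of pseudorandom outcomes. A scaled-down version of that comparison for the first decimal digit of a Pareto variable can be sketched as follows. We assume a Pareto (Type I) law $F_X(x)=1-(s/x)^{\alpha}$ for $x\ge s$, so that $F_Y(y)=F_X(b^y)$; this parameterisation and the values $s=1$, $\alpha=0.3$ are hypothetical, and the paper's Pareto parameters $s$ and $\rho$ may be defined differently. The theoretical pmf uses the Theorem 2.1 sum with $k=1$ (truncated in $i$), samples are drawn by inverse-cdf sampling, and far fewer than $10^7$ outcomes are used.

```python
import math
import random
from collections import Counter

B = 10  # base of the digit representation


def pareto_log_cdf(s: float, alpha: float, b: int = B):
    """cdf of Y = log_b X for X ~ Pareto(scale s, shape alpha) (assumed parameterisation)."""
    y_min = math.log(s, b)

    def F_Y(y: float) -> float:
        return 0.0 if y <= y_min else 1.0 - (s / b ** y) ** alpha
    return F_Y


def first_digit_pmf(F_Y, b: int = B, i_min: int = -60, i_max: int = 60):
    """Theorem 2.1 with k = 1; the sum over i is truncated to [i_min, i_max]."""
    return {a: sum(F_Y(i + math.log(a + 1, b)) - F_Y(i + math.log(a, b))
                   for i in range(i_min, i_max + 1))
            for a in range(1, b)}


def first_digit(x: float, b: int = B) -> int:
    """Most significant base-b digit of x > 0."""
    e = math.floor(math.log(x, b))
    if x >= b ** (e + 1):  # correct floating-point error in log near powers of b
        e += 1
    elif x < b ** e:
        e -= 1
    return min(max(int(x / float(b) ** e), 1), b - 1)


def empirical_first_digit(s: float, alpha: float, n: int, seed: int = 1, b: int = B):
    """Relative first-digit frequencies over n inverse-cdf Pareto samples."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so s * U**(-1/alpha) is always finite
    counts = Counter(first_digit(s * (1.0 - rng.random()) ** (-1.0 / alpha), b)
                     for _ in range(n))
    return {a: counts[a] / n for a in range(1, b)}


s, alpha = 1.0, 0.3  # hypothetical parameter values
theory = first_digit_pmf(pareto_log_cdf(s, alpha))
empirical = empirical_first_digit(s, alpha, n=200_000)
for a in range(1, B):
    print(f"{a}: theory={theory[a]:.4f}  empirical={empirical[a]:.4f}")
```

With a fixed seed the run is reproducible; the sampling error of each frequency is of order $\sqrt{p(1-p)/n}$, about $10^{-3}$ here.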

Theorems & Definitions (16)

  • Theorem 2.1: General distribution of the $k$ most significant $b$-ary digits
  • Remark 1
  • Remark 2
  • Theorem 2.2: General asymptotic distribution of the $j$-th most significant $b$-ary digit
  • Remark 3
  • Lemma 3.1
  • Remark 4
  • Theorem 3.2: General distribution of the $k$ leading continued fraction coefficients
  • Remark 5
  • Remark 6
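The abstract refers to the Gauss-Kuz'min law as the limiting distribution of the $j$-th continued fraction coefficient. In its classical form, $\Pr(a_j=a)\to-\log_2\!\left(1-\frac{1}{(a+1)^2}\right)$ as $j\to\infty$. The sketch below only illustrates that classical limit, not the paper's general finite-$j$ expressions: it compares the law with Monte Carlo frequencies of the $j$-th CF coefficient of the fractional part of a standard normal outcome, an arbitrary choice of continuous r.v. The function names, $j=6$, the seed and the sample size are all assumptions, and double precision keeps only the first several coefficients reliable.

```python
import math
import random
from collections import Counter


def gauss_kuzmin_pmf(a: int) -> float:
    """Limiting probability that a CF coefficient equals a >= 1 (Gauss-Kuz'min law)."""
    return -math.log2(1.0 - 1.0 / (a + 1) ** 2)


def jth_cf_coefficient(y: float, j: int):
    """j-th CF coefficient of frac(y) (a_0 = floor(y) omitted); None if the expansion stops earlier."""
    r = y - math.floor(y)
    a = None
    for _ in range(j):
        if r == 0.0:
            return None
        x = 1.0 / r
        a = math.floor(x)
        r = x - a
    return a


def empirical_cf_freq(j: int, n: int, a_max: int = 5, seed: int = 7):
    """Relative frequencies of a_j = 1..a_max over n standard normal outcomes."""
    rng = random.Random(seed)
    counts = Counter(jth_cf_coefficient(rng.gauss(0.0, 1.0), j) for _ in range(n))
    return {a: counts[a] / n for a in range(1, a_max + 1)}


freq = empirical_cf_freq(j=6, n=200_000)
for a, f in freq.items():
    print(f"a={a}: Gauss-Kuz'min={gauss_kuzmin_pmf(a):.4f}  empirical={f:.4f}")
```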