Universal emergence of local Zipf-Mandelbrot law

Davide Cugini, André Timpanaro, Giacomo Livan, Giacomo Guarnieri

TL;DR

The paper addresses the ubiquity of the Zipf-Mandelbrot (ZM) law by deriving a general analytic relation between order statistics and the parent distribution. It introduces Order Duality Relationships (ODRs), which give closed-form expressions for the means and covariances of order statistics and show a concentration of measure as $N\to\infty$. Using ODRs, it proves that a small subset of ranked observations exhibits Zipf-Mandelbrot behavior with a locally defined exponent $\alpha(\lambda_0) = -1/(z^{(1)}(H_0)+1)$, where $H_0 = \ln\langle X(\lambda_0)\rangle$. The authors validate the theory on Miller's typing monkey, Barabási-Albert networks, and Gaussian data, illustrating both global and local emergence of ZM and providing a rigorous criterion for when a distribution should be read as a power law. The results offer a unifying framework for interpreting rank-frequency patterns across disciplines and indicate when local ZM behavior is expected in large datasets.

Abstract

A plethora of natural and socio-economic phenomena share a striking statistical regularity: the magnitude of elements decreases as a power law of their position in a ranking by magnitude. This regularity is known as the Zipf-Mandelbrot law (ZM), and many problem-specific explanations for its emergence have been provided in different fields. Yet an explanation for the ubiquity of ZM is currently lacking. In this paper we first provide an analytical expression for the cumulants of any ranked sample of i.i.d. random variables once sorted in decreasing order. We then use this result to rigorously demonstrate that, whenever a small fraction of such a ranked dataset is considered, it becomes statistically indistinguishable from a ZM law. We finally validate our results against several relevant examples.
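The claim about small fractions of a ranked dataset can be probed numerically. The sketch below is not the paper's derivation: it draws i.i.d. standard Gaussian samples (a parent distribution with no power-law tail), sorts them in decreasing order, and fits a straight line to $\ln X_r$ versus $\ln r$ on a narrow window of ranks. The function names, the sample size, the window of ranks, and the use of $R^2$ as a crude diagnostic are all illustrative assumptions; a high $R^2$ is only suggestive and is not the statistical indistinguishability criterion used in the paper.

```python
import math
import random


def ranked_sample(n, seed=0):
    """Draw n i.i.d. standard Gaussian values and sort them in decreasing order."""
    rng = random.Random(seed)
    return sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)


def loglog_fit(ranked, r_min, r_max):
    """Least-squares fit of ln X_r = a + b * ln r over ranks r_min..r_max (1-based).

    Returns (slope b, intercept a, R^2). Requires X_r > 0 on the whole window.
    """
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(ranked[r - 1]) for r in range(r_min, r_max + 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    return slope, my - slope * mx, r2


if __name__ == "__main__":
    # Hypothetical settings: 200 ranks out of 200,000 samples (0.1% of the data).
    sample = ranked_sample(200_000)
    slope, intercept, r2 = loglog_fit(sample, 100, 300)
    print(f"local log-log slope: {slope:.4f}, R^2: {r2:.4f}")
```

Because the window covers only a small fraction of the ranks, the curvature of $\ln X_r$ in $\ln r$ is small there, so a single local exponent describes the window well even though the Gaussian is not a power law globally.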

Paper Structure (5 sections, 85 equations, 4 figures)

Figures (4)

  • Figure 1: Panel (a) shows the parent distribution, here arbitrarily chosen to be $p(x) = (8/\pi)\,x^{1/2}\,(x+1)^{-3}$, from which $N = 30$ i.i.d. samples, depicted as dots, are drawn. Panel (b) shows the same samples after sorting, together with the order statistics expectation value $\langle X(\lambda)\rangle$, obtained at leading order from Eq. \ref{eq:ODR} and depicted as a dashed line. The filled area in panel (b) marks the region within one expected standard deviation $\Sigma^{1/2}\left(X(\lambda),\,X(\lambda) \right)$ of the dashed line, obtained at leading order from Eq. \ref{eq:ODR2}.
  • Figure 2: Ranking of the nodes of a network generated through the Barabási-Albert algorithm, where nodes are ranked by their number of edges $X$. In the plots $\langle X(\lambda)\rangle$ is normalized on the interval $\lambda \in [0,1]$. Panels (a) and (b) show the expectation values $\langle X(\lambda)\rangle$ and the standard deviations $\sqrt{\mathrm{Var}\left(X(\lambda)\right)}$, respectively. The reported results were obtained numerically for network sizes $N \in \{20, 50, 100\}$ and analytically from Eq. \ref{eq:ODR} in the $N \to \infty$ limit. The deviations of the numerical results from the analytical ones stem from discretization effects (the number of edges is an integer before normalization) in addition to the finiteness of $N$. Discretization effects are evident in the rightmost part of panel (b).
  • Figure 3: Order statistics $X_r$ associated with a Gaussian parent distribution with $\mu = 0$ and $\sigma = 1$. Numerical results are reported together with shades representing their statistical error. The plot represents the top $100$ ranked draws out of $N = 8 \cdot 10^9$ total samples. The local power-law behaviour around $r_0 = 10$, theoretically predicted from Eq. \ref{eq: local behaviour}, is reported as a dashed line together with its predicted statistical fluctuations obtained from Eq. \ref{eq:ODR}.
  • Figure S1: Order statistics $X(\lambda)$ associated with a Gaussian (normal) parent distribution. Numerical results are reported together with shades representing their statistical error for a sample size of $N = 10^6$. Zoomed regions are plotted on a log-log scale, where a linear behaviour clearly shows up, highlighting the local power-law nature of $X(\lambda)$. The only exception is the region around $X(\lambda) = 1$, where the linear behaviour appears on a semi-logarithmic (log-y) scale instead, confirming the expected local exponential behaviour of $X(\lambda)$.
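The setting of the Figure 3 caption (top $100$ ranks out of $N = 8 \cdot 10^9$ standard Gaussian draws, local power law around $r_0 = 10$) can be approximated without simulating the full sample. The sketch below replaces the paper's equations with a stand-in: the standard large-$N$ quantile approximation $X_r \approx F^{-1}\!\left(1 - r/(N+1)\right)$. Treating this as the leading-order behaviour of the order statistics is an assumption here, not something taken from the paper's Eq. \ref{eq:ODR}. The local exponent is estimated by a centred finite difference in log-log coordinates, which is likewise an illustrative choice rather than the paper's Eq. \ref{eq: local behaviour}. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def leading_order_rank_value(r, n):
    """Quantile approximation to the r-th largest of n standard Gaussian draws.

    Uses the upper-tail probability r / (n + 1); by symmetry of the Gaussian,
    the upper quantile equals minus the lower quantile, which avoids
    evaluating 1 - (tiny number).
    """
    return -_STD_NORMAL.inv_cdf(r / (n + 1))


def local_exponent(r0, n):
    """Centred finite-difference slope of ln X_r versus ln r at rank r0 (r0 >= 2)."""
    hi = math.log(leading_order_rank_value(r0 + 1, n))
    lo = math.log(leading_order_rank_value(r0 - 1, n))
    return (hi - lo) / (math.log(r0 + 1) - math.log(r0 - 1))


def local_power_law(r, r0, n):
    """Power law anchored at rank r0: X_{r0} * (r / r0) ** exponent."""
    return leading_order_rank_value(r0, n) * (r / r0) ** local_exponent(r0, n)


if __name__ == "__main__":
    N, R0 = 8 * 10**9, 10  # values taken from the Figure 3 caption
    print(f"local exponent at r0={R0}: {local_exponent(R0, N):.5f}")
    worst = max(
        abs(local_power_law(r, R0, N) / leading_order_rank_value(r, N) - 1.0)
        for r in range(1, 101)
    )
    print(f"max relative deviation over the top 100 ranks: {worst:.2e}")
```

The exponent obtained this way is small in magnitude because the Gaussian tail decays very fast, so the top-ranked values change slowly with rank; the power law anchored at $r_0$ nonetheless tracks the quantile curve closely across the top $100$ ranks.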