Generalized Taylor's Law for Dependent and Heterogeneous Heavy-Tailed Data
Pok Him Cheng, Joel E. Cohen, Hok Kan Ling, Sheung Chi Phillip Yam
TL;DR
This work extends Taylor's law to heavy-tailed data with infinite moments by establishing probabilistic limits for ratios of higher and central moments under weak dependence, heterogeneity, and network structure. The authors introduce Condition A($p$), which controls the covariances of truncated variables and yields results for i.i.d., dependent, and correlated data; they then generalize to heterogeneous mixtures and conditional dependencies. They prove that $\frac{\log M_{n,p}}{\log n} \to \frac{p-\alpha}{\alpha}$, with analogous results for central moments, when the tail index is $\alpha\in(0,\infty)$; the convergence rates depend on a slowly varying function $l$ and on the truncation levels $t_n$. The framework also extends to semivariances and local moments. The theory is illustrated with simulations and empirical network datasets (Wikipedia Talk, Epinions, DBpedia), in which Taylor's law (TL) slopes typically exceed 2 and closely match Hill-estimated tail indices, supporting the use of these models for complex, heavy-tailed networks. Overall, the paper broadens the applicability of Taylor's law to dependent, heterogeneous, and network data with infinite moments, and offers practical tools for understanding dispersion patterns in real-world heavy-tailed systems.
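The moment limit quoted above can be checked informally by simulation. The sketch below is only an illustration for the simplest i.i.d. case, not the authors' code: it assumes that $M_{n,p}$ is the sample $p$-th raw moment $n^{-1}\sum_{i=1}^n X_i^p$, draws Pareto data with tail index $\alpha$, and compares $\log M_{n,p}/\log n$ with $(p-\alpha)/\alpha$. The function names (`pareto_sample`, `log_moment_ratio`, `median_ratio`) are hypothetical. Because the random fluctuation is divided only by $\log n$, the approach to the limit is slow, and at moderate $n$ the simulated values should be expected to match only roughly.

```python
import math
import random
import statistics


def pareto_sample(n, alpha, rng):
    """Draw n i.i.d. Pareto variables with P(X > x) = x**(-alpha) for x >= 1."""
    # 1 - rng.random() lies in (0, 1], so the negative power never hits zero.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def log_moment_ratio(xs, p):
    """Return log(M_{n,p}) / log(n), assuming M_{n,p} = (1/n) * sum(x**p)."""
    n = len(xs)
    m_np = math.fsum(x ** p for x in xs) / n
    return math.log(m_np) / math.log(n)


def median_ratio(n, alpha, p, reps=40, seed=0):
    """Median of log(M_{n,p}) / log(n) over independent replications."""
    rng = random.Random(seed)
    return statistics.median(
        log_moment_ratio(pareto_sample(n, alpha, rng), p) for _ in range(reps)
    )


if __name__ == "__main__":
    alpha = 0.5  # illustrative tail index with infinite mean and variance
    for p in (1, 2):
        simulated = median_ratio(20_000, alpha, p)
        limit = (p - alpha) / alpha
        print(f"p={p}: simulated {simulated:.3f}, theoretical limit {limit:.3f}")
```

The median over replications is used rather than the mean because a single extreme observation can dominate $M_{n,p}$ in any one run.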
Abstract
Taylor's law, also known as fluctuation scaling in physics and the power-law variance function in statistics, is an empirical pattern widely observed across fields including ecology, physics, finance, and epidemiology. It states that the variance of a sample scales as a power function of the mean of the sample. We study generalizations of Taylor's law in the context of heavy-tailed distributions with infinite mean and variance. We establish the probabilistic limit and analyze the associated convergence rates. Our results extend the existing literature by relaxing the i.i.d. assumption to accommodate dependence and heterogeneity among the random variables. This generalization enables application to dependent data such as time series and network-structured data. We support the theoretical developments with extensive simulations and demonstrate their practical relevance through applications to real network data.
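A short heuristic calculation may help connect the variance-mean statement to the infinite-moment regime. It is a sketch under stated assumptions, not the paper's proof: it assumes nonnegative data, takes $M_{n,p} = n^{-1}\sum_{i=1}^n X_i^p$ as the sample $p$-th moment and $V_n = M_{n,2} - M_{n,1}^2$ as the sample variance, and restricts the tail index to $\alpha \in (0,1)$ so that both the mean and the variance are infinite. Applying the limit $\log M_{n,p}/\log n \to (p-\alpha)/\alpha$ from the TL;DR at $p=1$ and $p=2$ gives, on the logarithmic scale,

$$
M_{n,1} \approx n^{(1-\alpha)/\alpha}, \qquad M_{n,2} \approx n^{(2-\alpha)/\alpha}, \qquad M_{n,1}^2 \approx n^{(2-2\alpha)/\alpha} \ll M_{n,2},
$$

so that $V_n$ is dominated by $M_{n,2}$ and

$$
\frac{\log V_n}{\log M_{n,1}} \to \frac{(2-\alpha)/\alpha}{(1-\alpha)/\alpha} = \frac{2-\alpha}{1-\alpha} = 2 + \frac{\alpha}{1-\alpha} > 2 .
$$

Under these assumptions, the Taylor's law exponent is determined by the tail index alone and always exceeds 2.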
