Generalized Taylor's Law for Dependent and Heterogeneous Heavy-Tailed Data

Pok Him Cheng, Joel E. Cohen, Hok Kan Ling, Sheung Chi Phillip Yam

TL;DR

This work extends Taylor's law to heavy-tailed data with infinite moments by establishing probabilistic limits for ratios of higher and central moments under weak dependence, heterogeneity, and network structure. The authors introduce Condition A($p$) to control covariances of truncated variables, enabling results for i.i.d., dependent, and correlated data, and further generalize to heterogeneous mixtures and conditional dependencies. They prove that $\frac{\log M_{n,p}}{\log n} \to \frac{p-\alpha}{\alpha}$ (and analogous central-moment/rate results) when the tail index is $\alpha\in(0,\infty)$, with rates depending on slowly varying $l$ and truncation levels $t_n$, and they extend the framework to semivariances and local moments. The theory is illustrated via simulations and empirical network datasets (Wikipedia Talk, Epinions, DBpedia), showing TL slopes typically exceeding 2 and closely matching Hill-estimated tail indices, thereby validating the models in complex, heavy-tailed networks. Collectively, the paper broadens the applicability of Taylor's law to dependent, heterogeneous, and network data with infinite moments, offering practical tools for understanding dispersion patterns in real-world heavy-tailed systems.
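The limit $\frac{\log M_{n,p}}{\log n} \to \frac{p-\alpha}{\alpha}$ can be illustrated numerically in the simplest setting. The following is a minimal sketch, not the authors' simulation code. It assumes that $M_{n,p}$ denotes the sample $p$-th moment $\frac{1}{n}\sum_{i=1}^n X_i^p$, and it uses i.i.d. Pareto data with survival function $x^{-\alpha}$ on $x \geq 1$ (so the slowly varying part is constant). The function names and the choices $\alpha = 0.5$, $n = 10^5$ are illustrative assumptions.

```python
import numpy as np


def pareto_sample(alpha, n, rng):
    """Draw n i.i.d. Pareto variates with survival function x**(-alpha) on x >= 1."""
    u = rng.random(n)                    # uniform on [0, 1)
    return (1.0 - u) ** (-1.0 / alpha)   # inverse transform; 1 - u lies in (0, 1]


def moment_exponent(x, p):
    """Return log(M_{n,p}) / log(n).

    Assumption: M_{n,p} is the sample p-th moment (1/n) * sum(x_i ** p).
    """
    n = x.size
    return np.log(np.mean(x ** p)) / np.log(n)


def median_exponent(alpha, p, n, reps, seed=0):
    """Median of log(M_{n,p}) / log(n) over `reps` independent samples of size n."""
    rng = np.random.default_rng(seed)
    vals = [moment_exponent(pareto_sample(alpha, n, rng), p) for _ in range(reps)]
    return float(np.median(vals))


if __name__ == "__main__":
    alpha, n, reps = 0.5, 100_000, 25
    for p in (1, 2):
        est = median_exponent(alpha, p, n, reps)
        limit = (p - alpha) / alpha
        print(f"p={p}: median estimate {est:.3f}, limit (p - alpha)/alpha = {limit:.3f}")
```

Because the normalization is only logarithmic, a single sample can sit noticeably away from the limit even for large $n$. The sketch therefore reports a median over replications rather than a single draw.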

Abstract

Taylor's law, also known as fluctuation scaling in physics and the power-law variance function in statistics, is an empirical pattern widely observed across fields including ecology, physics, finance, and epidemiology. It states that the variance of a sample scales as a power function of the mean of the sample. We study generalizations of Taylor's law in the context of heavy-tailed distributions with infinite mean and variance. We establish the probabilistic limit and analyze the associated convergence rates. Our results extend the existing literature by relaxing the i.i.d. assumption to accommodate dependence and heterogeneity among the random variables. This generalization enables application to dependent data such as time series and network-structured data. We support the theoretical developments by extensive simulations, and the practical relevance through applications to real network data.

Paper Structure

This paper contains 38 sections, 30 theorems, 132 equations, 22 figures.

Key Result

Lemma 2.1

Suppose that $X\geq 0$ has a survival function $\overline{F}(x) = x^{-\alpha}l(x)$ for $\alpha > 0$. Let $\tilde{X} := X\mathbbm{1}(X < t_n)$, where $t_n \uparrow \infty$ as $n \rightarrow \infty$ ($t_n$ need not satisfy the condition on the choice of $t_n$ imposed later in the paper). Then, for $p > \alpha$,

Figures (22)

  • Figure 1: Alternative Hill plot and tail index estimates implied by Taylor's law for the Wikipedia Talk dataset. The Hill and smoothed Hill estimates are plotted against the threshold parameter $\theta$, shown on the bottom $x$-axis. The Taylor's law estimate is based on subsamples of the full dataset, with sample sizes indicated on the top $x$-axis. The Hill estimate at $\theta = 0.85$ is $0.563$ (99% CI: 0.554–0.572).
  • Figure 2: Log$_{10}$-variance versus log$_{10}$-mean across subsamples for the Wikipedia Talk dataset. The analysis includes 100 pairs (log-mean, log-variance). The regression line was fitted using ordinary least squares. The 95% and 99% confidence intervals for the intercept are $(-4.864, -0.455)$ and $(-5.577, 0.258)$ respectively. For the slope, the 95% confidence interval is $(3.396, 4.658)$, while the 99% confidence interval is $(3.191, 4.863)$. The adjusted $R^2$ value of the regression model is 0.617.
  • Figure 3: Alternative Hill plot and tail index estimates implied by Taylor's law for the Epinions dataset. The Hill and smoothed Hill estimates are plotted against the threshold parameter $\theta$, shown on the bottom $x$-axis. The Taylor's law estimate is based on subsamples of the full dataset, with sample sizes indicated on the top $x$-axis. The Hill estimate at $\theta = 0.8$ is $0.539$ (99% CI: 0.535–0.543).
  • Figure 4: Log-variance versus log-mean across subsamples for the Epinions dataset. The analysis includes 100 pairs (log-mean, log-variance). The regression line was fitted using ordinary least squares. The 95% and 99% confidence intervals for the intercept are $(-3.166, 0.945)$ and $(-3.831, 1.610)$ respectively. For the slope, the 95% confidence interval is $(2.709, 3.580)$, while the 99% confidence interval is $(2.568, 3.721)$. The adjusted $R^2$ value of the regression model is 0.674.
  • Figure 5: Alternative Hill plot and tail index estimates implied by Taylor's law for the DBpedia dataset. The Hill and smoothed Hill estimates are plotted against the threshold parameter $\theta$, shown on the bottom $x$-axis. The Taylor's law estimate is based on subsamples of the full dataset, with sample sizes indicated on the top $x$-axis. The Hill estimate at $\theta = 0.8$ is $0.409$ (99% CI: 0.366–0.463).
  • ...and 17 more figures
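
The two quantities reported in these captions, a Hill estimate of the tail index and an ordinary least squares fit of log-variance on log-mean across subsamples, can be sketched as follows. This is an illustrative sketch only. It uses the classic Hill estimator based on the $k$ largest observations, which may differ from the "alternative Hill plot" parametrization by $\theta$ used in the paper, and the subsampling scheme (random subsamples without replacement at geometrically spaced sizes) is an assumption rather than the authors' procedure. Synthetic Pareto data stand in for the network datasets.

```python
import numpy as np


def hill_estimator(x, k):
    """Classic Hill estimate of the tail index alpha from the k largest observations.

    Requires 1 <= k < len(x) and positive data.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    top = xs[-k:]             # the k largest observations
    threshold = xs[-k - 1]    # the (k+1)-th largest observation
    return 1.0 / np.mean(np.log(top / threshold))


def taylor_slope(x, sizes, rng):
    """OLS fit of log10(sample variance) on log10(sample mean) across subsamples.

    Assumption: one random subsample (without replacement) per entry of `sizes`.
    Returns (slope, intercept).
    """
    log_mean, log_var = [], []
    for n in sizes:
        sub = rng.choice(x, size=int(n), replace=False)
        log_mean.append(np.log10(sub.mean()))
        log_var.append(np.log10(sub.var(ddof=1)))
    slope, intercept = np.polyfit(log_mean, log_var, 1)
    return float(slope), float(intercept)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    alpha = 0.5
    # Synthetic Pareto data with survival function x**(-alpha) on x >= 1.
    x = (1.0 - rng.random(200_000)) ** (-1.0 / alpha)

    print("Hill estimate:", hill_estimator(x, k=2_000))

    sizes = np.geomspace(1_000, 100_000, 50)
    slope, intercept = taylor_slope(x, sizes, rng)
    print("Taylor's law slope:", slope, "intercept:", intercept)
```

In practice the Hill estimate is sensitive to the choice of $k$, which is why the captions report it at a stated threshold and show it as a plot over a range of thresholds.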

Theorems & Definitions (67)

  • Lemma 2.1: Truncated Moments
  • Lemma 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Corollary 2.5
  • Theorem 2.6
  • Lemma 2.7
  • Theorem 2.8: Taylor's Law for Higher Moments
  • Remark 2.9
  • Example 2.1
  • ...and 57 more