Degree distributions in networks: beyond the power law

Clement Lee, Emma Eastoe, Aiden Farrell

TL;DR

The paper tackles the limitations of using a single power law to describe network degree distributions, notably threshold selection and adequacy testing. It proposes a Bayesian extreme-value mixture that combines a truncated Zipf-polylog (TZP) body with an integer generalised Pareto (IGP) tail, the IGP being a discrete version of the generalised Pareto (GP) distribution. This lets threshold uncertainty be quantified and power-law adequacy be tested formally. Using 2- and 3-component TZP-IGP mixtures with spike-and-slab model selection, the approach reveals, across real-world networks and word-frequency data, when the body follows a power law and how the tail deviates from it. The mixtures fit many data sets well, capturing piecewise linear survival curves on the log-log scale and correcting tail behavior that Zipf-based models miss, offering a principled alternative to preferential attachment as a network-generating mechanism.
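The 2-component TZP-IGP mixture described above can be written as a piecewise probability mass function. The sketch below is a minimal illustration under stated assumptions. The parameter names (threshold `u`, tail mass `phi`, body parameters `alpha` and `theta`, tail scale `sigma` and shape `xi`) are hypothetical. The IGP is also assumed to be the GP survival function differenced at consecutive integers above `u`. The paper's exact definitions may differ.

```python
import math


def gp_survival(z, sigma, xi):
    """Survival function of the continuous generalised Pareto distribution at z >= 0."""
    if xi == 0:
        return math.exp(-z / sigma)
    base = 1 + xi * z / sigma
    # For xi < 0 the support is bounded; beyond the upper end the survival is 0.
    return base ** (-1 / xi) if base > 0 else 0.0


def mixture_pmf(y, u, phi, alpha, theta, sigma, xi):
    """Hypothetical 2-component TZP-IGP pmf on y = 1, 2, ...

    Body (1 <= y <= u): truncated Zipf-polylog, proportional to
    y**(-alpha) * theta**y and carrying total mass 1 - phi.
    Tail (y > u): integer generalised Pareto, obtained (by assumption) by
    differencing the GP survival function at consecutive integers above u,
    carrying total mass phi.
    """
    if y < 1:
        return 0.0
    if y <= u:
        norm = sum(x ** (-alpha) * theta ** x for x in range(1, u + 1))
        return (1 - phi) * y ** (-alpha) * theta ** y / norm
    return phi * (gp_survival(y - u - 1, sigma, xi) - gp_survival(y - u, sigma, xi))


# Illustrative parameter values only, not estimates from any data set.
params = dict(u=10, phi=0.3, alpha=1.2, theta=0.9, sigma=5.0, xi=0.5)
total = sum(mixture_pmf(y, **params) for y in range(1, 10_011))
print(f"total mass over y = 1..10010: {total:.6f}")
```

Because the differenced GP probabilities telescope, the tail mass approaches `phi` as more terms are summed. The total therefore approaches 1, with the remaining shortfall equal to `phi` times the GP survival at the last point summed.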

Abstract

The power law is useful in describing count phenomena such as network degrees and word frequencies. With a single parameter, it captures the main feature that the frequencies are linear on the log-log scale. Nevertheless, there have been criticisms of the power law, for example that a threshold needs to be pre-selected without its uncertainty quantified, that the power law is simply inadequate, and that subsequent hypothesis tests are required to determine whether the data could have come from the power law. We propose a modelling framework that combines two different generalisations of the power law, namely the generalised Pareto distribution and the Zipf-polylog distribution, to resolve these issues. The proposed mixture distributions are shown to fit the data well and quantify the threshold uncertainty in a natural way. A model selection step embedded in the Bayesian inference algorithm further answers the question of whether the power law is adequate.
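The log-log linearity mentioned in the abstract can be checked numerically. The same check covers the relation in the caption of Figure 1 between the slope $-\alpha$ of the frequencies and the slope $-(\alpha-1)$ of the survival function. The sketch below assumes a discrete power law (Zipf) pmf proportional to $k^{-\alpha}$ with an illustrative $\alpha = 2.5$. It approximates the infinite tail sum with the first Euler-Maclaurin terms, which is a numerical convenience rather than anything from the paper.

```python
import math


def zipf_weight(k, alpha):
    """Unnormalised discrete power-law (Zipf) pmf, proportional to k**(-alpha)."""
    return k ** (-alpha)


def zipf_survival(k, alpha, n_max=100_000):
    """Unnormalised P(X >= k) for the Zipf pmf (requires alpha > 1).

    Sums k..n_max-1 explicitly and approximates the remaining tail with the
    first two Euler-Maclaurin terms: n_max**(1 - alpha) / (alpha - 1) + n_max**(-alpha) / 2.
    """
    head = sum(x ** (-alpha) for x in range(k, n_max))
    tail = n_max ** (1 - alpha) / (alpha - 1) + 0.5 * n_max ** (-alpha)
    return head + tail


def loglog_slope(f, k1, k2):
    """Slope of log f(k) against log k between k1 and k2."""
    return (math.log(f(k2)) - math.log(f(k1))) / (math.log(k2) - math.log(k1))


alpha = 2.5  # illustrative exponent
# Normalising constants cancel in the slope, so unnormalised values suffice.
print(loglog_slope(lambda k: zipf_weight(k, alpha), 100, 1000))    # -alpha, up to rounding
print(loglog_slope(lambda k: zipf_survival(k, alpha), 100, 1000))  # close to -(alpha - 1)
```

Summing the pmf over the tail lowers the exponent by one, so the survival slope only approaches $-(\alpha-1)$ for large $k$. The small discrepancy at moderate $k$ comes from the $O(1/k)$ correction terms.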

Paper Structure

This paper contains 23 sections, 8 theorems, 46 equations, 8 figures.

Key Result

Lemma 1

If $F$ is the polylog$(\alpha,\theta,w)$ distribution with survival function $S$, then $\lim_{n\rightarrow\infty}\frac{S(n)}{S(n+1)}=\frac{1}{\theta}$.
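Lemma 1 implies that, for $\theta<1$, successive survival probabilities shrink by a factor approaching $\theta$, so the tail decays geometrically rather than polynomially. The sketch below checks this numerically under two assumptions. First, it uses the two-parameter Zipf-polylog pmf proportional to $x^{-\alpha}\theta^x$ on $x = 1, 2, \dots$, omitting the third parameter $w$, whose role is not described in this summary. Second, it takes $S(n) = P(X > n)$. The function name and truncation length are hypothetical.

```python
def survival_ratio(alpha, theta, n, n_terms=2000):
    """Numerically approximate S(n) / S(n + 1), with S(n) = P(X > n), for a
    pmf proportional to x**(-alpha) * theta**x on x = 1, 2, ... (0 < theta < 1).

    The normalising constant cancels in the ratio, so unnormalised weights
    suffice. Weights are scaled relative to x = n + 1 to avoid underflow, and
    the infinite tail is truncated after n_terms terms.
    """
    weights = [((n + 1) / x) ** alpha * theta ** (x - n - 1)
               for x in range(n + 1, n + 1 + n_terms)]
    s_n = sum(weights)        # proportional to P(X > n)
    s_n1 = sum(weights[1:])   # proportional to P(X > n + 1)
    return s_n / s_n1


for theta in (0.5, 0.9):
    for n in (10, 100, 5000):
        ratio = survival_ratio(1.5, theta, n)
        print(f"theta={theta}, n={n}: ratio={ratio:.4f}, 1/theta={1 / theta:.4f}")
```

The ratio approaches $1/\theta$ from above, with an error of order $1/n$ that comes from the polynomial factor $x^{-\alpha}$.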

Figures (8)

  • Figure 1: Empirical frequencies (top) and survival function (bottom) on the log-log scale of four data sets. Barring the rightmost data set, a red dashed line with slope $-\alpha$ in the top plot implies a dashed line with slope $-(\alpha-1)$ in the bottom plot, and suggests a power law with exponent $\alpha$.
  • Figure 2: Relationships between various continuous (green ovals) and discrete (red rectangles) distributions. A solid arrow from $A$ to $B$ means $B$ is a special case of $A$. The meanings of the various dashed arrows are given in the figure. Distributions that imply the power law are in the second row, while those that form the proposed mixture distributions are in bold.
  • Figure 3: Schematic representations of the 2-component (left) and 3-component (right) mixture distributions. Spacing between consecutive integers is not drawn to scale.
  • Figure 4: Survival function with credible intervals for the selected mixture (red) and ZP (green) distributions. The blue and yellow bands represent the posterior of $u$ and $v$, respectively, for the mixture distribution fits. The number beside the data set name is the Bayes factor (to 3 s.f.) for $\theta_{\text{mix}}=1$ relative to $\theta_{\text{mix}}\in(0,1)$.
  • Figure 5: Boxplot of $\xi$ and the tail indices implied by the selected mixture and ZP distributions. The number beside the data set name is the Bayes factor (to 3 s.f.) for $\theta_{\text{mix}}=1$ relative to $\theta_{\text{mix}}\in(0,1)$. In the first and third columns, $\xi_{\text{mix}}=0$; in the other two columns, some boxes for $\xi_{\text{mix}}$ are missing as all sampled values of $\alpha_{\text{mix}}$ are smaller than 1.
  • ...and 3 more figures
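The Bayes factors reported in Figures 4 and 5, and the implied tail indices in Figure 5, can both be illustrated from posterior draws of the body parameters $(\alpha, \theta)$. The sketch below rests on several assumptions. It assumes a spike-and-slab prior that puts probability `prior_spike_prob` (a hypothetical name) on $\theta = 1$. It estimates the Bayes factor as posterior odds divided by prior odds. It maps $(\alpha, \theta)$ to a tail index using the standard correspondence: $\theta < 1$ gives a geometrically decaying tail ($\xi = 0$), while $\theta = 1$ gives a power law whose survival function decays like $k^{-(\alpha-1)}$, matching a GP tail with $\xi = 1/(\alpha-1)$ when $\alpha > 1$. This reading of the captions is itself an assumption, and the draws are made up.

```python
import math


def bayes_factor_theta_one(theta_draws, prior_spike_prob):
    """Bayes factor for theta = 1 versus theta in (0, 1) under an assumed
    spike-and-slab prior, estimated as posterior odds / prior odds. The
    posterior probability of the spike is the fraction of draws equal to 1."""
    n = len(theta_draws)
    n_spike = sum(1 for t in theta_draws if t == 1.0)
    if n_spike in (0, n):
        raise ValueError("all draws fall in one model; the estimate is degenerate")
    posterior_odds = n_spike / (n - n_spike)
    prior_odds = prior_spike_prob / (1 - prior_spike_prob)
    return posterior_odds / prior_odds


def implied_xi(alpha, theta):
    """Tail index implied by body parameters (alpha, theta), assuming the
    standard power-law/GP correspondence; NaN when no tail index is implied."""
    if not 0 < theta <= 1:
        raise ValueError("theta must lie in (0, 1]")
    if theta < 1:
        return 0.0               # geometric-type decay
    if alpha > 1:
        return 1.0 / (alpha - 1)  # survival ~ k**(-(alpha - 1)) = k**(-1/xi)
    return math.nan              # alpha <= 1: no implied tail index


# Made-up posterior draws of (alpha, theta), for illustration only.
draws = [(2.2, 1.0)] * 6 + [(2.0, 1.0)] * 3 + [(1.8, 0.95)]
bf = bayes_factor_theta_one([t for _, t in draws], prior_spike_prob=0.5)
xis = [implied_xi(a, t) for a, t in draws]
kept = [x for x in xis if not math.isnan(x)]  # draws with alpha <= 1 are dropped
print(bf, kept)
```

If every draw has $\theta = 1$ and $\alpha \le 1$, `kept` is empty. This matches the missing boxes described in the Figure 5 caption under the assumed correspondence.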

Theorems & Definitions (16)

  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 6 more