Scale invariance and statistical significance in complex weighted networks

Filipi N. Silva, Sadamori Kojaku, Alessandro Flammini, Filippo Radicchi, Santo Fortunato

TL;DR

This paper addresses the problem that statistical significance in weighted networks, when assessed with the weighted configuration model (WCM), depends on the scale of the weights. The authors show that, although several measures are themselves scale-invariant, the WCM yields null distributions whose width scales as $A^{-1/2}$, where $A$ is the factor by which the weights are rescaled, so the resulting $p$-values are scale-dependent. They propose a two-step, scale-invariant null model that separates structure from weights: it first randomizes the topology and then draws each weight from an exponential distribution whose mean is related to $s_i$, $s_j$, $k_i$, $k_j$, and $W$. They present a practical variant of the CReMb that is compatible with modularity, preserves mean strengths, and allows efficient sampling. The approach is validated on four real networks (Zachary's karate club, NKI Brain, World Trade, and London Transport) and on a larger collection from Netzschleuder. Under the scale-invariant null model, weighted clustering is often significant, maximum eigenvector centrality generally is not, and the significance of modularity depends on the network. Overall, the work clarifies the nontrivial role of scale in null models and the limitations of the WCM, and it provides a route to unbiased statistical inference on complex weighted networks.

Abstract

Most networks encountered in nature, society, and technology have weighted edges, representing the strength of the interaction/association between their vertices. Randomizing the structure of a network is a classic procedure used to estimate the statistical significance of properties of the network, such as transitivity, centrality and community structure. Randomization of weighted networks has traditionally been done via the weighted configuration model (WCM), a simple extension of the configuration model, where weights are interpreted as bundles of edges. It has previously been shown that the ensemble of randomizations provided by the WCM is affected by the specific scale used to compute the weights, but the consequences for statistical significance were unclear. Here we find that statistical significance based on the WCM is scale-dependent, whereas in most cases results should be independent of the choice of the scale. More generally, we find that designing a null model that does not violate scale invariance is challenging. A two-step approach, originally introduced for network reconstruction, in which one first randomizes the structure, then the weights, with a suitable distribution, restores scale invariance, and allows us to conduct unbiased assessments of significance on weighted networks.
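The contrast between the two kinds of null model can be illustrated on a single edge. The sketch below is a toy, not the authors' procedure: it assumes that a WCM edge weight at scale $A$ behaves like a Poisson count with mean $A\mu$ (the large-$A$ approximation mentioned in the figure captions), and compares it with an exponential weight of the same mean. The value of `mu`, the sample size, and the helper `relative_width` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0             # hypothetical expected edge weight at scale A = 1
n_samples = 200_000  # number of draws per scale (illustrative choice)

def relative_width(samples):
    """Standard deviation divided by the mean (coefficient of variation)."""
    return samples.std() / samples.mean()

poisson_cv = {}
exponential_cv = {}
for A in [1, 10, 100, 1000]:
    # WCM-like stand-in: integer "bundle of edges" count with mean A * mu.
    poisson_cv[A] = relative_width(rng.poisson(A * mu, size=n_samples))
    # Continuous weight drawn from an exponential with the same mean A * mu.
    exponential_cv[A] = relative_width(rng.exponential(A * mu, size=n_samples))
    # Columns: A, Poisson relative width, (A * mu)^(-1/2), exponential relative width.
    print(A, poisson_cv[A], (A * mu) ** -0.5, exponential_cv[A])
```

In this toy, the Poisson relative width shrinks roughly as $A^{-1/2}$, while the exponential one stays close to 1 at every scale, because multiplying an exponential variable by $A$ gives another exponential variable with mean multiplied by $A$. The widths of network-level variables such as clustering or modularity are not computed here.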

Paper Structure

This paper contains 14 sections, 15 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Statistical significance of network variables. The $p$-value of the chosen variable measured on the original network equals the area under the curve of the null model distribution of the variable, to the right of the measured value if we argue that it is higher than on randomized networks (to the left if it is lower).
  • Figure 2: Scale dependence of the standard deviation of the WCM distribution for the weighted clustering coefficient, maximum eigenvector centrality, and maximum modularity in four networks: Zachary's karate club, NKI Brain, World Trade, and London Transport (see Section \ref{subsec:data} for details). The dashed line represents the conjectured inverse square root behavior, which the three curves follow closely in each case as $A$ increases. Dotted lines indicate results obtained from the weighted Chung–Lu model using hypergeometric distributions (see Section \ref{subsec:wchunglu}). The inset for the World Trade network shows the Poisson approximation for large $A$-values of the WCM model for the weighted clustering coefficient.
  • Figure 3: Scale dependence of the distribution of our variant of the CReMb, which separates structure and weights. The variables and the networks are the same as in Fig. \ref{fig:Ascaling}. The curves are approximately flat, signaling scale invariance.
  • Figure 4: Statistical significance of the three focal variables for the WCM and our variant of the CReMb on a large collection of real networks. For each network we indicate the $p$-value of each variable (dots for the WCM, crosses for the CReMb). Since the WCM is not scale invariant, we compute the $p$-values considering the original weights of the networks, without rescaling. The shaded regions correspond to $p$-values $\leq 0.05$.
  • Figure 5: Empirical versus predicted distributions of edge weights in the configuration model for the Zachary Karate Club network. The distributions correspond to the total weight of a single edge (between nodes 9 and 16) measured across multiple realizations. Red markers represent the empirical probabilities obtained from simulations, while black lines show the analytical predictions based on the weighted Chung–Lu approximation using different weight-generating distributions (hypergeometric, binomial, and Poisson). The top row shows the sparse regime ($A = 1$), while the bottom row shows the dense regime ($A = 1000$), where $A$ is a scaling factor applied to the original node strengths.
  • ...and 1 more figure
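The $p$-value described in the caption of Figure 1 can be estimated from a finite sample of randomized networks as the fraction of null values at least as extreme as the measured one. The following is a minimal sketch under that assumption; the function name `empirical_p_value`, the `tail` argument, and the numbers in `null` are hypothetical and are not taken from the paper.

```python
import numpy as np

def empirical_p_value(observed, null_samples, tail="right"):
    """Fraction of null-model values at least as extreme as the observed one.

    tail="right" tests whether the observed value is higher than on
    randomized networks; tail="left" tests whether it is lower.
    """
    null_samples = np.asarray(null_samples, dtype=float)
    if tail == "right":
        return float(np.mean(null_samples >= observed))
    if tail == "left":
        return float(np.mean(null_samples <= observed))
    raise ValueError("tail must be 'right' or 'left'")

# Made-up values of some variable measured on ten randomized networks.
null = np.array([0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.09, 0.16, 0.12, 0.13])

p_right = empirical_p_value(0.15, null)               # 2 of 10 values are >= 0.15
p_left = empirical_p_value(0.10, null, tail="left")   # 2 of 10 values are <= 0.10
print(p_right, p_left)
```

With only ten samples the estimate is coarse; in practice the null distribution would be sampled many more times so that the fraction approximates the area under the curve.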