Scaling laws in empirical networks

Upasana Dutta; Alexander Ray; Aaron Clauset

Abstract

How does the shape of a network change as its size increases? Although random graph models provide some expectations for such "scaling behaviors" in the structure of networks, relatively little is known about how empirical network structure scales with network size or how well random graphs explain those empirical patterns. Using a large, structurally diverse corpus of networks from four scientific domains, we first characterize the empirical scaling laws of real-world networks, considering how mean degree, transitivity, mean geodesic distance, and degree assortativity vary with network size. We show that networks from all four scientific domains exhibit a consistent set of scaling laws on these measures of network structure, but with differing scaling rates. We then assess the extent to which these empirical scaling laws are explained by three random graph models with different structural assumptions, showing that configuration model random graphs are a remarkably good model of network scaling behavior, although null models with modular structure are slightly better. These findings identify a new set of common patterns in the network structure of complex systems, provide new validation targets for models of network structure, and shed new light on the role of randomness in shaping the large-scale structure of networks.

Paper Structure (29 sections, 1 equation, 9 figures, 2 tables)

Introduction
Network Data, Scaling Laws, and Random Graph Models
Empirical Scaling Laws
Average Degree
Mean Geodesic Distance
Clustering Coefficient
Degree Assortativity
Random graph scaling laws
Erdős–Rényi random graph
The configuration model
The Degree-Corrected Stochastic Block Model
Discussion
Acknowledgments
Funding Statement
Competing Interests
...and 14 more sections

Figures (9)

Figure 1: (a) mean degree, (b) mean geodesic path length, (c) clustering coefficient, and (d) degree assortativity of the 254 empirical networks in our corpus as a function of the number of nodes, across the social, biological, informational, and technological domains. The colored triangles represent networks of the particular domain, while the gray circles in the background represent the remaining networks outside each domain.
Figure 2: Scaling behaviors of (a) mean geodesic distance $\langle \ell \rangle$, (b) global clustering coefficient $C$, and (c) degree assortativity $r$, as a function of the number of nodes $n$ for 254 empirical networks. Scaling behaviors are shown for the social, biological, technological, and informational network domains. For each pair of summary statistic and network domain, the least-squares best-fit lines are shown for the empirical networks and for the networks generated from three different random graph models (see legend). For each empirical network, the summary statistics for the random graph models are computed by averaging over 50 random graphs generated from the corresponding model. The scatter points show the empirical networks.
Figure 3: Descriptive overview of the empirical network corpus. (a) Relationship between the number of nodes and number of edges for all networks in the dataset, illustrating the wide range of sizes and densities across the corpus. (b) Distribution of different types of networks within each of the four scientific domains: social, biological, informational, and technological, highlighting the structural diversity of the networks used in our analysis.
Figure 4: Absolute error, i.e., the absolute difference between the true and approximated values of the mean geodesic distance $\langle \ell \rangle$, using the batch random pairwise distance sampling technique, as a function of the number of nodes $n$. Included are 254 empirical networks with exactly computed values of $\langle \ell \rangle$; approximated values are generated by the batch sampling technique with a batch size of 1000 and a convergence threshold of 0.1. The right panel shows the resulting error distribution: the errors are small, and for the vast majority of networks the absolute difference is less than 0.2.
Figure 5: A double-edge swap removing the multi-edge (a, b) by swapping edges (a, b) and (c, d) with (a, d) and (c, b), while still preserving the degree sequence, community labels, and inter-community edge counts. Note that node c is chosen such that nodes a and c belong to the same community.
...and 4 more figures
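
The batch random pairwise distance sampling mentioned in the Figure 4 caption can be illustrated with a minimal Python sketch. Only the batch size of 1000 and the convergence threshold of 0.1 come from the caption. The stopping rule used here (stop once the running mean changes by less than the threshold between consecutive batches), the function names `bfs_distances` and `approx_mean_geodesic`, the `max_batches` cap, and the decision to skip unreachable pairs are all assumptions made for illustration, not the authors' stated procedure.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def approx_mean_geodesic(adj, batch_size=1000, threshold=0.1,
                         max_batches=100, seed=0):
    """Estimate the mean geodesic distance by sampling node pairs in batches.

    Assumed stopping rule (not taken from the paper): stop when the running
    mean changes by less than `threshold` between two consecutive batches.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    cache = {}            # BFS results per sampled source (memory-hungry on large graphs)
    total, count = 0, 0
    prev_mean = None
    for _ in range(max_batches):
        for _ in range(batch_size):
            s, t = rng.sample(nodes, 2)      # two distinct nodes
            if s not in cache:
                cache[s] = bfs_distances(adj, s)
            d = cache[s].get(t)
            if d is not None:                # skip pairs with no connecting path
                total += d
                count += 1
        mean = total / count if count else float("nan")
        if prev_mean is not None and abs(mean - prev_mean) < threshold:
            return mean
        prev_mean = mean
    return prev_mean


# Usage on a hypothetical toy graph: a ring of 20 nodes.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
estimate = approx_mean_geodesic(ring)
```

The graph is passed as an adjacency dictionary so the sketch needs only the standard library. Caching one breadth-first search per sampled source keeps repeated sources cheap, at the cost of memory that grows with the number of distinct sources sampled.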
