Empirical Networks are Sparse: Enhancing Multi-Edge Models with Zero-Inflation
Giona Casiraghi, Georges Andres
TL;DR
The paper shows that empirical multi-edge networks are sparse and zero-inflated, a property that traditional Poisson-based models such as $G(N,p)$, SBM, and DCSBM do not capture. It introduces zero-inflated Poisson extensions (zi-$G(N,p)$, zi-SBM, zi-CLCM, zi-DCSBM) with EM-like parameter estimation, guided by moment matching and, where needed, Lambert $W$ solutions. Using the Sociopatterns datasets, the authors demonstrate that zi-DCSBM and related models reproduce the observed edge-count distributions, sparsity, heavy tails, and diffusion- and small-world-related metrics better than their non-zero-inflated counterparts. The work provides a more faithful framework for modeling sparse real-world networks and highlights avenues for further enhancement, including richer count distributions and block-inference methods specialized for zero-inflated structures.
Abstract
Real-world networks are sparse. As we show in this article, even when a large number of interactions is observed, most node pairs remain disconnected. We demonstrate that classical multi-edge network models, such as the $G(N,p)$, configuration models, and stochastic block models, fail to accurately capture this phenomenon. To mitigate this issue, zero-inflation must be integrated into these traditional models. Through zero-inflation, we incorporate a mechanism that accounts for the excess number of zeroes (disconnected pairs) observed in empirical data. By analyzing all datasets from the Sociopatterns repository, we illustrate how zero-inflated models more accurately reflect the sparsity and heavy-tailed edge count distributions observed in empirical data. Our findings underscore that failing to account for these ubiquitous properties of real-world networks leads to biased models that do not accurately represent complex systems and their dynamics.
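
As a rough illustration of the zero-inflation mechanism and of the moment-matching idea mentioned in the TL;DR, the sketch below fits a plain zero-inflated Poisson to a list of pairwise edge counts. It is not the authors' estimation procedure. The parametrization (structural-zero probability `pi`, Poisson rate `lam`) and the function names `zip_from_moments` and `fit_zip_moments` are assumptions made for this example. Matching the sample mean $m$ and the zero fraction $z$ gives $r = m/(1-z) = \lambda/(1-e^{-\lambda})$, whose nontrivial root can be written as $\lambda = r + W_0(-r e^{-r})$, with $W_0$ the principal Lambert $W$ branch. The code finds the same root by Newton iteration so that it needs only the standard library.

```python
import math


def zip_from_moments(mean, zero_frac, tol=1e-12, max_iter=100):
    """Return (pi, lam) of a zero-inflated Poisson matching a mean and zero fraction.

    Assumed model (illustrative): with probability pi a node pair is a
    structural zero; otherwise its edge count is Poisson(lam).
    """
    if mean <= 0 or zero_frac >= 1:
        raise ValueError("need at least one nonzero count")
    r = mean / (1.0 - zero_frac)  # average of the nonzero counts
    if r <= 1.0:
        raise ValueError("nonzero counts must average above 1")
    # Solve lam = r * (1 - exp(-lam)) for the root lam > 0 with Newton's method.
    # Starting at lam = r lies to the right of the root, where the function
    # is convex and increasing, so the iterates decrease toward the root.
    lam = r
    for _ in range(max_iter):
        f = lam - r * (1.0 - math.exp(-lam))
        step = f / (1.0 - r * math.exp(-lam))
        lam -= step
        if abs(step) < tol:
            break
    pi = 1.0 - mean / lam
    if pi < 0:
        # Fewer zeros than a plain Poisson predicts: fall back to no inflation.
        return 0.0, mean
    return pi, lam


def fit_zip_moments(counts):
    """Moment-matching fit from raw edge counts, one entry per node pair."""
    n = len(counts)
    mean = sum(counts) / n
    zero_frac = sum(1 for c in counts if c == 0) / n
    return zip_from_moments(mean, zero_frac)


# Toy data: ten node pairs, eight of them disconnected.
pi_hat, lam_hat = fit_zip_moments([0, 0, 0, 0, 0, 0, 0, 0, 3, 1])
print(f"pi = {pi_hat:.3f}, lam = {lam_hat:.3f}")
```

A plain Poisson fitted to the same toy data would use the overall mean as its rate and would therefore predict fewer empty pairs than observed. The extra parameter `pi` absorbs that excess of zeros.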
