Fast Unbiased Sampling of Networks with Given Expected Degrees and Strengths

Xuanchi Li, Xin Wang, Sadamori Kojaku

TL;DR

This paper tackles the bias of the Chung-Lu model and the computational bottleneck of maximum-entropy (MaxEnt) configuration models by introducing fast sampling algorithms, based on the Miller-Hagberg algorithm, for the Undirected Binary Configuration Model (UBCM) and the Undirected Enhanced Configuration Model (UECM). Across 103 networks, it reports speedups of 10–1000× over brute-force sampling while preserving the degree (and strength) constraints, yielding unbiased ensembles that reflect the intended degree heterogeneity. The approach extends to bipartite, directed, and hypergraph representations, enabling principled statistical testing of network structure at scale. An open-source Python implementation provides a practical means of adopting these rigorous MaxEnt models in large-network analyses and comparisons.

Abstract

The configuration model is a cornerstone of the statistical assessment of network structure. While the Chung-Lu model is among the most widely used configuration models, it systematically oversamples edges between large-degree nodes, leading to inaccurate statistical conclusions. Although the maximum entropy principle offers unbiased configuration models, its high computational cost has hindered widespread adoption, making the Chung-Lu model an inaccurate yet persistently practical choice. Here, we propose fast and efficient sampling algorithms for the max-entropy-based models by adapting the Miller-Hagberg algorithm. Evaluation on 103 empirical networks demonstrates a 10–1000× speedup, making theoretically rigorous configuration models practical and contributing to a more accurate understanding of network structure.

Paper Structure

This paper contains 12 sections, 4 equations, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: Structural bias in the Chung-Lu model. We use a Holme-Kim network with 5,000 nodes. A: The density of edges within the group of the largest-degree nodes as a function of group size $\alpha$. B: The distribution of triangle counts in the sampled networks across 100 realizations.
  • Figure 2: Schematic illustration of the Miller-Hagberg (MH) algorithm for the Chung-Lu model. In the Chung-Lu model, when nodes are sorted in descending order of degree, the edge probabilities $p_{ij}$ decrease monotonically along the ordering. The MH algorithm exploits this structure by proposing each candidate neighbor with probability $q \geq p_{ij}$ and accepting it with probability $p_{ij}/q$. The offset $L$ from the current node to the next proposed candidate follows a geometric distribution $q(1-q)^{L-1}$. By sampling $L$ directly from this distribution, the algorithm jumps to the next candidate without evaluating every node.
  • Figure 3: Results for the unweighted networks. A: CPU time as a function of the number $M$ of edges. B: The joint probability distribution of the reconstructed and empirical degree sequences. We show the results for three representative empirical networks (Open Flight, ca-HepPh, and DBLP-cite). C: The density of edges within the group of the largest-degree nodes.
  • Figure 4: Results for the weighted networks. A: CPU time as a function of $M$. B: The joint probability distribution of the reconstructed and empirical degree sequences, and that of the strength sequences. We show the results for three representative empirical networks.
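
The skip-sampling idea described in the Figure 2 caption can be sketched in Python as follows. This is a minimal illustration for the plain Chung-Lu model only, not the paper's UBCM/UECM implementation. It assumes the standard Chung-Lu edge probability $p_{ij} = \min(1, w_i w_j / \sum_k w_k)$; the function name `chung_lu_skip_sample` and its arguments are hypothetical names chosen for this example.

```python
import math
import random


def chung_lu_skip_sample(weights, seed=None):
    """Sample a Chung-Lu graph with geometric skips (illustrative sketch).

    Assumes p_ij = min(1, w_i * w_j / sum(w)); returns a list of edges
    (i, j) using the caller's original node indices, without self-loops.
    """
    rng = random.Random(seed)
    n = len(weights)
    # Sort nodes by weight, largest first, so that for a fixed node u the
    # edge probability p_uv never increases as v moves down the ordering.
    order = sorted(range(n), key=lambda i: weights[i], reverse=True)
    w = [float(weights[i]) for i in order]
    total = sum(w)
    edges = []
    if total <= 0.0:
        return edges

    for u in range(n - 1):
        v = u + 1
        # q is the proposal probability; it upper-bounds p_uv' for all v' >= v.
        q = min(w[u] * w[v] / total, 1.0)
        while v < n and q > 0.0:
            if q < 1.0:
                # Number of nodes skipped before the next proposal:
                # floor(log(r) / log(1 - q)) with r uniform on (0, 1].
                # This is L - 1 in the caption's notation.
                r = 1.0 - rng.random()
                skip = math.log(r) / math.log1p(-q)
                if skip >= n - v:
                    break  # the skip runs past the last node
                v += int(skip)
            # Accept the proposed neighbor with probability p / q.
            p = min(w[u] * w[v] / total, 1.0)
            if rng.random() < p / q:
                edges.append((order[u], order[v]))
            # Tighten the bound for the remaining (smaller-weight) nodes.
            q = p
            v += 1
    return edges
```

For example, `chung_lu_skip_sample([3, 2, 2, 1, 1, 1] * 50, seed=0)` returns an edge list for a 300-node graph. Because each accepted proposal updates `q` to the current `p`, the bound stays tight as the candidate weights shrink, which is what keeps the number of rejected proposals low.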