
Deterministic construction of typical networks in network models

Narayan G. Sabhahit, Moritz Laber, Harrison Hartle, Jasper van der Kolk, Samuel V. Scarpino, Brennan Klein, Dmitri Krioukov

TL;DR

The paper defines a rigorous notion of the most typical network within grand canonical (and hypercanonical) ensembles and develops a deterministic, scalable construction that converges to this state in the thermodynamic limit. It introduces a systematic derandomization framework for both fixed-connection-probability (grand canonical) and hidden-variable (hypercanonical) models, and demonstrates convergence using hyperbolic graphs as a case study, comparing deterministic hyperbolic graphs (DHGs) with random hyperbolic graphs (RHGs). Empirical results show that real-world networks often resemble the most typical DHG under the inferred parameters, suggesting a practical pathway for reproducible benchmarking and model selection. The work also highlights the broader applicability of deterministic state construction and the need to better understand the properties intrinsically tied to the most typical state.

Abstract

It is often desirable to assess how well a given dataset is described by a given model. In network science, for instance, one often wants to say that a given real-world network appears to come from a particular network model. In statistical physics, the corresponding problem is about how typical a given state, representing real-world data, is in a particular statistical ensemble. One way to address this problem is to measure the distance between the data and the most typical state in the ensemble. Here, we identify the conditions that allow us to define this most typical state. These conditions hold in a wide class of grand canonical ensembles and their random mixtures. Our main contribution is a deterministic construction of a state that converges to this most typical state in the thermodynamic limit. This construction involves rounds of derandomization procedures, some of which deal with derandomizing point processes, an uncharted territory. We illustrate the construction on one particular network model, deterministic hyperbolic graphs, and its application to real-world networks, many of which we find are close to the most typical network in the model. While our main focus is on network models, our results are very general and apply to any grand canonical ensembles and their random mixtures satisfying certain niceness requirements.

Paper Structure

This paper contains 28 sections, 68 equations, 22 figures, 1 table.

Figures (22)

  • Figure 1: Network typicality definition. In any grand canonical model of networks, Eq. \ref{eq:canonical}, the distribution $\mathbb{P}(m,\varepsilon)$ of the number of links $m(G)$ and energy $\varepsilon(G)$ of network $G$ is sharply peaked. A typical set ${\mathcal{G}}_{T}$ is the set of networks whose values of $m$ and $\varepsilon$ are within the $(\delta_m,\delta_\varepsilon)$-neighborhood around their expected values $(\bar{m},\bar{\varepsilon})$, Eq. \ref{eq:network typicality}, while the most typical network $G^{\star} = \text{argmin}_{G\in\mathcal{G}}\left(|m(G) - \bar{m}| + |\varepsilon(G) - \bar{\varepsilon}|\right)$ is the network which is closest to the top of the peak. This network is unique with high probability if energy $\varepsilon(G)$ is a real-valued function.
  • Figure 2: Deterministic construction of typical networks. (a) Grand canonical derandomization. The connection probability matrix $\mathbf{p}=\{p_{ij}\}_{i<j}$ is first flattened into an interval of length $\bar{m}=\sum_{i<j}p_{ij}$ composed of subintervals of lengths $p_{ij}$, as shown in the figure. The whole interval is then overlaid with an equally spaced one-dimensional lattice of the same length and with $m^\star=\lfloor\bar{m}\rceil$ internal vertices, where $\lfloor\cdot\rceil$ denotes rounding to the nearest integer. These vertices hit $m^\star$ subintervals $p_{ij}$. The derandomized network $G_\mathbf{p}$ consists of those links $(i,j)$ whose subintervals are hit. (b) Hypercanonical derandomization. To derandomize the random sprinkling of $n$ points on the unit interval, the distances between consecutive points are first set to $d_i=-(1/n)\ln(1-u_i)$ by the inverse CDF method \cite{devroye1986_NonUniformRandomVariate}, that is, by evaluating the inverse CDF of the exponential distribution of rate $n$ at the regular lattice $u_i=(i-1/2)/n$, $i\in[n]$. These distances are then shuffled using a pseudorandom permutation function $\sigma$ acting on indices $i$, so the coordinate of point $i$ is set to $x_i=x_{i-1}+d_{\sigma(i)}$, where $x_0=0$. In higher dimensions $d$, the same permutation function is used several times with different seeds ($\sigma_1, \sigma_2$ in $d=2$) to shuffle the distances $d_i$ between consecutive coordinates in different dimensions: $x_i=x_{i-1}+d_{\sigma_1(i)}$, $y_i=y_{i-1}+d_{\sigma_2(i)}$. It is then used again ($\sigma_3, \sigma_4$) to destroy the correlations between the $(x,y)$-coordinates of the same point: $(x_i,y_i)=(x_{\sigma_3(i)},y_{\sigma_4(i)})$. In this paper, permutations $\sigma_s$, $s\in[2d]$, are defined via the multiplicative congruential generator (MCG) \cite{knuth1997_taocp2}: $\sigma_s$ is the permutation that sorts the pseudorandom MCG numbers $\left\{b_k\right\}_{k=n(s-1)+1}^{ns}$ in increasing order, where $b_{k+1}=ab_k \bmod c$, $a=7^5$, $b_0=42$, and $c=2^{31}-1$.
  • Figure 3: Self-averaging and convergence. (a,b) The empirical distributions of the rescaled number of links $m(G)$ and energy $\varepsilon(G)$ of the random hyperbolic graphs (RHGs) $G$ (Appendix \ref{sec:RHG}) are shown for increasing graph sizes $n$, and parameter values $\bar{\kappa} = 10$, $\beta = 4$, and $\gamma = 3.7$. The insets show the empirical coefficients of variation $C_{V}(n)$ (colored circles) of the distributions of $m$ and $\varepsilon$ for different values of $n$ and $\gamma$, versus the analytical predictions (black dashed lines) in Eq. \ref{eq:scaling of cv}. (c,d) The values of the rescaled number of links $m(G)$ and energy $\varepsilon(G)$ of the deterministic hyperbolic graphs (DHGs) $G$ (Appendix \ref{sec:DHG}) are shown for increasing graph sizes $n$, and parameter values $\bar{\kappa} = 10$, $\beta = 2.5$, and $\gamma = 2.7$. The black dashed lines are the theoretical expected values. The insets show these properties as functions of the inverse temperature $\beta$ for both DHGs and RHGs with the same parameter values and $n = 10^{4}$. The green squares are the RHG average values, and the shaded regions are the 95 percentiles computed over $10^{3}$ RHG realizations. (e,f) The degree distribution and the average local clustering coefficient of DHGs and RHGs with the same parameter values as in (c,d).
  • Figure 4: Typicality of real-world networks. The figure displays a collection of network properties of two real-world networks (blue squares) and their deterministic hyperbolic counterparts (orange circles): degree distribution $\mathbb{P}(k)$ (panels (a),(i)), average nearest neighbor degree $k_\mathrm{nn}(k)$ of $k$-degree nodes (panels (b),(j)), average local clustering $c(k)$ of $k$-degree nodes (panels (c),(k)), distribution $\mathbb{P}(d)$ of shortest-path distances $d$ (panels (e),(m)), average betweenness $b(k)$ of $k$-degree nodes (panels (f),(n)), and distribution $\mathbb{P}(n_\mathrm{cn})$ of the number of common neighbors $n_\mathrm{cn}$ of pairs of connected nodes, also known as edge multiplicity \cite{zlatic2012_networksarbitraryedge} (panels (g),(o)). The real-world networks are the Internet at the autonomous system level (the upper two rows) and the Cora citation network (the lower two rows), documented as internet_as and cora in Appendix \ref{sec:empirical_methods}, which also details their visualizations in panels (d),(h),(l),(p).
  • Figure 5: Poisson point process versus its derandomization. (a) A realization of the Poisson point process (PPP) of intensity $n=10^{3}$ on the unit square $[0,1]^{2}$. (b) The deterministic sprinkling process from Sec. \ref{sec: hypercanonical models} with $n=10^{3}$. (c,d) The PPP nearest neighbor distance distributions in Eq. \ref{eq: nearest neighbor dd PPP}, versus the empirical nearest neighbor distance distribution in the deterministic sprinkling process with $n = 5\times10^{3}$ on the unit interval and unit square, respectively. (e,f) The PPP distance distributions in Eq. \ref{eq: distance distribution pdfs}, versus the empirical distance distribution in the deterministic sprinkling with $n=5 \times 10^{3}$ on the unit interval and unit square, respectively.
  • ...and 17 more figures
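
The two derandomization steps described in the Figure 2 caption can be illustrated with a minimal one-dimensional sketch. It is not the authors' code, and it rests on several assumptions that the caption does not fix: the function names are hypothetical, the pairs $(i,j)$ are flattened in lexicographic order, the $m^\star$ internal lattice vertices are placed at $k\,\bar{m}/(m^\star+1)$ for $k=1,\dots,m^\star$, indices are zero-based, and Python's `round` (which rounds halves to even) stands in for $\lfloor\cdot\rceil$. Only the MCG constants $a=7^5$, $b_0=42$, $c=2^{31}-1$ and the formulas for $u_i$, $d_i$, and $x_i$ are taken from the caption.

```python
from bisect import bisect_right
from itertools import accumulate
from math import log


def mcg_permutation(n, s, a=7**5, b0=42, c=2**31 - 1):
    """Zero-based permutation sigma_s that sorts the MCG numbers
    b_{n(s-1)+1}, ..., b_{ns} in increasing order (an argsort)."""
    b = b0
    for _ in range(n * (s - 1)):      # advance to b_{n(s-1)}
        b = a * b % c
    block = []
    for _ in range(n):                # collect the next n MCG numbers
        b = a * b % c
        block.append(b)
    return sorted(range(n), key=block.__getitem__)


def deterministic_sprinkle_1d(n, s=1):
    """Deterministic stand-in for sprinkling n points on the unit interval."""
    # Exponential(rate n) inverse CDF evaluated at u_i = (i - 1/2) / n.
    gaps = [-log(1.0 - (i - 0.5) / n) / n for i in range(1, n + 1)]
    sigma = mcg_permutation(n, s)     # shuffle the gaps pseudorandomly
    points, x = [], 0.0               # x_0 = 0
    for i in range(n):
        x += gaps[sigma[i]]           # x_i = x_{i-1} + d_{sigma(i)}
        points.append(x)
    return points


def derandomize_links(p):
    """Grand canonical derandomization for p = {(i, j): p_ij} with i < j.

    ASSUMPTIONS: lexicographic flattening order; lattice vertices at
    k * m_bar / (m_star + 1), k = 1..m_star.
    """
    if not p:
        return []
    pairs = sorted(p)
    right_ends = list(accumulate(p[pair] for pair in pairs))
    m_bar = right_ends[-1]            # expected number of links
    m_star = round(m_bar)             # nearest integer (halves go to even)
    step = m_bar / (m_star + 1)
    links = set()                     # a set guards against double hits
    for k in range(1, m_star + 1):
        idx = bisect_right(right_ends, k * step)  # subinterval that is hit
        links.add(pairs[min(idx, len(pairs) - 1)])
    return sorted(links)


if __name__ == "__main__":
    print(deterministic_sprinkle_1d(5))
    print(derandomize_links({(0, 1): 0.9, (0, 2): 0.1, (1, 2): 1.0}))
```

Because every step is deterministic, repeated calls with the same arguments return identical point sets and link lists. Extending the sketch to $d=2$ would follow the caption by calling `mcg_permutation` with seeds $s=1,\dots,4$.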