
Clustering without geometry in sparse networks with independent edges

Alessio Catanzaro, Remco van der Hofstad, Diego Garlaschelli

Abstract

The coexistence of sparsity and clustering (non-vanishing average fraction of triangles per node) is one of the few structural features that, irrespective of finer details, are ubiquitously observed across large real-world networks. This fact calls for generic models producing sparse clustered graphs. Earlier results suggested that sparse random graphs with independent edges fail to reproduce clustering, unless edge probabilities are assumed to depend on underlying metric distances that, thanks to the triangle inequality, naturally favour triadic closure. This observation has opened a debate on whether clustering implies (latent) geometry in real-world networks. Alternatively, recent models of higher-order networks can replicate clustering by abandoning edge independence. In this paper, we mathematically prove, and numerically confirm, that a sparse random graph with independent edges, recently identified in the context of network renormalization as an invariant model under node aggregation, produces finite clustering without any geometric or higher-order constraint. The underlying mechanism is an infinite-mean node fitness, which also implies a power-law degree distribution. Further, as a novel phenomenon that we characterize rigorously, we observe the breakdown of self-averaging of various network properties. Therefore, as an alternative to geometry or higher-order dependencies, node aggregation invariance emerges as a basic route to realistic network properties.

Paper Structure

This paper contains 7 sections, 4 theorems, 90 equations, and 10 figures.

Key Result

Theorem 1

Let the vertex space of the network be $[n]=\{1, \ldots, n\}$. Let $(W_v)_{v\in[n]}$ be i.i.d. Pareto random variables with density given by \ref{eq:pareto}, with $\alpha\in(0,1)$, so that $\mathbb{E}[W]=\infty$. Conditionally on $(W_v)_{v\in[n]}$, edges are present independently, with the edge between $u,v\in [n]$ being present with the probability given by \ref{eq:pizza}. Throughout, we let $(I_{u,v})_{1
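The construction in Theorem 1 can be illustrated with a minimal sampling sketch. The explicit density and connection probability are not visible in the statement above, so the code makes two assumptions that should be checked against the paper: the weights follow a unit-scale Pareto law with $\mathbb{P}(W>w)=w^{-\alpha}$ for $w\ge 1$, and the edge probability takes the form $1-e^{-\delta W_u W_v}$. The names `sample_pareto_weights`, `sample_graph`, and `delta` are hypothetical, and the value of `delta` used below is only illustrative.

```python
import numpy as np

def sample_pareto_weights(n, alpha, rng):
    """Draw n i.i.d. weights with P(W > w) = w**(-alpha) for w >= 1.

    ASSUMPTION: unit-scale Pareto; the paper's exact density is not
    shown in the text above.
    """
    u = 1.0 - rng.random(n)  # uniform on (0, 1], never exactly zero
    return u ** (-1.0 / alpha)  # inverse-transform sampling

def sample_graph(weights, delta, rng):
    """Sample a simple undirected graph with independent edges.

    ASSUMPTION: the edge (u, v) is present with probability
    1 - exp(-delta * w_u * w_v); `delta` is a hypothetical parameter.
    """
    n = len(weights)
    p = 1.0 - np.exp(-delta * np.outer(weights, weights))
    # One independent draw per pair u < v; the diagonal is excluded.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return upper | upper.T  # symmetric boolean adjacency matrix

rng = np.random.default_rng(0)
n, alpha = 500, 0.5
w = sample_pareto_weights(n, alpha, rng)
adj = sample_graph(w, delta=n ** (-1.0 / alpha), rng=rng)  # illustrative delta
degrees = adj.sum(axis=1)
```

Sampling the weights first and the edges second mirrors the two-stage description in the theorem: the weights are random, and the edges are independent only conditionally on them.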

Figures (10)

  • Figure 1: Clustering functions for different values of $n$ and $\alpha$ for the MSM \ref{eq:pizza} with Pareto weights \ref{eq:pareto}. Blue circles: empirical clustering function \ref{eq:cc_empirical} (versus the reduced degree $a=k/\sqrt{n}$) computed on actual realized graphs sampled from the model (obtained by sampling the weights once, and sampling the graph once conditionally on the realized weights). Red curves: our analytical expression \ref{eq:analytical_clustering} for the annealed clustering function. Green curves: our asymptotic calculation \ref{eq:CC_hub}, valid for diverging reduced degrees. We evaluate and plot all functions only for degrees larger than $1$, to avoid ambiguities in the definition of the clustering coefficient for $k<2$. From top to bottom: $n=10^2,10^3,10^4$. From left to right: $\alpha=0.3,0.5,0.7$.
  • Figure 2: Average local clustering coefficient $C$ and distance to $r_{0/1}$ versus network size $n$ (for $\alpha=0.5$). All results are obtained by sampling the weights 10 times independently for each value of $n$, sampling a single graph conditionally on each realization of the weights, computing the relevant quantities on that single graph, and finally calculating averages and error bars over the 10 realizations. The trends show the node-averaged local clustering coefficient ${C}$, both including (blue symbols) and excluding (purple symbols) nodes with degree $k<2$: note that the latter evolves smoothly towards 1 with shrinking error bars as $n$ increases, while the former fluctuates with non-vanishing error bars (as a result of non-self-averaging) and progressively approaches the average of $1-r_{0/1}$ over realizations (dashed blue line), which itself fluctuates. The difference between $1-r_{0/1}$ and ${C}$ computed including nodes with $k<2$ (green symbols) converges to zero with shrinking error bars.
  • Figure S1: Average local clustering coefficient $C$ and distance to $r_{0/1}$ versus network size $n$ (for $\alpha=0.3$). On the left, the weights are resampled every time a network is realized (sample of $10$), while on the right, for each $n$, the weights are drawn only once and $10$ different adjacency matrices are realized from the same weights. At the top we show the node-averaged local clustering coefficient ${C}$, both including (blue symbols) and excluding (purple symbols) nodes with degree $k=0,1$ (note that the latter evolves smoothly towards 1 with shrinking error bars as $n$ increases, while the former fluctuates with non-vanishing error bars, as a result of non-self-averaging). The dashed blue line is $1$ minus the average of $r_{0/1}$ over realizations. Finally, the green symbols show the difference between $1-r_{0/1}$ and ${C}$ computed including nodes with $k<2$ (notice the shrinking error bars). In the middle, the light blue line with error bars is the actual value of $r_0$, while the darker dashed line is the approximation we derive in \ref{r0-approximation}. At the bottom, the light red line with error bars is the actual value of $r_1$, while the darker dashed line is the approximation we derive in \ref{r1-approximation}.
  • Figure S2: Average local clustering coefficient $C$ and distance to $r_{0/1}$ versus network size $n$ (for $\alpha=0.5$). On the left, the weights are resampled every time a network is realized (sample of $10$), while on the right, for each $n$, the weights are drawn only once and $10$ different adjacency matrices are realized from the same weights. At the top we show the node-averaged local clustering coefficient ${C}$, both including (blue symbols) and excluding (purple symbols) nodes with degree $k=0,1$ (note that the latter evolves smoothly towards 1 with shrinking error bars as $n$ increases, while the former fluctuates with non-vanishing error bars, as a result of non-self-averaging). The dashed blue line is $1$ minus the average of $r_{0/1}$ over realizations. Finally, the green symbols show the difference between $1-r_{0/1}$ and ${C}$ computed including nodes with $k<2$ (notice the shrinking error bars). In the middle, the light blue line with error bars is the actual value of $r_0$, while the darker dashed line is the approximation we derive in \ref{r0-approximation}. At the bottom, the light red line with error bars is the actual value of $r_1$, while the darker dashed line is the approximation we derive in \ref{r1-approximation}.
  • Figure S3: Average local clustering coefficient $C$ and distance to $r_{0/1}$ versus network size $n$ (for $\alpha=0.7$). On the left, the weights are resampled every time a network is realized (sample of $10$), while on the right, for each $n$, the weights are drawn only once and $10$ different adjacency matrices are realized from the same weights. At the top we show the node-averaged local clustering coefficient ${C}$, both including (blue symbols) and excluding (purple symbols) nodes with degree $k=0,1$ (note that the latter evolves smoothly towards 1 with shrinking error bars as $n$ increases, while the former fluctuates with non-vanishing error bars, as a result of non-self-averaging). The dashed blue line is $1$ minus the average of $r_{0/1}$ over realizations. Finally, the green symbols show the difference between $1-r_{0/1}$ and ${C}$ computed including nodes with $k<2$ (notice the shrinking error bars). In the middle, the light blue line with error bars is the actual value of $r_0$, while the darker dashed line is the approximation we derive in \ref{r0-approximation}. At the bottom, the light red line with error bars is the actual value of $r_1$, while the darker dashed line is the approximation we derive in \ref{r1-approximation}.
  • ...and 5 more figures
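
The quantities compared in the Figure 2 and Figure S1 to S3 captions can be computed from a single adjacency matrix along the following lines. This is a sketch, not the authors' code: it assumes that nodes with degree $k<2$ contribute a local clustering of $0$ when they are included in the average, and that $r_{0/1}$ denotes the fraction of nodes with degree $0$ or $1$. The function name `clustering_summary` is hypothetical.

```python
import numpy as np

def clustering_summary(adj):
    """Return (C including k<2 nodes, C excluding k<2 nodes, r_{0/1}).

    ASSUMPTIONS: `adj` is a symmetric 0/1 adjacency matrix with zero
    diagonal; nodes with degree below 2 count as clustering 0 when
    included; r_{0/1} is the fraction of nodes with degree 0 or 1.
    """
    a = np.asarray(adj, dtype=np.int64)
    k = a.sum(axis=1)                          # node degrees
    # Row sums of (A @ A) * A give diag(A^3), i.e. twice the number of
    # triangles through each node.
    triangles = ((a @ a) * a).sum(axis=1) // 2
    pairs = k * (k - 1) // 2                   # neighbour pairs per node
    has_pairs = k >= 2
    local = np.zeros(len(k))
    local[has_pairs] = triangles[has_pairs] / pairs[has_pairs]
    r01 = float(np.mean(~has_pairs))
    c_all = float(local.mean())
    c_excl = float(local[has_pairs].mean()) if has_pairs.any() else float("nan")
    return c_all, c_excl, r01

# Toy graph: a triangle (0, 1, 2), a pendant node 3 attached to node 0,
# and an isolated node 4.
edges = [(0, 1), (0, 2), (1, 2), (0, 3)]
adj = np.zeros((5, 5), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1
c_all, c_excl, r01 = clustering_summary(adj)
```

On this toy graph the local clustering values are $1/3$, $1$, $1$, $0$, $0$, so `c_all` is $7/15$, `c_excl` is $7/9$, and `r01` is $2/5$. Under the stated convention, `c_all` can never exceed `1 - r01`, which is why the captions track the difference between the two.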

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Lemma 1: $\bar{C}(a)$ is bounded by 1
  • proof
  • Lemma 2: $\bar{C}(a)$ is close to 1 for $a$ small
  • proof
  • Lemma 3: Large $a$ asymptotics of $\bar{C}(a)$
  • proof