Hybrid of node and link communities for graphon estimation

Arthur Verdeyme, Sofia C. Olhede

TL;DR

The paper tackles nonparametric graphon estimation for networks by introducing the stochastic shape model (SSM), which combines node- and edge-based notions of community and applies a smoothing step to produce a parsimonious, multiscale graphon representation. It develops a two-step estimator (block-based initialization followed by density-based smoothing), tuned by model selection via BIC, and provides minimax results showing that the estimator is rate-optimal under both SSM and Hölder graphon classes. Theoretical bounds are complemented by empirical evidence on synthetic and real networks, demonstrating strong predictive performance with substantially fewer parameters than traditional block models. The approach yields interpretable, multiscale summaries of network structure and bridges node-centric and edge-centric viewpoints, which makes it suitable for describing link communities and hierarchical patterns.

Abstract

Networks serve as a tool used to examine the large-scale connectivity patterns in complex systems. Modelling their generative mechanism nonparametrically is often based on step-functions, such as the stochastic block models. These models are capable of addressing two prominent topics in network science: link prediction and community detection. However, such methods often have a resolution limit, making it difficult to separate small-scale structures from noise. To arrive at a smoother representation of the network's generative mechanism, we explicitly trade variance for bias by smoothing blocks of edges based on stochastic equivalence. As such, we propose a different estimation method using a new model, which we call the stochastic shape model. Typically, analysis methods are based on modelling node or link communities. In contrast, we take a hybrid approach, bridging the two notions of community. Consequently, we obtain a more parsimonious representation, enabling a more interpretable and multiscale summary of the network structure. By considering multiple resolutions, we trade bias and variance to ensure that our estimator is rate-optimal. We also examine the performance of our model through simulations and applications to real network data.
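The two-step idea described above (fit blocks first, then smooth blocks with similar densities into a smaller number of shapes, and pick the number of shapes with BIC) can be illustrated with a minimal sketch. This is not the authors' algorithm: the degree-sorted equal-size blocks, the equal-count grouping of sorted block densities, the BIC penalty, and all function names are assumptions made only for illustration.

```python
import numpy as np

def block_densities(A, k):
    """Step 1 (assumed): sort nodes by degree, cut into k equal-size blocks,
    and count edges and dyads for every block pair (a <= b)."""
    n = A.shape[0]
    order = np.argsort(-A.sum(axis=1), kind="stable")
    z = np.empty(n, dtype=int)
    z[order] = np.arange(n) * k // n              # block label of each node
    edges, pairs = np.zeros((k, k)), np.zeros((k, k))
    iu, ju = np.triu_indices(n, k=1)              # every dyad i < j once
    a, b = np.minimum(z[iu], z[ju]), np.maximum(z[iu], z[ju])
    np.add.at(edges, (a, b), A[iu, ju])
    np.add.at(pairs, (a, b), 1)
    return z, edges, pairs

def smooth_into_shapes(edges, pairs, s):
    """Step 2 (assumed): group block pairs with similar densities into s shapes
    and pool their edge counts into one probability per shape."""
    a, b = np.triu_indices(edges.shape[0])
    e, m = edges[a, b], pairs[a, b]
    rank = np.argsort(e / m, kind="stable")
    u = np.empty(len(e), dtype=int)
    u[rank] = np.arange(len(e)) * s // len(e)     # equal-count cuts of sorted densities
    theta = np.array([e[u == r].sum() / m[u == r].sum() for r in range(s)])
    return u, theta, e, m

def bic(u, theta, e, m):
    """Bernoulli log-likelihood with an (assumed) penalty of log(#dyads) per shape."""
    p = np.clip(theta[u], 1e-9, 1 - 1e-9)
    loglik = np.sum(e * np.log(p) + (m - e) * np.log(1 - p))
    return -2 * loglik + len(theta) * np.log(m.sum())

# Hypothetical test network: two communities with different densities.
rng = np.random.default_rng(1)
n = 120
labels = np.arange(n) // 60
P = np.array([[0.6, 0.1], [0.1, 0.3]])[labels[:, None], labels[None, :]]
upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = (upper | upper.T).astype(int)

z, edges, pairs = block_densities(A, k=6)
scores = {}
for s in range(1, 8):
    u, theta, e, m = smooth_into_shapes(edges, pairs, s)
    scores[s] = bic(u, theta, e, m)
best_s = min(scores, key=scores.get)
print("number of shapes selected by BIC:", best_s)
```

With `k = 6` blocks there are 21 block pairs, so an unsmoothed block model would carry 21 parameters; the smoothed fit carries only `best_s`. That gap is the kind of parsimony the abstract refers to, although the paper's actual smoothing rule and criterion may differ from this sketch.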

Paper Structure (19 sections, 14 theorems, 85 equations, 6 figures)

This paper contains 19 sections, 14 theorems, 85 equations, 6 figures.

Key Result

Theorem 2.1

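A random array $\left\{A_{i j}\right\}$ is jointly exchangeable if and only if it can be represented as follows: There is a random function $f:[0,1]^2 \rightarrow [0,1]$ such that, conditionally on $f$ and $\left(\xi_i\right)_{i \in \mathbb{N}}$, the entries $A_{i j}$ are independent with $\mathbb{P}\left(A_{i j}=1 \mid f, \xi_i, \xi_j\right)=f\left(\xi_i, \xi_j\right)$, where $\left(\xi_i\right)_{i \in \mathbb{N}}$ is a sequence of i.i.d. $U[0,1]$ random variables, which are independent of $f$.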
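Read as a generative recipe, the Aldous-Hoover representation says: draw a latent uniform position for each node, then connect nodes $i$ and $j$ independently with probability $f(\xi_i, \xi_j)$. The following is a minimal sketch of that recipe for a simple undirected graph; the function names and the particular graphon `example_graphon` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_graph(f, n, rng):
    """Draw a simple undirected graph on n nodes from a symmetric graphon f."""
    xi = rng.uniform(size=n)                      # latent positions xi_i ~ U[0, 1]
    p = f(xi[:, None], xi[None, :])               # p[i, j] = f(xi_i, xi_j)
    # Independent Bernoulli draws for i < j; the diagonal stays empty.
    upper = np.triu(rng.uniform(size=(n, n)) < p, k=1)
    return (upper | upper.T).astype(int), xi

def example_graphon(x, y):
    # Hypothetical symmetric graphon with values in [0.25, 0.75].
    return 0.25 + 0.5 * x * y

rng = np.random.default_rng(0)
A, xi = sample_graph(example_graphon, 200, rng)
print("edge density:", A.sum() / (200 * 199))
```

Graphon estimation is the inverse problem: given only `A`, recover `f` (up to a relabelling of the latent positions), since `xi` is never observed.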

Figures (6)

  • Figure 1: Illustration of the shape membership operator, $w = u\circ z^2$.
  • Figure 2: Comparison of estimators across numbers of shapes, with model selection based on the Bayesian Information Criterion (BIC). Panel (i) presents step-function approximations with a decreasing number of shapes from left to right (a-d). Panel (ii) is a Sankey diagram illustrating the smoothing procedure and each shape's probability value, where the width of the lines indicates the number of blocks smoothed together. Panel (iii) depicts the BIC values for the corresponding number of shapes ($S$), with the lowest BIC (in red) attained at $S=5$, the model selected by the criterion.
  • Figure 3: Performance comparison between our method, USVT (chatterjee2015matrix) and SAS (chan2014consistent) over different graphons, one per row. Left: functional representation of the graphon. Middle: log-log plot of the mean squared error. Right: area under the ROC curve.
  • Figure 4: Plots on the left represent the graphons $f_i$ for $i = 1,2,3$ from top to bottom. Middle plots represent the ratio of the number of parameters (RP) between our estimator, using BIC, and the estimator from (olhede2014network). Similarly, plots on the right represent the AUC ratio (RAUC) between the two methods. All results are averaged over 10 Monte Carlo simulations.
  • Figure 5: Adjacency matrix of the political weblogs dataset (adamic2005political) with node labels permuted, from left to right, according to the identity map (no permutation), degrees ($\sigma$ permutation), political party ($\phi$ permutation), and degrees based on interactions with the opposite party ($\pi$ permutation).
  • ...and 1 more figures
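
The shape membership operator in Figure 1, $w = u\circ z^2$, can be illustrated with a small worked example. The sketch below assumes that $z$ assigns each node to one of $k$ communities, that $z^2$ sends a node pair $(i,j)$ to the community pair $(z(i), z(j))$, and that $u$ assigns each community pair to one of $s$ shapes. That reading, and every value in the arrays, is an assumption for illustration rather than something taken from the paper.

```python
import numpy as np

# z: node -> community, with k = 3 communities (hypothetical labels).
z = np.array([0, 0, 1, 1, 2])

# u: community pair -> shape, with s = 3 shapes (hypothetical, symmetric).
u = np.array([[0, 1, 1],
              [1, 0, 2],
              [1, 2, 0]])

# One connection probability per shape (hypothetical values).
theta = np.array([0.8, 0.2, 0.05])

# w(i, j) = u(z(i), z(j)): the shape label of every node pair.
w = u[z[:, None], z[None, :]]

# Edge probabilities implied by the shapes.
P = theta[w]
print(w)
```

Here the 3 communities give 6 distinct community pairs, but only 3 shapes, so the model needs 3 probabilities instead of 6.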

Theorems & Definitions (23)

  • Theorem 2.1: Aldous-Hoover
  • Definition 2.1: Stochastic Shape Model ($SSM$)
  • Definition 2.2: $(s,k)$-Stochastic Shape Model ($SSM(s,k)$)
  • Theorem 2.2
  • Theorem 2.3
  • Proposition 3.1
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 3.1
  • Theorem 3.3
  • ...and 13 more