Hybrid of node and link communities for graphon estimation
Arthur Verdeyme, Sofia C. Olhede
TL;DR
The paper tackles nonparametric graphon estimation for networks by introducing the stochastic shape model (SSM), which combines node- and edge-based notions of community and applies a smoothing step to produce a parsimonious, multiscale graphon representation. It develops a two-step estimator (block-based initialization followed by density-based smoothing), tuned by model selection via BIC, and provides minimax-rate results showing rate-optimal convergence under both SSM and Hölder graphon classes. Theoretical bounds are complemented by empirical evidence on synthetic and real networks, demonstrating strong predictive performance with substantially fewer parameters than traditional block models. The approach yields interpretable, multiscale summaries of network structure and bridges node-centric and edge-centric viewpoints, providing a framework for link communities and hierarchical patterns.
Abstract
Networks are a tool for examining the large-scale connectivity patterns in complex systems. Nonparametric models of their generative mechanism are often based on step functions, such as stochastic block models. These models can address two prominent topics in network science: link prediction and community detection. However, such methods often have a resolution limit, making it difficult to separate small-scale structures from noise. To arrive at a smoother representation of the network's generative mechanism, we explicitly trade variance for bias by smoothing blocks of edges based on stochastic equivalence. To this end, we propose a different estimation method using a new model, which we call the stochastic shape model. Typically, analysis methods are based on modelling either node or link communities. In contrast, we take a hybrid approach, bridging the two notions of community. Consequently, we obtain a more parsimonious representation, enabling a more interpretable and multiscale summary of the network structure. By considering multiple resolutions, we balance bias and variance to ensure that our estimator is rate-optimal. We also examine the performance of our model through simulations and applications to real network data.
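To make the step-function idea in the abstract concrete, the following is a minimal sketch of a standard block-average (stochastic block model style) estimate of edge probabilities. It is not the authors' stochastic shape model estimator and does not include their smoothing step. The function name `block_average_estimate`, the assumption that node labels are already known, and the assumption of a symmetric adjacency matrix without self-loops are all illustrative choices that do not come from the paper.

```python
import numpy as np


def block_average_estimate(adjacency, labels):
    """Piecewise-constant estimate of edge probabilities from known node labels.

    Assumptions (illustrative, not from the paper): `adjacency` is a symmetric
    0/1 matrix with a zero diagonal, and `labels` assigns each node to a group.
    Returns the k x k matrix of block densities and its n x n expansion.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    k = len(groups)
    theta = np.zeros((k, k))

    for a, group_a in enumerate(groups):
        idx_a = np.flatnonzero(labels == group_a)
        for b, group_b in enumerate(groups):
            idx_b = np.flatnonzero(labels == group_b)
            block = adjacency[np.ix_(idx_a, idx_b)]
            if a == b:
                # Within a group, exclude the diagonal: m * (m - 1) ordered pairs.
                m = len(idx_a)
                pairs = m * (m - 1)
                theta[a, b] = block.sum() / pairs if pairs > 0 else 0.0
            else:
                theta[a, b] = block.mean()

    # Expand block densities back to one value per node pair.
    positions = np.searchsorted(groups, labels)
    p_hat = theta[np.ix_(positions, positions)]
    np.fill_diagonal(p_hat, 0.0)  # no self-loops
    return theta, p_hat
```

With k groups, this estimate has on the order of k squared free parameters, one per block. Smoothing blocks that are stochastically equivalent, as the abstract describes, would reduce that count; how the blocks are grouped is specific to the paper's method and is not sketched here.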
