Nested stochastic block model for simultaneously clustering networks and nodes
Nathaniel Josephs, Arash A. Amini, Marina Paez, Lizhen Lin
TL;DR
The paper tackles the problem of simultaneously clustering a collection of networks and the nodes within each network when node correspondence across networks is unknown. It introduces NSBM, a hierarchical Bayesian model that uses a nested Dirichlet process (NDP) prior to couple between-network clustering with within-network community detection, so that both the number of network classes and the number of communities within each network are learned automatically. NSBM accommodates heterogeneous communities and networks with different vertex sets, and it comes with four Gibbs-based posterior samplers designed to handle the model’s two-level structure. Through extensive simulations and real-data analyses, the authors show that NSBM recovers both levels of clustering accurately and is robust to community heterogeneity and node anonymity, offering a principled framework for inference on network populations without requiring node alignment.
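To make the two-level structure concrete, below is a minimal generative sketch that uses a truncated stick-breaking approximation of the NDP: each network first draws a class, and its nodes then draw communities from that class's own community weights. The truncation levels `K` and `L`, the concentration parameters `alpha` and `beta`, the class-specific Uniform(0, 1) block matrices `eta`, and all function names are illustrative assumptions, not the authors' specification.

```python
import random

def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights: v ~ Beta(1, alpha); the last stick takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # weights sum to 1 by construction
    return weights

def sample_categorical(weights, rng):
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def simulate_nsbm(network_sizes, alpha=1.0, beta=1.0, K=5, L=5, seed=0):
    """Hypothetical NSBM-style simulator; returns (class, node labels, adjacency) per network."""
    rng = random.Random(seed)
    pi = stick_breaking(alpha, K, rng)                    # weights over network classes
    w = [stick_breaking(beta, L, rng) for _ in range(K)]  # community weights, one vector per class
    # ASSUMPTION: each class k has its own symmetric L x L block matrix with Uniform(0, 1) entries.
    eta = []
    for _ in range(K):
        m = [[0.0] * L for _ in range(L)]
        for a in range(L):
            for b in range(a, L):
                m[a][b] = m[b][a] = rng.random()
        eta.append(m)
    networks = []
    for n in network_sizes:                               # networks may have different node counts
        z = sample_categorical(pi, rng)                   # network-level class
        xi = [sample_categorical(w[z], rng) for _ in range(n)]  # node-level communities
        A = [[0] * n for _ in range(n)]                   # undirected, no self-loops
        for s in range(n):
            for t in range(s + 1, n):
                A[s][t] = A[t][s] = int(rng.random() < eta[z][xi[s]][xi[t]])
        networks.append((z, xi, A))
    return networks
```

For example, `simulate_nsbm([6, 10, 4])` produces three networks of different sizes whose node indices carry no meaning across networks, which mirrors the unlabeled, varying-vertex-set setting described above.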
Abstract
We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features, including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select both the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms, including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers, all of which return two levels of clustering labels: within networks and across networks. Extensive simulation studies demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature because the nodes are anonymous and the number of nodes varies across networks.
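As a rough illustration of one step a truncation-based (blocked) Gibbs sampler might take, the sketch below computes the full conditional of a single network's class label given its node labels. It assumes the same hypothetical truncated model as a stick-breaking simulator would use (class weights `pi`, per-class community weights `w[k]`, class-specific block matrices `eta[k]`); these names and the class-specific `eta` are assumptions, and this is not the authors' sampler.

```python
import math
import random

def log_bernoulli(a, p, eps=1e-12):
    """Log-probability of edge indicator a under Bernoulli(p), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) if a else math.log(1.0 - p)

def class_posterior(A, xi, pi, w, eta):
    """Normalised p(z_j = k | A_j, xi_j, pi, w, eta) for every truncated class k.
    ASSUMPTION: class-specific block matrices eta[k]; all weights strictly positive."""
    n = len(A)
    log_weights = []
    for k in range(len(pi)):
        lp = math.log(pi[k])                        # prior weight of network class k
        lp += sum(math.log(w[k][c]) for c in xi)    # node labels under class-k community weights
        for s in range(n):
            for t in range(s + 1, n):               # each unordered node pair once
                lp += log_bernoulli(A[s][t], eta[k][xi[s]][xi[t]])
        log_weights.append(lp)
    m = max(log_weights)                            # subtract max before exp for stability
    unnorm = [math.exp(v - m) for v in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def sample_class(A, xi, pi, w, eta, rng):
    """Draw a new class label for one network from its full conditional."""
    probs = class_posterior(A, xi, pi, w, eta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In a complete sampler this update would alternate with draws of the node labels, the weights, and the block probabilities, while a collapsed variant would integrate some of these quantities out instead. The pairwise loop makes this sketch quadratic in the number of nodes per network.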
