Nested stochastic block model for simultaneously clustering networks and nodes

Nathaniel Josephs, Arash A. Amini, Marina Paez, Lizhen Lin

TL;DR

The paper tackles the problem of simultaneously clustering a collection of networks and the nodes within each network when node correspondence is unknown. It introduces NSBM, a hierarchical Bayesian approach that uses the nested Dirichlet process prior to couple between-network clustering with within-network community detection, allowing automatic learning of the number of network classes and community counts. NSBM accommodates heterogeneous communities and varying vertex sets across networks, and it provides four Gibbs-based posterior samplers to navigate the model’s two-level complexity. Through extensive simulations and real-data analyses, the authors demonstrate that NSBM achieves accurate joint clustering and exhibits robustness to heterogeneity and anonymity, offering a principled framework for network population inference without requiring node alignment.

Abstract

We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers that ultimately return two levels of clustering labels from both within and across the networks. Extensive simulation studies are carried out which demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature due to the anonymity of the nodes and the varying number of nodes in each network.
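The two levels of clustering described in the abstract can be illustrated with a small generative sketch. This is not the NSBM itself: the nested Dirichlet process prior is replaced by a fixed, finite mixture so that the example stays short, and the names `sample_sbm`, `sample_network_collection`, `class_weights`, and `class_params` are hypothetical. Each network first draws a class label (written `z` here), then each node draws a community label (written `xi`) from that class's community weights, and edges follow a stochastic block model.

```python
import random


def sample_sbm(labels, eta, rng):
    """Sample an undirected, loop-free adjacency matrix from an SBM.

    labels[i] is the community of node i; eta[a][b] is the edge probability
    between communities a and b (assumed symmetric).
    """
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < eta[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


def sample_network_collection(sizes, class_weights, class_params, seed=0):
    """Draw one network per entry of `sizes`; node counts may differ.

    class_params[k] = (pi, eta): community weights and connectivity matrix
    for network class k. A finite mixture stands in for the NDP prior.
    """
    rng = random.Random(seed)
    networks = []
    for n in sizes:
        # Between-network level: class label for the whole network.
        z = rng.choices(range(len(class_weights)), weights=class_weights)[0]
        pi, eta = class_params[z]
        # Within-network level: community label for each node.
        xi = rng.choices(range(len(pi)), weights=pi, k=n)
        networks.append((z, xi, sample_sbm(xi, eta, rng)))
    return networks


# Two hypothetical network classes: one assortative with two communities,
# one with three communities and weaker structure.
class_params = [
    ([0.5, 0.5], [[0.8, 0.1], [0.1, 0.8]]),
    ([0.4, 0.3, 0.3], [[0.5, 0.2, 0.2], [0.2, 0.5, 0.2], [0.2, 0.2, 0.5]]),
]
collection = sample_network_collection([20, 35, 50], [0.5, 0.5], class_params)
```

Because nothing in the sampler ties node `i` in one network to node `i` in another, the collection has no node correspondence and the networks can differ in size, which is the setting the abstract targets.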

Paper Structure

This paper contains 30 sections, 40 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 4.1: Cluster performance of our four Gibbs algorithms. $\bm z$-NMI (left) measures how well we are clustering the network objects and $\bm \xi$-NMI (right) measures how well we are performing community detection on the nodes. The bands are 50% quantile regions based on 100 experiments.
  • Figure 4.2: Varying $n$ with $\lambda = n/10$ and $\gamma = 0.4$.
  • Figure 4.3: Varying $\tau$ for fixed $\gamma = 0.2$ and $\lambda = 25$.
  • Figure 4.4: Simulated multilayer personality-friendship networks.
  • Figure A.1: Efficient computation of $\prod_{x\le y} f(x,y)$. Using symmetry of $f$, we can "almost" just multiply elements over two rows $r$ and ${r'}$. Only one element is double-counted, and it can be divided out.
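The NMI scores named in the Figure 4.1 caption compare an estimated labeling with a reference labeling. A minimal sketch of normalized mutual information follows, assuming normalization by the arithmetic mean of the two entropies; the paper may use a different normalization, and the function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        # Both labelings put every item in a single cluster.
        return 1.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Relabeling the clusters does not change the score.
same_up_to_relabeling = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Independent labelings share no information.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to permutations of the label names, so it suits cluster labels, which are identifiable only up to relabeling. The same function applies to network-level labels and to node-level community labels.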