From your Block to our Block: How to Find Shared Structure between Stochastic Block Models over Multiple Graphs
Iiro Kumpulainen, Sebastian Dalleiger, Jilles Vreeken, Nikolaj Tatti
TL;DR
We introduce the Shared Stochastic Block Model (SSBM) to identify blocks shared across $n$ graphs that may be unaligned or differ in size. Fitting an SSBM is NP-hard, so we propose two practical approaches: (i) MCMC with simulated annealing, which jointly infers the block assignments and the shared blocks, and (ii) a two-stage approach that first learns a separate SBM for each graph and then selects the $s$ shared blocks, either with an integer linear program (ILP) or, for large numbers of blocks, with a fast greedy algorithm. On synthetic data, the methods recover the shared blocks well and achieve favorable BIC scores. On real-world data, studies of ADHD brain networks and large Wikipedia networks demonstrate interpretability and scalability, with the core inference step scaling linearly in the number of edges. The work provides scalable tools for discovering common connectivity structure across multiple graphs and lays the groundwork for extensions to more complex SBM variants and sharing schemes.
Abstract
Stochastic Block Models (SBMs) are a popular approach to modeling single real-world graphs. The key idea of SBMs is to partition the vertices of the graph into blocks such that edge densities are similar within each block, as well as between each pair of blocks. However, what if we are given not one but multiple graphs that are unaligned and of different sizes? How can we find out if these graphs share blocks with similar connectivity structures? In this paper, we propose the shared stochastic block modeling (SSBM) problem, in which we model $n$ graphs using SBMs that share the parameters of $s$ blocks. We show that fitting an SSBM is NP-hard, and consider two approaches to fit good models in practice. In the first, we directly maximize the likelihood of the shared model using a Markov chain Monte Carlo algorithm. In the second, we first fit an SBM for each graph and then select which blocks to share. We propose an integer linear program to find the optimal shared blocks, and, to scale to large numbers of blocks, a fast greedy algorithm. Through extensive empirical evaluation on synthetic and real-world data, we show that our methods work well in practice.
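To make the key idea of an SBM concrete, the following minimal sketch scores a fixed partition of a single undirected graph. It assumes a Bernoulli SBM on a simple graph with no self-loops, with each block-pair density set to its maximum-likelihood estimate; the function name `sbm_log_likelihood` and this particular likelihood are illustrative assumptions, not necessarily the formulation used in the paper.

```python
import numpy as np


def sbm_log_likelihood(adj, blocks):
    """Log-likelihood of an undirected simple graph under a Bernoulli SBM.

    Illustrative assumption: each block-pair density is its maximum-likelihood
    estimate (observed edges / possible vertex pairs) for the given partition.
    adj:    symmetric 0/1 adjacency matrix with a zero diagonal
    blocks: block index (0..k-1) of every vertex
    """
    adj = np.asarray(adj)
    blocks = np.asarray(blocks)
    k = int(blocks.max()) + 1

    onehot = np.eye(k, dtype=int)[blocks]   # n x k membership matrix
    sizes = onehot.sum(axis=0)              # number of vertices per block

    # Edge counts per block pair; the diagonal counts each edge twice.
    edges = (onehot.T @ adj @ onehot).astype(float)
    edges[np.diag_indices(k)] /= 2

    # Number of possible vertex pairs per block pair.
    pairs = np.outer(sizes, sizes).astype(float)
    pairs[np.diag_indices(k)] = sizes * (sizes - 1) / 2

    ll = 0.0
    for a in range(k):
        for b in range(a, k):
            e, p = edges[a, b], pairs[a, b]
            if p == 0 or e == 0 or e == p:
                continue  # density 0 or 1 contributes zero to the sum
            d = e / p
            ll += e * np.log(d) + (p - e) * np.log(1 - d)
    return ll
```

For example, a graph made of two disconnected triangles scores a log-likelihood of 0 when each triangle is its own block, and a strictly lower value when all six vertices are placed in one block. In a shared model, some of these densities would be tied across graphs rather than estimated separately for each graph.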
