From your Block to our Block: How to Find Shared Structure between Stochastic Block Models over Multiple Graphs

Iiro Kumpulainen, Sebastian Dalleiger, Jilles Vreeken, Nikolaj Tatti

TL;DR

We introduce the Shared Stochastic Block Model (SSBM) to identify blocks shared across $n$ graphs that may be unaligned or differ in size. Fitting an SSBM is NP-hard, so we propose two practical pipelines: (i) MCMC with simulated annealing that jointly infers block assignments and shared blocks, and (ii) a two-stage approach that first fits an SBM to each graph and then selects the $s$ shared blocks, either with an integer linear program (ILP) or, when the number of blocks is large, with a fast greedy algorithm. On synthetic data, the methods recover shared blocks well and achieve favorable BIC scores. Real-world studies on ADHD brain networks and large Wikipedia networks demonstrate interpretability and scalability, with the core inference step running in time linear in the number of edges. The work provides scalable tools for discovering common connectivity structure across multiple graphs and lays groundwork for extensions to more complex SBM variants and sharing schemes.

Abstract

Stochastic Block Models (SBMs) are a popular approach to modeling single real-world graphs. The key idea of SBMs is to partition the vertices of the graph into blocks such that edge densities are similar within each block, as well as between each pair of blocks. However, what if we are given not one but multiple graphs that are unaligned and of different sizes? How can we find out if these graphs share blocks with similar connectivity structures? In this paper, we propose the shared stochastic block modeling (SSBM) problem, in which we model $n$ graphs using SBMs that share the parameters of $s$ blocks. We show that fitting an SSBM is NP-hard, and consider two approaches to fit good models in practice. In the first, we directly maximize the likelihood of the shared model using a Markov chain Monte Carlo algorithm. In the second, we first fit an SBM for each graph and then select which blocks to share. We propose an integer linear program to find the optimal shared blocks; to scale to large numbers of blocks, we propose a fast greedy algorithm. Through extensive empirical evaluation on synthetic and real-world data, we show that our methods work well in practice.
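
To make the block-density idea concrete, the following is a minimal sketch of the log-likelihood of a standard Bernoulli SBM for a single undirected graph, with each block-pair density set to its maximum-likelihood estimate. It is an illustration under stated assumptions, not the paper's implementation: the function name `sbm_log_likelihood`, the dense adjacency-matrix input, and the toy graph are all hypothetical, and the sharing of parameters across graphs that defines an SSBM is not modeled here.

```python
import numpy as np

def sbm_log_likelihood(adj, blocks):
    """Bernoulli SBM log-likelihood of an undirected simple graph.

    adj    : symmetric 0/1 adjacency matrix with a zero diagonal (assumed input format)
    blocks : block index (0..k-1) of every vertex
    Each block-pair density is replaced by its maximum-likelihood estimate.
    """
    adj = np.asarray(adj)
    blocks = np.asarray(blocks)
    k = blocks.max() + 1
    sizes = np.bincount(blocks, minlength=k)
    onehot = np.eye(k, dtype=int)[blocks]      # (num_vertices x k) membership matrix
    # e[a, b] = number of edges between blocks a and b;
    # the diagonal counts every within-block edge twice.
    e = onehot.T @ adj @ onehot
    llh = 0.0
    for a in range(k):
        for b in range(a, k):
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
                edges = e[a, a] // 2
            else:
                pairs = sizes[a] * sizes[b]
                edges = e[a, b]
            if pairs == 0 or edges == 0 or edges == pairs:
                continue  # density 0 or 1 contributes nothing (0 * log 0 := 0)
            p = edges / pairs
            llh += edges * np.log(p) + (pairs - edges) * np.log(1 - p)
    return llh

# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1

llh_two = sbm_log_likelihood(adj, [0, 0, 0, 1, 1, 1])  # one block per triangle
llh_one = sbm_log_likelihood(adj, [0, 0, 0, 0, 0, 0])  # everything in one block
print(llh_two, llh_one)
```

On this toy graph, placing each triangle in its own block yields a higher log-likelihood than a single block, because the two dense triangles and the sparse connection between them are each described by their own density.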

Paper Structure

This paper contains 14 sections, 1 theorem, 23 equations, 7 figures, 1 algorithm.

Key Result

Theorem 1

ssbm and ssbm-fixed are NP-hard, even for $n = 2$. Moreover, unless $\textbf{P} = \textbf{NP}$, there is no polynomial-time algorithm that always produces a solution for ssbm or ssbm-fixed with a log-likelihood $\mathit{LLH}$ such that $\mathit{LLH} \geq \alpha \mathit{OPT}$, where $\mathit{OPT}$ i

Figures (7)

  • Figure 1: Comparison of ARI scores for the partitions of vertices into shared or non-shared blocks by different algorithms for selecting shared blocks, with increasing levels of random noise in the input block assignment.
  • Figure 2: (Left): Average ARI scores measuring the similarity between the inferred partitions of vertices into blocks and the ground truth partitions. (Right): ARI scores between ground truth and inferred partitions of vertices into shared or non-shared blocks, for different algorithms for inferring the block assignments paired with different methods for choosing the shared blocks. The Firsts method represents choosing the first $s$ blocks in each graph to be shared, which is effectively random for Single, Multilevel, and ML+Single.
  • Figure 3: Decrease in BIC (higher is better) and in log-likelihood (lower is better) for Multilevel (ML), ML+Single (ML+Si), and ML+Shared (ML+Sh) when using ILP shared blocks compared to not using any shared blocks.
  • Figure 4: BIC scores for different numbers of shared blocks for ML+Shared with ILP. The BIC is lowest at 3 shared blocks, matching the ground truth value used for generating the graphs.
  • Figure 5: Average running times of different algorithms for optimizing which blocks to share as a function of the number of graphs.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof