SteinGen: Generating Fidelitous and Diverse Graph Samples
Gesine Reinert, Wenkai Xu
TL;DR
SteinGen generates graphs from a single observed network by combining Stein's method with Glauber dynamics in the exponential random graph model (ERGM) setting. It estimates a conditional edge-probability model from the observed graph, uses it to drive sampling, and re-estimates it from the current sample after each sampling step, which yields both high fidelity to the target distribution and substantial diversity among the generated graphs. The authors provide consistency, diversity, and mixing-time guarantees, and compare the approach empirically with parametric and implicit competitors on synthetic ERGMs and real networks. The method sidesteps explicit parameter estimation, which can be inconsistent or expensive because of intractable normalisation constants, and it can be extended to multiple observed graphs and to nonparametric statistics.
Abstract
Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
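The "estimation and re-estimation" loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, purely for illustration, that the conditional probability of an edge depends only on the number of common neighbours of its two endpoints (the change in triangle count when the edge is toggled), and it estimates that probability by simple counting. The function names `estimate_edge_prob` and `steingen_sketch` are hypothetical.

```python
import numpy as np


def estimate_edge_prob(adj):
    """Estimate P(edge present | number of common neighbours) by counting.

    Assumption for illustration: the common-neighbour count is the only
    statistic the conditional edge probability depends on.
    """
    n = adj.shape[0]
    common = adj @ adj                      # common[i, j] = common neighbours of i and j
    iu = np.triu_indices(n, k=1)            # each unordered vertex pair once
    k = common[iu]
    present = adj[iu]
    totals = np.bincount(k, minlength=n)    # pairs observed with each count
    ones = np.bincount(k, weights=present, minlength=n)  # of which are edges
    overall = present.mean()                # fallback for counts never observed
    return np.where(totals > 0, ones / np.maximum(totals, 1), overall)


def steingen_sketch(adj, n_steps, rng):
    """Glauber-type dynamics with re-estimation after every step."""
    x = adj.copy()
    n = x.shape[0]
    for _ in range(n_steps):
        q = estimate_edge_prob(x)           # re-estimate from the current sample
        i, j = rng.choice(n, size=2, replace=False)  # pick a vertex pair uniformly
        k = int(x[i] @ x[j])                # common neighbours of i and j
        x[i, j] = x[j, i] = int(rng.random() < q[k])  # resample that edge
    return x


# Usage on a small random graph standing in for the single observed network.
rng = np.random.default_rng(0)
n = 12
upper = np.triu((rng.random((n, n)) < 0.3).astype(int), k=1)
observed = upper + upper.T
sample = steingen_sketch(observed, n_steps=50, rng=rng)
```

Recomputing the estimate from scratch at every step keeps the sketch short but costs a matrix product per step; an incremental update of the counts would be the natural refinement for larger graphs.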
