SteinGen: Generating Fidelitous and Diverse Graph Samples

Gesine Reinert; Wenkai Xu

TL;DR

SteinGen tackles graph generation from a single observed network by marrying Stein's method with Glauber dynamics in the ERGM setting. It estimates and re-estimates a conditional edge-probability model to drive sampling, yielding both high fidelity to the target distribution and substantial diversity among generated graphs. The authors provide consistency, diversity, and mixing-time guarantees, and validate the approach empirically against parametric and implicit competitors on synthetic ERGMs and real networks. The method avoids parameter estimation pitfalls, scales to different contexts, and can be extended to multiple graphs and nonparametric statistics, offering a principled pathway for faithful, diverse synthetic network generation.

Abstract

Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
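The "estimation and re-estimation" loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The names `estimate_q`, `steingen_sketch` and `hamming` are hypothetical. As a loudly labelled assumption, the conditional edge probability is estimated here by the empirical edge frequency among vertex pairs with the same number of common neighbours; the paper's actual estimator may differ.

```python
import random
from collections import defaultdict
from itertools import combinations


def estimate_q(adj, n, skip):
    """Empirical edge frequency per common-neighbour count, excluding `skip`.

    ASSUMPTION: conditioning on the common-neighbour count is an illustrative
    stand-in for the estimated conditional edge probability in the paper.
    """
    present, total = defaultdict(int), defaultdict(int)
    for i, j in combinations(range(n), 2):
        if (i, j) == skip:
            continue  # the picked vertex pair does not inform its own estimate
        key = len(adj[i] & adj[j])
        total[key] += 1
        present[key] += j in adj[i]
    density = sum(present.values()) / max(1, sum(total.values()))
    return {k: present[k] / total[k] for k in total}, density


def steingen_sketch(edges, n, steps, seed=0):
    """Glauber-type dynamics with re-estimation after every step (sketch)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    for _ in range(steps):
        i, j = sorted(rng.sample(range(n), 2))   # uniform vertex pair
        q, density = estimate_q(adj, n, (i, j))  # re-estimate from current sample
        p = q.get(len(adj[i] & adj[j]), density)  # fall back to edge density
        adj[i].discard(j)
        adj[j].discard(i)
        if rng.random() < p:                     # re-sample the edge indicator
            adj[i].add(j)
            adj[j].add(i)
    return {(i, j) for i in range(n) for j in adj[i] if i < j}


def hamming(edges_a, edges_b):
    """Number of vertex pairs whose edge indicator differs between two graphs."""
    return len(set(edges_a) ^ set(edges_b))
```

For example, `hamming(x0, steingen_sketch(x0, n, steps))` gives the Hamming distance between the input graph `x0` and a generated sample, which is the diversity measure plotted in the figures. Each step of this sketch costs $O(n^2)$ because the estimate is recomputed from scratch; that choice favours clarity over efficiency.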

Paper Structure (34 sections, 6 theorems, 30 equations, 12 figures, 10 tables, 2 algorithms)

Introduction
Assessing the quality and diversity of graph samples
Notation.
Exponential random graphs
Glauber dynamics and Stein operators for ERGMs
The graph kernel Stein statistic gKSS
Beyond ERGMs
SteinGen: generating fidelitous graph samples with diversity
Theoretical analysis
Consistency of the estimation
Diversity guarantee
Mixing time considerations
Stability of SteinGen
Measuring sample fidelity via total variation distance
Experimental results
...and 19 more sections

Key Result

Lemma 2

Each operator given in eq:stein_component satisfies the Stein identity; for each $s \in [N]$,

Figures (12)

Figure 1: The SteinGen procedure: $x_0$ is the input network; in step $k$ we pick a vertex pair uniformly at random and re-sample its edge indicator from the (re-)estimated conditional probability $\hat{q}^{(k)}$, given the current graph sample $x_{k-1}$ but excluding the picked vertex pair, to generate the next graph sample $x_{k}$. Changes in the intermediate steps are highlighted: the thicker red solid line in $x_1, x_3, x_{19}, x_{20}$ denotes that an edge is added; the green dashed line in $x_2$ indicates that an edge is removed. Only samples that differ from the previous sample are shown. The generated sample $x_{20}$ is visually different from the input graph $x_0$.
Figure 2: Hamming distance between generated samples and the initial synthetic network for networks on $n=50$ vertices; average and standard deviation over $m=50$ trials. In E2S, $2a^\ast(1-a^\ast)=0.263$; in E2ST, $2a^\ast(1-a^\ast)=0.248$; in ET, $2a^\ast(1-a^\ast)=0.217$.
Figure 3: Hamming distance versus TV distance of degree using generated samples; $r$ is the number of steps in SteinGen. The estimated error is obtained from simulations; the error bound is $(\sqrt{\pi n})^{-1}$ from \ref{eq:dtvbound}.
Figure 4: Hamming distance versus $1-\mathrm{TV}$ distance of degree for the teenager network; $r$ is the number of steps in SteinGen; the blue line is the error bound $(n \pi)^{-\frac{1}{2}}$ from \ref{eq:dtvbound}.
Figure 5: Estimated parameters $\beta_1$, $\beta_2$ for $n=50$. The plot shows considerable variability in the parameter estimation.
...and 7 more figures

Theorems & Definitions (8)

Definition 1: Definition 1.5 in reinert2019approximating
Lemma 2
Proposition 3: Proposition A.4 in xu2022agrasstarxiv
Remark 4
Proposition 5
Proposition 6
Proposition 7
Theorem 8: Theorem 2.1 in reinert2019approximating
