Discrete Bayesian Sample Inference for Graph Generation

Ole Petersen, Marcel Kollovieh, Marten Lienen, Stephan Günnemann

TL;DR

This work introduces GraphBSI, a novel one-shot graph generative model based on Bayesian Sample Inference (BSI), which demonstrates state-of-the-art performance on molecular and synthetic graph generation, outperforming existing one-shot graph generative models on the standard benchmarks Moses and GuacaMol.

Abstract

Generating graph-structured data is crucial in applications such as molecular generation, knowledge graphs, and network analysis. However, the discrete, unordered nature of graphs makes them difficult for traditional generative models to handle, which has led to the rise of discrete diffusion and flow matching models. In this work, we introduce GraphBSI, a novel one-shot graph generative model based on Bayesian Sample Inference (BSI). Instead of evolving samples directly, GraphBSI iteratively refines a belief over graphs in the continuous space of distribution parameters, naturally handling discrete structures. Further, we state BSI as a stochastic differential equation (SDE) and derive a noise-controlled family of SDEs that preserves the marginal distributions via an approximation of the score function. Our theoretical analysis further reveals the connection to Bayesian Flow Networks and diffusion models. Finally, in our empirical evaluation, we demonstrate state-of-the-art performance on molecular and synthetic graph generation, outperforming existing one-shot graph generative models on the standard benchmarks Moses and GuacaMol.

Paper Structure

This paper contains 27 sections, 6 theorems, 44 equations, 5 figures, 8 tables, 6 algorithms.

Key Result

Theorem 1

Given a prior belief $p(\mathbf{x}\mid \mathbf{z})=\mathrm{Cat}(\mathbf{x}\mid \mathrm{softmax}(\mathbf{z}))$, after observing $\mathbf{y} \sim \mathcal{N}(\mathbf{y} \mid \mu = \mathbf{x}, \Sigma^2 = \alpha^{-1} \mathbf{I})$ at precision $\alpha$, the posterior belief is $p(\mathbf{x}\mid \ldots)$ …
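The statement of Theorem 1 is cut off here, so the closed form of the posterior is not visible. As a minimal sketch, assuming that $\mathbf{x}$ is a one-hot vector and that the observation noise has variance $\alpha^{-1}$ per coordinate, applying Bayes' rule to the stated prior and likelihood suggests a posterior of the form $\mathrm{Cat}(\mathbf{x}\mid \mathrm{softmax}(\mathbf{z} + \alpha\mathbf{y}))$. This is our own derivation, not necessarily the paper's wording, and the function names below are hypothetical. The snippet compares the direct Bayes computation against that candidate closed form.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over a 1-D array.
    v = v - np.max(v)
    e = np.exp(v)
    return e / e.sum()

def posterior_by_bayes(z, y, alpha):
    # Direct Bayes' rule: prior Cat(softmax(z)) times a Gaussian likelihood
    # evaluated at every candidate one-hot vector e_k (assumption: x is one-hot).
    k = len(z)
    prior = softmax(z)
    one_hots = np.eye(k)
    # log N(y | e_k, alpha^{-1} I), dropping terms that do not depend on k.
    log_lik = -0.5 * alpha * np.sum((y - one_hots) ** 2, axis=1)
    return softmax(np.log(prior) + log_lik)

def posterior_closed_form(z, y, alpha):
    # Candidate closed form (our derivation): add alpha * y to the logits.
    return softmax(z + alpha * y)

rng = np.random.default_rng(0)
z = rng.normal(size=3)            # prior logits
alpha = 2.5                       # observation precision
x = np.eye(3)[1]                  # ground-truth class as a one-hot vector
y = x + rng.normal(size=3) / np.sqrt(alpha)  # noisy observation of x

print(posterior_by_bayes(z, y, alpha))
print(posterior_closed_form(z, y, alpha))
```

Under these assumptions the two printed vectors agree, because $\lVert \mathbf{y} - \mathbf{e}_k \rVert^2$ depends on the class $k$ only through $-2 y_k$, so the Gaussian likelihood contributes $\alpha y_k$ to the logit of class $k$.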

Figures (5)

  • Figure 1: Illustration of GraphBSI's generative process. Nodes and edges are modeled as independent categorical variables. One edge type represents the absence of an edge. The latent variable $\mathbf{z}_t$ represents a distribution over graphs rather than a graph itself. The neural network $f_\theta$ smoothly steers this distribution from a random initial distribution $\mathbf{z}_0$ to a distribution $\mathbf{z}_1$ concentrated on valid graphs; this evolution is modeled as a stochastic differential equation (SDE).
  • Figure 2: Trajectories of the SDE in \ref{theorem:generalized_sde} for different values of $\gamma$ with three classes and fixed reconstruction $f_\theta(\mathbf{z}_t, t)=\hat{e}_2$. At $\gamma=0$, the sampler resembles a probability flow ODE as in flow matching. Increasing $\gamma$ leads to noisier trajectories. At $\gamma=1$, the original SDE in \ref{theorem:sde} is recovered, and increasing the noise further makes the trajectories even more volatile. The density function of the marginal distribution $p(\mathbf{x}\mid\mathbf{z}_t)$ (shown in the background) is identical for all $\gamma$.
  • Figure 3: Normalized metrics (zero mean, unit variance) vs. noise level $\gamma$ for different numbers of function evaluations (FE) and discretization schemes. Our custom Ornstein-Uhlenbeck discretization scheme is denoted as OU, while the standard Euler-Maruyama scheme is written as Euler. Some values for the Euler scheme are missing since the sampler becomes unstable if $\gamma \cdot \Delta t$ becomes too large (see \ref{app:euler_maruyama_stability}).
  • Figure 4: Change in performance as the non-uniform timestepping parameter $\rho$ in $t_i = (i/k)^\rho$ for $i=0,1,\dots,k$ is varied, relative to the uniform case $\rho=1$. $\rho<1$ results in a finer discretization at later timesteps, while $\rho>1$ results in a finer discretization at earlier timesteps.
  • Figure 5: Results on the QM9 dataset.
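The timestep schedule $t_i = (i/k)^\rho$ from the Figure 4 caption can be illustrated with a short sketch. The function name `timesteps` and the choice $k=4$ are ours, picked only for illustration; the formula itself is taken from the caption.

```python
import numpy as np

def timesteps(k, rho):
    # Grid t_i = (i / k) ** rho for i = 0, ..., k, running from 0 to 1.
    i = np.arange(k + 1)
    return (i / k) ** rho

# Print the step sizes t_{i+1} - t_i for a few values of rho.
for rho in (0.5, 1.0, 2.0):
    steps = np.diff(timesteps(4, rho))
    print(f"rho={rho}: step sizes {np.round(steps, 3)}")
```

For $\rho<1$ the step sizes shrink toward $t=1$, and for $\rho>1$ they shrink toward $t=0$, which matches the caption's description of where the discretization is finer.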

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • proof
  • proof
  • proof
  • proof
  • ...and 2 more