Advancing Graph Generation through Beta Diffusion

Xinyang Liu, Yilin He, Bo Chen, Mingyuan Zhou

TL;DR

This paper introduces Graph Beta Diffusion (GBD), a generative model designed for the mixed discrete and continuous nature of graph data, together with a modulation technique that makes generated graphs more realistic by stabilizing critical graph topology while keeping other components flexible.

Abstract

Diffusion models have excelled in generating natural images and are now being adapted to a variety of data types, including graphs. However, conventional models often rely on Gaussian or categorical diffusion processes, which can struggle to accommodate the mixed discrete and continuous components characteristic of graph data. Graphs typically feature discrete structures and continuous node attributes that often exhibit rich statistical patterns, including sparsity, bounded ranges, skewed distributions, and long-tailed behavior. To address these challenges, we introduce Graph Beta Diffusion (GBD), a generative model specifically designed to handle the diverse nature of graph data. GBD leverages a beta diffusion process, effectively modeling both continuous and discrete elements. Additionally, we propose a modulation technique that enhances the realism of generated graphs by stabilizing critical graph topology while maintaining flexibility for other components. GBD competes strongly with existing models across multiple general and biochemical graph benchmarks, showcasing its ability to capture the intricate balance between discrete and continuous features inherent in real-world graph data. The PyTorch code is available on GitHub.

Paper Structure (60 sections, 31 equations, 9 figures, 9 tables, 4 algorithms)

Figures (9)

  • Figure 1: Overview of the forward and reverse diffusion processes of GBD. The multiplicative factors ${\mathbf{Q}}_t$ and ${\mathbf{P}}_t$ are sampled from beta distributions parameterized by the initial graphs ${\mathbf{G}}_0$ and the "clean graphs" predicted by $\hat{G}_\theta$. The neural network $\hat{G}_\theta$ is learned by minimizing the loss in \ref{eq:GBD_loss}, which consists of $\mathcal{L}_{\mathrm{sampling}}$ and $\mathcal{L}_{\mathrm{correction}}$.
  • Figure 3: Examples of graphs generated by the GBD model on Planar, SBM, QM9, and ZINC250k datasets.
  • Figure 4: V.U.N. results for intermediate graph samples in the reverse chain on the Planar (left) and SBM (right) datasets, with GBD demonstrating a clear advantage in convergence rate.
  • Figure 5: Visualization of the generative process of GBD on the Community-small dataset.
  • Figure 6: Visualization of the generative process of GBD on the Ego-small dataset.
  • ...and 4 more figures
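
The Figure 1 caption describes a forward process driven by multiplicative factors sampled from beta distributions. This page does not give the exact parameterization, so the following is only a minimal sketch for a single scalar graph entry. It assumes the standard beta diffusion form: the first state is drawn from Beta(eta * alpha_1 * x0, eta * (1 - alpha_1 * x0)), and each later state is the previous one multiplied by a factor drawn from Beta(eta * alpha_t * x0, eta * (alpha_{t-1} - alpha_t) * x0). The names `forward_chain` and `scale_shift`, the linear schedule, and the values of `eta`, `scale`, and `shift` are hypothetical choices for illustration, not taken from the paper.

```python
import random


def scale_shift(a, scale=0.9, shift=0.09):
    """Map a raw entry in [0, 1] (e.g. a binary edge) into the open interval
    (0, 1) so that both beta parameters stay positive. Values are hypothetical."""
    return scale * a + shift


def forward_chain(x0, alphas, eta, rng):
    """Run an assumed multiplicative beta forward chain for one scalar x0 in (0, 1).

    alphas: strictly decreasing schedule values in (0, 1), one per step.
    Returns the list of noisy states [z_1, ..., z_T].
    """
    # First step: draw z_1 directly from its beta marginal.
    z = rng.betavariate(eta * alphas[0] * x0, eta * (1.0 - alphas[0] * x0))
    path = [z]
    for a_prev, a_t in zip(alphas, alphas[1:]):
        # Multiplicative factor in [0, 1]; each step can only shrink z.
        q = rng.betavariate(eta * a_t * x0, eta * (a_prev - a_t) * x0)
        z *= q
        path.append(z)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    T = 10
    # Hypothetical linear schedule, decreasing from 0.902 to 0.02.
    alphas = [1.0 - 0.98 * t / T for t in range(1, T + 1)]
    for edge in (1, 0):  # an existing edge and a missing edge
        path = forward_chain(scale_shift(edge), alphas, eta=30.0, rng=rng)
        print(edge, [round(z, 3) for z in path])
```

Under these assumptions, the state at step t has mean alpha_t * x0, so entries for existing and missing edges remain statistically distinguishable until the schedule has nearly decayed, and every state stays inside [0, 1] without clipping.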

Theorems & Definitions (2)

  • Remark 1
  • Remark 2