
Do Graph Diffusion Models Accurately Capture and Generate Substructure Distributions?

Xiyuan Wang, Yewei Liu, Lexi Pang, Siwei Chen, Muhan Zhang

TL;DR

The paper examines whether graph diffusion models faithfully capture the substructure distributions of their training data. It shows that the score function of the diffusion process decomposes into a linear term and a nonlinear term. The nonlinear term is expressed in graph polynomial bases whose coefficients depend on subgraph counts in the training graphs, which ties generation fidelity directly to substructure statistics. The authors show that standard backbones struggle to learn these polynomial terms, which leads to misspecified substructure distributions. More expressive high-order GNN backbones approximate the score better and improve substructure generation. The work offers a principled framework for evaluating and improving diffusion-based graph generation with respect to substructure fidelity, and it has practical implications for designing backbone architectures that preserve complex graph motifs.
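The link between polynomial terms and subgraph counts can be made concrete with a small sketch. It is a standard textbook illustration, not the authors' code, and it assumes simple undirected graphs given as dense 0/1 adjacency matrices $A$. The polynomial $\mathrm{tr}(A^3)/6$ counts triangles, and $(A \odot A^2)_{ij}$ counts the triangles through edge $(i,j)$. In contrast, 1-WL color refinement, which upper-bounds the distinguishing power of standard message-passing GNNs, assigns identical colors to two disjoint triangles and to a 6-cycle. The helper names (`cycle_graph`, `triangle_count`, `wl_indistinguishable`, and so on) are hypothetical.

```python
import numpy as np
from collections import Counter

def cycle_graph(n):
    """Adjacency matrix of the n-node cycle C_n (n >= 3)."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def disjoint_union(A, B):
    """Block-diagonal adjacency of two graphs placed side by side."""
    n, m = len(A), len(B)
    C = np.zeros((n + m, n + m), dtype=int)
    C[:n, :n] = A
    C[n:, n:] = B
    return C

def triangle_count(A):
    """Graph polynomial tr(A^3) / 6: each triangle gives 6 closed 3-walks."""
    return int(np.trace(A @ A @ A)) // 6

def edge_triangle_counts(A):
    """Graph polynomial A ⊙ A^2: entry (i, j) = triangles containing edge (i, j)."""
    return A * (A @ A)

def wl_indistinguishable(A, B, rounds=3):
    """Run 1-WL jointly on A and B (one shared color palette) and report
    whether the two graphs end with identical color histograms."""
    U = disjoint_union(A, B)
    n = len(U)
    colors = [0] * n
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        sigs = [
            (colors[i], tuple(sorted(colors[j] for j in np.flatnonzero(U[i]))))
            for i in range(n)
        ]
        palette = {s: k for k, s in enumerate(sorted(set(sigs)))}
        colors = [palette[s] for s in sigs]
    k = len(A)
    return Counter(colors[:k]) == Counter(colors[k:])

two_triangles = disjoint_union(cycle_graph(3), cycle_graph(3))
hexagon = cycle_graph(6)

print(triangle_count(two_triangles), triangle_count(hexagon))  # 2 0
print(wl_indistinguishable(two_triangles, hexagon))            # True
```

Both graphs are 2-regular on six nodes, so every node keeps the same 1-WL color, yet the cubic polynomial separates them. A backbone bounded by 1-WL therefore cannot represent a score term that depends on $\mathrm{tr}(A^3)$.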

Abstract

Diffusion models have gained popularity in graph generation tasks; however, it is not fully understood which graph distributions they are expressive enough to learn. Unlike models in other domains, popular backbones for graph diffusion models, such as Graph Transformers, lack the universal expressivity needed to accurately model the score functions of complex graph distributions. Our work addresses this limitation by focusing on the frequency of specific substructures as a key characteristic of target graph distributions. When evaluating existing models with this metric, we find that they fail to preserve the distribution of substructure counts observed in the training set when generating new graphs. To address this issue, we establish a theoretical connection between the expressivity of Graph Neural Networks (GNNs) and the overall performance of graph diffusion models, demonstrating that more expressive GNN backbones can better capture complex distribution patterns. By integrating advanced GNNs into the backbone architecture, we significantly improve substructure generation.
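The substructure-count evaluation described above can be sketched as follows. This sketch assumes triangles as the substructure and total variation distance between count histograms as the discrepancy measure. The paper's exact substructures and distance may differ. Graphs are dense 0/1 adjacency matrices, and `count_distribution` and `total_variation` are hypothetical helper names.

```python
import numpy as np
from collections import Counter

def cycle_graph(n):
    """Adjacency matrix of the n-node cycle C_n (n >= 3)."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

def triangle_count(A):
    """tr(A^3) / 6; rint tolerates float-valued 0/1 matrices from a sampler."""
    return int(np.rint(np.trace(A @ A @ A))) // 6

def count_distribution(graphs, counter=triangle_count):
    """Empirical distribution {substructure count: fraction of graphs}."""
    counts = Counter(counter(A) for A in graphs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy 6-node graphs: two disjoint triangles (2 triangles) vs. a hexagon (0).
C3 = cycle_graph(3)
Z = np.zeros((3, 3), dtype=int)
two_triangles = np.block([[C3, Z], [Z, C3]])
hexagon = cycle_graph(6)

train = [two_triangles, two_triangles, hexagon, hexagon]
generated = [hexagon, hexagon, hexagon, two_triangles]

p_train = count_distribution(train)      # {2: 0.5, 0: 0.5}
p_gen = count_distribution(generated)    # {0: 0.75, 2: 0.25}
print(total_variation(p_train, p_gen))   # 0.25
```

A value near 0 means the generated set reproduces the training set's triangle-count histogram. Passing a different `counter` extends the same check to other motifs, such as 4-cycles.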

Paper Structure

This paper contains 16 sections, 2 theorems, 24 equations, and 2 tables.

Key Result

Theorem 4.1

With the diffusion process in Equation equ:diffcondi, and assuming the input graph distribution is permutation invariant and contains only graphs with $n$ nodes and $m$ edges, the score function takes the form …, where $F_t(A_t): \mathbb{R}^{n\times n}\to \mathbb{R}^{n\times n}$ and $G_t(A_t): \mathbb{R}^{n\times n}\to \mathbb{R}$ are functions defined as follows: …
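The score of a permutation-invariant density is permutation equivariant: $\nabla \log p(PAP^\top) = P\,\nabla \log p(A)\,P^\top$. Terms with the signatures of $F_t$ (matrix-valued) and $G_t$ (scalar-valued) are therefore expected to be equivariant and invariant, respectively. Graph polynomials built from matrix products, elementwise products, and traces have exactly these properties. The sketch below checks this numerically for a few example terms ($A^2$, $A \odot A^2$, $\mathrm{tr}(A^3)$). These terms are illustrative choices, not the specific bases of Theorem 4.1, whose explicit form is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def F_square(A):
    """Example equivariant term A^2: entry (i, j) counts 2-walks from i to j."""
    return A @ A

def F_edge_triangles(A):
    """Example equivariant term A ⊙ A^2: triangles through each edge."""
    return A * (A @ A)

def G_closed_3walks(A):
    """Example invariant term tr(A^3) = 6 * (number of triangles)."""
    return float(np.trace(A @ A @ A))

rng = np.random.default_rng(0)
n = 7
upper = np.triu(rng.random((n, n)) < 0.4, k=1).astype(float)
A = upper + upper.T                  # random symmetric 0/1 adjacency, no self-loops
P = np.eye(n)[rng.permutation(n)]    # random permutation matrix
A_perm = P @ A @ P.T                 # same graph with relabeled nodes

for F in (F_square, F_edge_triangles):
    # Equivariance: relabeling the input relabels the output the same way.
    assert np.allclose(F(A_perm), P @ F(A) @ P.T)
# Invariance: the scalar term ignores node labels.
assert np.isclose(G_closed_3walks(A_perm), G_closed_3walks(A))
print("equivariance/invariance checks passed")
```

The checks follow from $P^\top P = I$. For example, $(PAP^\top)^2 = PA^2P^\top$, and an elementwise product of two matrices conjugated by the same $P$ is itself conjugated by $P$.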

Theorems & Definitions (2)

  • Theorem 4.1
  • Corollary 4.2