Do Graph Diffusion Models Accurately Capture and Generate Substructure Distributions?
Xiyuan Wang, Yewei Liu, Lexi Pang, Siwei Chen, Muhan Zhang
TL;DR
The paper examines whether graph diffusion models faithfully capture substructure distributions in generated graphs. It shows that the score function of a diffusion process can be decomposed into a linear term and a nonlinear term expressed via graph polynomial bases whose coefficients depend on training subgraph counts, thus tying generation fidelity to substructure statistics. The authors demonstrate that standard backbones struggle to learn these polynomial terms, leading to substructure misspecification, while more expressive high-order GNN backbones better approximate the score and improve substructure generation. This work provides a principled framework for evaluating and improving diffusion-based graph generation with a focus on substructure fidelity, and it has practical implications for designing backbone architectures that preserve complex graph motifs.
Abstract
Diffusion models have gained popularity in graph generation tasks; however, the extent of their expressivity concerning the graph distributions they can learn is not fully understood. Unlike models in other domains, popular backbones for graph diffusion models, such as Graph Transformers, do not possess universal expressivity to accurately model the distribution scores of complex graph data. Our work addresses this limitation by focusing on the frequency of specific substructures as a key characteristic of target graph distributions. When evaluating existing models using this metric, we find that they fail to maintain the distribution of substructure counts observed in the training set when generating new graphs. To address this issue, we establish a theoretical connection between the expressivity of Graph Neural Networks (GNNs) and the overall performance of graph diffusion models, demonstrating that more expressive GNN backbones can better capture complex distribution patterns. By integrating advanced GNNs into the backbone architecture, we achieve significant improvements in substructure generation.
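To make the evaluation idea concrete, the following minimal sketch compares the average count of one substructure (triangles) between a set of training graphs and a set of generated graphs. It is an illustration only: the function names `triangle_count` and `mean_count_gap`, the choice of triangles, the toy graphs, and the use of a simple gap between means are all assumptions made here, not the metric or code used in the paper.

```python
from itertools import combinations


def triangle_count(adj):
    """Count triangles in an undirected simple graph.

    `adj` is a 0/1 adjacency matrix given as a list of lists
    (hypothetical input format chosen for this sketch).
    """
    n = len(adj)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )


def mean_count_gap(train_graphs, generated_graphs, count_fn=triangle_count):
    """Absolute difference between the mean substructure counts of two graph sets.

    A crude stand-in for a distributional comparison; a real evaluation
    would likely compare full count distributions rather than means.
    """
    train_mean = sum(count_fn(g) for g in train_graphs) / len(train_graphs)
    gen_mean = sum(count_fn(g) for g in generated_graphs) / len(generated_graphs)
    return abs(train_mean - gen_mean)


# Toy data: a 3-node triangle and a 3-node path (no triangle).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# Training set contains only triangles; the "generated" set loses one of them.
gap = mean_count_gap([triangle, triangle], [triangle, path])
print(gap)
```

A gap near zero would suggest the generated graphs preserve the average triangle count of the training set; a large gap would indicate the kind of substructure mismatch the abstract describes. Other substructures can be examined by passing a different `count_fn`.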
