GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models
Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec
TL;DR
GraphRNN introduces a scalable autoregressive framework for generating realistic graphs by decomposing graph construction into a graph-level node sequence and an edge-level sequence conditioned on the evolving graph. By representing graphs as BFS-ordered sequences and employing shared-weight RNNs, GraphRNN achieves high fidelity across diverse graph families while handling variable sizes and complex edge dependencies. The authors validate the approach with a rigorous MMD-based evaluation suite, showing substantial improvements over traditional and deep baselines and demonstrating robustness to structural variations. This work advances practical, data-driven graph synthesis with scalable training and quantitative, higher-order-graph statistics-based evaluation, enabling applications across biology, chemistry, and social sciences.
Abstract
Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models.
