GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models

Jiaxuan You, Rex Ying, Xiang Ren, William L. Hamilton, Jure Leskovec

TL;DR

GraphRNN introduces a scalable autoregressive framework for generating realistic graphs by decomposing graph construction into a graph-level node sequence and an edge-level sequence conditioned on the evolving graph. By representing graphs as BFS-ordered sequences and employing shared-weight RNNs, GraphRNN achieves high fidelity across diverse graph families while handling variable sizes and complex edge dependencies. The authors validate the approach with an MMD-based evaluation suite, showing substantial improvements over traditional and deep baselines and demonstrating robustness to structural variations. This work advances practical, data-driven graph synthesis with scalable training and quantitative evaluation based on higher-order graph statistics, enabling applications across biology, chemistry, and social sciences.

Abstract

Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models.
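
To make the idea of an MMD-based distance between sets of graphs concrete, here is a minimal sketch. It summarizes each graph by its normalized degree histogram and compares two sets with a biased estimate of squared MMD. The function names, the toy graphs, and in particular the Gaussian kernel on squared L2 distance are assumptions made for illustration; the kernel and statistics used by the authors may differ.

```python
import math
from itertools import product


def degree_histogram(edges, n_nodes, n_bins):
    """Normalized degree histogram of an undirected graph given as an edge list."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0.0] * n_bins
    for d in degree:
        hist[min(d, n_bins - 1)] += 1.0 / n_nodes  # last bin collects overflow
    return hist


def gaussian_kernel(p, q, sigma=1.0):
    """Assumed kernel: Gaussian on the squared L2 distance between histograms."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of squared MMD between two samples xs and ys."""
    def mean_kernel(a, b):
        return sum(kernel(p, q) for p, q in product(a, b)) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy sets (hypothetical data): 4-node paths versus 4-node stars.
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
paths = [degree_histogram(path, 4, 4) for _ in range(3)]
stars = [degree_histogram(star, 4, 4) for _ in range(3)]
print(mmd_squared(paths, paths))  # 0.0 for identical sets
print(mmd_squared(paths, stars))  # strictly positive for different sets
```

A value near zero indicates that the two sets have similar degree statistics. The same estimator can be applied to other per-graph statistics, such as clustering-coefficient histograms, by swapping out `degree_histogram`.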

Paper Structure

This paper contains 28 sections, 3 theorems, 10 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Suppose $v_1, \ldots, v_n$ is a BFS ordering of $n$ nodes in graph $G$, and $(v_i, v_{j-1}) \in E$ but $(v_i, v_j) \not \in E$ for some $i < j \le n$, then $(v_{i'}, v_{j'}) \not \in E$, $\forall 1 \le i' \le i$ and $j \le j' < n$.
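
The BFS structure behind this result can be probed numerically. The sketch below checks a related prefix form of the property, not the proposition verbatim: if node $v_j$ has no edge to any of the first $i$ nodes in a BFS ordering ($i < j$), then no later node $v_{j'}$, $j' \ge j$, has such an edge either. The helper names `bfs_order` and `prefix_property_holds` are hypothetical, and graphs are assumed to be given as lists of neighbor sets.

```python
from collections import deque


def bfs_order(adj, start=0):
    """Nodes in BFS visiting order; restarts on unvisited nodes so that
    disconnected graphs are covered too."""
    n = len(adj)
    seen = [False] * n
    order = []
    for root in [start] + [v for v in range(n) if v != start]:
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u]):  # fixed neighbor order for reproducibility
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    return order


def prefix_property_holds(adj, order):
    """True if, whenever order[j] has no neighbor among order[0..i] (i < j),
    no later node order[j'] with j' >= j has one either."""
    n = len(order)
    pos = {v: k for k, v in enumerate(order)}
    # first[k]: smallest position of any neighbor of order[k] (n if isolated)
    first = [min((pos[w] for w in adj[order[k]]), default=n) for k in range(n)]
    for j in range(n):
        for i in range(j):
            if first[j] > i and any(first[jp] <= i for jp in range(j, n)):
                return False
    return True


# Path graph 0-1-2-3 as a list of neighbor sets.
path_adj = [{1}, {0, 2}, {1, 3}, {2}]
print(prefix_property_holds(path_adj, bfs_order(path_adj)))  # True
print(prefix_property_holds(path_adj, [0, 2, 1, 3]))         # False: not a BFS order
```

The second call uses an ordering that is not a BFS ordering, and the check fails there. This kind of structure is what allows a BFS-ordered model to ignore connections to nodes that lie far back in the sequence.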

Figures (7)

  • Figure 1: GraphRNN at inference time. Green arrows denote the graph-level RNN that encodes the "graph state" vector $h_i$ in its hidden state, updated by the predicted adjacency vector $S^\pi_{i}$ for node $\pi(v_i)$. Blue arrows represent the edge-level RNN, whose hidden state is initialized by the graph-level RNN, that is used to predict the adjacency vector $S^\pi_{i}$ for node $\pi(v_i)$.
  • Figure 2: Visualization of graphs from grid dataset (Left group), community dataset (Middle group) and Ego dataset (Right group). Within each group, graphs from training set (First row), graphs generated by GraphRNN (Second row) and graphs generated by Kronecker, MMSB and B-A baselines respectively (Third row) are shown. Different visualization layouts are used for different datasets.
  • Figure 3: Average degree (Left) and clustering coefficient (Right) distributions of graphs from test set and graphs generated by GraphRNN and baseline models.
  • Figure 4: MMD performance of different approaches on degree (Left) and clustering coefficient (Right) under different noise levels.
  • Figure 5: Illustrative example of reducing the maximum dimension $M$ of $S^\pi_i$ through the BFS node ordering. Here we show the adjacency matrix of a graph with $N=10$ nodes. Without the BFS node ordering (Left), we have to set $M=N-1$ to encode all the necessary connection information (shown in dark square). With the BFS node ordering, we could set $M$ to be a constant smaller than $N$ (we show $M=3$ in the figure).
  • ...and 2 more figures
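
The effect described in the Figure 5 caption can be reproduced on a small example. The sketch below, under stated assumptions, builds a cycle graph with $N=10$ nodes, measures the smallest $M$ that captures every edge to an earlier node under a given ordering, and encodes each node as a length-$M$ adjacency vector in the spirit of $S^\pi_i$. The cycle graph and the helper names (`cycle_graph`, `max_lookback`, `adjacency_vectors`) are illustrative choices, not taken from the paper.

```python
from collections import deque


def cycle_graph(n):
    """Neighbor sets of an n-node cycle."""
    return [{(u - 1) % n, (u + 1) % n} for u in range(n)]


def bfs_order(adj, start=0):
    """BFS visiting order of a connected graph (connectivity is assumed)."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in sorted(adj[u]):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def max_lookback(adj, order):
    """Smallest M such that every edge to an earlier node reaches back
    at most M positions in the ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max(
        (pos[v] - pos[w] for v in order for w in adj[v] if pos[w] < pos[v]),
        default=0,
    )


def adjacency_vectors(adj, order, M):
    """For each node after the first, a 0/1 vector whose entry k says
    whether the node links to the node k + 1 positions earlier."""
    vecs = []
    for i in range(1, len(order)):
        vecs.append([
            1 if i - 1 - k >= 0 and order[i - 1 - k] in adj[order[i]] else 0
            for k in range(M)
        ])
    return vecs


adj = cycle_graph(10)
natural = list(range(10))
bfs = bfs_order(adj)
print(max_lookback(adj, natural))  # 9, i.e. N - 1
print(max_lookback(adj, bfs))      # 2
print(adjacency_vectors(adj, bfs, 2)[:2])  # [[1, 0], [0, 1]]
```

With the natural ordering, the edge closing the cycle forces $M = N - 1$, while the BFS ordering keeps every connection within the two preceding positions, so the truncated vectors still encode all 10 edges.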

Theorems & Definitions (3)

  • Proposition 1
  • Corollary 1
  • Proposition 2