Random Walk Diffusion for Efficient Large-Scale Graph Generation
Tobias Bernecker, Ghalia Rehawi, Francesco Paolo Casale, Janine Knauer-Arloth, Annalisa Marsico
TL;DR
ARROW-Diff tackles the challenge of generating large-scale graphs with realistic topology by introducing a discrete diffusion process on random walks, an order-agnostic autoregressive diffusion model (OA-ARDM), coupled with a GNN-based edge validator. The method iteratively generates edge proposals from random walks and refines the graph through GNN-based pruning and degree-guided sampling, enabling scalable generation of graphs with up to tens of thousands of nodes. Empirical results on five citation graphs and a synthetic stochastic block model (SBM) graph show improved topology metrics (e.g., triangle counts, degree distribution) and substantially faster generation than the baselines. The runtime analysis indicates that ARROW-Diff achieves a favorable complexity of $O\left(L\,(N\,D + |E|)\right)$, where $N$ is the number of nodes and $|E|$ the number of edges, highlighting its practicality for large-scale graph synthesis.
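An order-agnostic autoregressive diffusion model (OA-ARDM) generates a sequence by starting from an all-masked state and revealing one position at a time in a random order. The sketch below illustrates only that decoding loop for a single random walk of node IDs. It assumes the trained denoiser is available as a function `predict(walk, position)` that returns a categorical distribution over nodes. The names `oa_ardm_sample` and `ring_predictor` are hypothetical, and `ring_predictor` is a hand-written toy stand-in for the network, not the authors' model.

```python
import random
from typing import Callable, List, Optional, Sequence, cast

MASK = None  # absorbing "masked" state for positions not yet generated

def oa_ardm_sample(
    length: int,
    num_nodes: int,
    predict: Callable[[List[Optional[int]], int], Sequence[float]],
    rng: random.Random,
) -> List[int]:
    """Generate one walk of `length` node IDs in a random (order-agnostic) order."""
    walk: List[Optional[int]] = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # sigma: the random generation order
    for pos in order:
        probs = predict(walk, pos)  # distribution over node IDs for this position
        walk[pos] = rng.choices(range(num_nodes), weights=probs, k=1)[0]
    return cast(List[int], walk)  # every position has been filled

def ring_predictor(num_nodes: int) -> Callable[[List[Optional[int]], int], List[float]]:
    """Hypothetical toy denoiser: prefers nodes adjacent (on a ring graph)
    to the already-revealed neighbouring positions of the walk."""
    def neighbors(v: int) -> set:
        return {(v - 1) % num_nodes, (v + 1) % num_nodes}

    def predict(walk: List[Optional[int]], pos: int) -> List[float]:
        candidates = set(range(num_nodes))
        for adj in (pos - 1, pos + 1):
            if 0 <= adj < len(walk) and walk[adj] is not MASK:
                candidates &= neighbors(cast(int, walk[adj]))
        if not candidates:  # conflicting context: fall back to uniform
            candidates = set(range(num_nodes))
        return [1.0 if v in candidates else 0.0 for v in range(num_nodes)]

    return predict

walk = oa_ardm_sample(length=8, num_nodes=10, predict=ring_predictor(10), rng=random.Random(0))
print(walk)
```

Because the order is random, the model must learn to fill any position given any subset of revealed positions. A real implementation would replace `ring_predictor` with a neural network conditioned on the full, partially masked walk.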
Abstract
Graph generation addresses the problem of generating new graphs whose distribution resembles that of real-world graphs. While previous diffusion-based graph generation methods have shown promising results, they often struggle to scale to large graphs. In this work, we propose ARROW-Diff (AutoRegressive RandOm Walk Diffusion), a novel random walk-based diffusion approach for efficient large-scale graph generation. Our method combines two components, a random walk diffusion model and a GNN-based edge validator, in an iterative process of random walk sampling and graph pruning. We demonstrate that ARROW-Diff scales efficiently to large graphs and outperforms baseline methods in both generation time and multiple graph statistics, reflecting the high quality of the generated graphs.
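The iterative process of random walk sampling and graph pruning can be pictured as a loop: choose start nodes, sample walks, turn consecutive walk steps into candidate edges, then let a validator prune the candidate graph. The code below is a minimal sketch under several assumptions. The walk sampler stands in for the trained diffusion model, and `edge_is_valid` stands in for the GNN edge classifier. Weighting start nodes by their remaining degree deficit is one assumed reading of the "degree-guided sampling" in the TL;DR. All names (`generate_graph`, `noisy_ring_walk`, `ring_validator`) are hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (min, max)

def generate_graph(
    num_nodes: int,
    target_degree: List[int],
    sample_walk: Callable[[int, random.Random], List[int]],
    edge_is_valid: Callable[[Set[Edge], Edge], bool],
    num_iterations: int,
    walks_per_iter: int,
    rng: random.Random,
) -> Set[Edge]:
    edges: Set[Edge] = set()
    for _ in range(num_iterations):
        # 1) Degree-guided start nodes (assumed weighting): favour nodes whose
        #    current degree is furthest below their target degree.
        degree = [0] * num_nodes
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        deficit = [max(t - d, 0) for t, d in zip(target_degree, degree)]
        if sum(deficit) == 0:
            break  # every node has reached its target degree
        starts = rng.choices(range(num_nodes), weights=deficit, k=walks_per_iter)

        # 2) Edge proposals: consecutive node pairs along each sampled walk.
        proposals: Set[Edge] = set()
        for s in starts:
            walk = sample_walk(s, rng)
            for u, v in zip(walk, walk[1:]):
                if u != v:
                    proposals.add((min(u, v), max(u, v)))

        # 3) Pruning: the validator scores every edge of the candidate graph.
        candidate = edges | proposals
        edges = {e for e in candidate if edge_is_valid(candidate, e)}
    return edges

def noisy_ring_walk(n: int, length: int, jump_prob: float):
    """Hypothetical walk sampler: a ring-graph random walk with random jumps."""
    def sample(start: int, rng: random.Random) -> List[int]:
        walk = [start]
        for _ in range(length - 1):
            if rng.random() < jump_prob:
                walk.append(rng.randrange(n))  # noisy jump: may propose a bad edge
            else:
                walk.append((walk[-1] + rng.choice((-1, 1))) % n)
        return walk
    return sample

def ring_validator(n: int):
    """Hypothetical stand-in for the GNN: accept only edges of an n-node ring."""
    def is_valid(graph: Set[Edge], e: Edge) -> bool:
        u, v = e
        return (v - u) % n in (1, n - 1)
    return is_valid

n = 12
g = generate_graph(n, [2] * n, noisy_ring_walk(n, 5, 0.2), ring_validator(n),
                   num_iterations=20, walks_per_iter=5, rng=random.Random(0))
print(sorted(g))
```

Separating proposal (walks) from pruning (validator) lets the walk model be permissive: the noisy jumps above create spurious edges, and the validator removes them before the next round picks new start nodes.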
