A Simple and Scalable Representation for Graph Generation
Yunhui Jang, Seul Lee, Sungsoo Ahn
TL;DR
This work addresses the scalability bottleneck of graph-generation methods that rely on quadratic adjacency-matrix representations by introducing GEEL, a gap-encoded edge-list representation whose vocabulary size is bounded by $B^2$, where $B$ is the graph bandwidth, and whose representation size is $M$, the number of edges. Pairing GEEL with node positional encoding and an autoregressive LSTM generator yields $O(M)$ generation complexity, and a new grammar extends the representation to attributed graphs. Empirically, GEEL achieves state-of-the-art or competitive results on ten general graph benchmarks and two molecular datasets, with faster inference and lower memory use owing to the compact representation. The method thus offers a practical, scalable path to generating large graphs and molecules, and the code is released for reproducibility.
Abstract
Recently, there has been a surge of interest in employing neural networks for graph generation, a fundamental statistical learning problem with critical applications such as molecule design and community analysis. However, most approaches encounter significant limitations when generating large-scale graphs, because they must output a full adjacency matrix whose size grows quadratically with the number of nodes. In response to this challenge, we introduce a new, simple, and scalable graph representation named gap-encoded edge list (GEEL), whose representation size scales with the number of edges. In addition, GEEL significantly reduces the vocabulary size by incorporating gap encoding and bandwidth restriction schemes. GEEL can be generated autoregressively by incorporating node positional encoding, and we further extend GEEL to attributed graphs by designing a new grammar. Our findings reveal that adopting this compact representation not only enhances scalability but also improves performance by simplifying the graph generation process. We conduct a comprehensive evaluation across ten non-attributed and two molecular graph generation tasks, demonstrating the effectiveness of GEEL.
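
To make the idea of a gap-encoded edge list concrete, the following is a minimal sketch, not the authors' implementation. It assumes each edge is oriented as (s, t) with s < t, that edges are sorted lexicographically, and that each token stores the gap between consecutive source nodes together with the gap between the two endpoints of the edge. The function names `gap_encode`, `gap_decode`, and `bandwidth` are hypothetical, and the exact ordering and token layout used by GEEL may differ.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def gap_encode(edges: List[Edge]) -> List[Edge]:
    """Encode an edge list as (inter-edge gap, intra-edge gap) pairs.

    Assumed convention (illustrative only): edges are oriented as (s, t)
    with s < t and sorted lexicographically before encoding.
    """
    ordered = sorted((min(u, v), max(u, v)) for u, v in edges)
    encoded = []
    prev_s = 0  # source node of the previous edge (0 before the first edge)
    for s, t in ordered:
        # First entry: how far the source moved since the previous edge.
        # Second entry: distance between the two endpoints of this edge.
        encoded.append((s - prev_s, t - s))
        prev_s = s
    return encoded


def gap_decode(encoded: List[Edge]) -> List[Edge]:
    """Invert gap_encode by accumulating the source gaps."""
    edges = []
    s = 0
    for source_gap, endpoint_gap in encoded:
        s += source_gap
        edges.append((s, s + endpoint_gap))
    return edges


def bandwidth(edges: List[Edge]) -> int:
    """Largest index difference across any edge under the given node ordering."""
    return max(abs(u - v) for u, v in edges)


# Toy graph with 4 nodes and M = 4 edges.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
tokens = gap_encode(toy_edges)
print(tokens)                # [(0, 1), (0, 2), (1, 1), (1, 1)]
print(gap_decode(tokens))    # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(bandwidth(toy_edges))  # 2
```

In this toy example the encoding has exactly one token per edge, so its length is $M = 4$ rather than the $4 \times 4$ entries of an adjacency matrix, and every gap value is small because the node ordering keeps the bandwidth at 2. This illustrates why restricting the bandwidth keeps the number of distinct tokens small; the sketch does not itself establish the bound stated in the paper.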
