A Simple and Scalable Representation for Graph Generation

Yunhui Jang, Seul Lee, Sungsoo Ahn

TL;DR

This work addresses the scalability bottleneck of graph generation methods that output full adjacency matrices, whose size grows quadratically with the number of nodes. It introduces GEEL, a gap encoded edge list representation whose vocabulary size is bounded by $B^2$ and whose representation size is $M$, where $B$ is the graph bandwidth and $M$ is the number of edges. Paired with node positional encoding and an autoregressive LSTM generator, GEEL achieves $O(M)$ generation complexity, and a new grammar extends it to attributed graphs. Empirically, GEEL achieves state-of-the-art or competitive results on ten non-attributed graph benchmarks and two molecular datasets, with faster inference and lower memory use owing to the compact representation. The method offers a practical, scalable path for generating large graphs and molecules, and the code is released for reproducibility.

Abstract

Recently, there has been a surge of interest in employing neural networks for graph generation, a fundamental statistical learning problem with critical applications like molecule design and community analysis. However, most approaches encounter significant limitations when generating large-scale graphs. This is due to their requirement to output the full adjacency matrices whose size grows quadratically with the number of nodes. In response to this challenge, we introduce a new, simple, and scalable graph representation named gap encoded edge list (GEEL) that has a small representation size that aligns with the number of edges. In addition, GEEL significantly reduces the vocabulary size by incorporating the gap encoding and bandwidth restriction schemes. GEEL can be autoregressively generated with the incorporation of node positional encoding, and we further extend GEEL to deal with attributed graphs by designing a new grammar. Our findings reveal that the adoption of this compact representation not only enhances scalability but also bolsters performance by simplifying the graph generation process. We conduct a comprehensive evaluation across ten non-attributed and two molecular graph generation tasks, demonstrating the effectiveness of GEEL.
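
The gap encoding and bandwidth ideas mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes 0-indexed nodes, edges stored as (source, target) with source < target, an inter-edge gap measured between consecutive source indices (starting from 0), and an intra-edge gap measured between the target and its source. The function names `gap_encode`, `gap_decode`, and `bandwidth` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def gap_encode(edges: List[Edge]) -> List[Edge]:
    """Encode an edge list as (inter-edge gap, intra-edge gap) pairs.

    Assumption: each edge is normalized to (source, target) with
    source <= target, and edges are sorted in ascending order.
    """
    ordered = sorted((min(u, v), max(u, v)) for u, v in edges)
    tokens = []
    prev_source = 0  # assumed starting point for the first inter-edge gap
    for source, target in ordered:
        tokens.append((source - prev_source, target - source))
        prev_source = source
    return tokens


def gap_decode(tokens: List[Edge]) -> List[Edge]:
    """Invert gap_encode by accumulating the gaps back into node indices."""
    edges = []
    source = 0
    for inter_gap, intra_gap in tokens:
        source += inter_gap
        edges.append((source, source + intra_gap))
    return edges


def bandwidth(edges: List[Edge]) -> int:
    """Largest index difference |u - v| over all edges (0 for no edges)."""
    return max((abs(u - v) for u, v in edges), default=0)


# A 4-node example graph with M = 4 edges.
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
example_tokens = gap_encode(example_edges)
print(example_tokens)             # one token pair per edge
print(bandwidth(example_edges))   # bound on every intra-edge gap
print(gap_decode(example_tokens) == example_edges)
```

Under these assumptions the encoded sequence has exactly one token pair per edge, so its length matches $M$, and every intra-edge gap is at most the bandwidth by definition. How tightly the inter-edge gap is bounded depends on the node ordering, which this sketch does not address.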

Paper Structure

This paper contains 35 sections, 9 equations, 16 figures, 17 tables.

Figures (16)

  • Figure 1: Overview and advantages of gap encoded edge list (GEEL).
  • Figure 2: Bandwidth of an adjacency matrix.
  • Figure 3: An example of attributed GEEL. The colored parts of the attributed GEEL denote the node features (i.e., C and N) and edge features (i.e., single bond -). The shaded parts denote the self-loops added to the original GEEL, where self-loops are added to the nodes that are not connected to the nodes with larger node indices (i.e., nodes with indices 3 and 4).
  • Figure 4: Inference time on various graph sizes.
  • Figure 5: Average MMD results for different model architectures.
  • ...and 11 more figures