Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Dexiong Chen, Markus Krimmel, Karsten Borgwardt

TL;DR

AutoGraph addresses the scalability gap in graph generation by transforming graphs into token sequences via SENTs, a neighborhood-aware variant of segmented Eulerian trails. A decoder-only transformer then models the joint distribution of SENT tokens, enabling efficient autoregressive graph generation with linear complexity in the number of edges. The approach yields state-of-the-art or competitive results on synthetic and molecular benchmarks, with substantial speedups over diffusion models and strong transfer and substructure-conditioned generation capabilities. By linking graph generation with language modeling, AutoGraph paves the way for graph foundation models and broader applications in graph-centric AI.

Abstract

We introduce AutoGraph, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AutoGraph enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches. This results in sampling complexity and sequence lengths that scale optimally linearly with the number of edges, making it scalable and efficient for large, sparse graphs. A key success factor of AutoGraph is that its sequence prefixes represent induced subgraphs, creating a direct link to sub-sentences in language modeling. Empirically, AutoGraph achieves state-of-the-art performance on synthetic and molecular benchmarks, with up to 100x faster generation and 3x faster training than leading diffusion models. It also supports substructure-conditioned generation without fine-tuning and shows promising transferability, bridging language modeling and graph generation to lay the groundwork for graph foundation models. Our code is available at https://github.com/BorgwardtLab/AutoGraph.

Paper Structure

This paper contains 49 sections, 5 equations, 21 figures, 14 tables, 1 algorithm.

Figures (21)

  • Figure 1: Overview of AutoGraph: (1) We use Algorithm 1 to sample a SENT $s$ from the input graph: $s=(s_1,s_2)$ with $s_1=((v_1,\emptyset),(v_2,\emptyset),(v_3,\emptyset))$ and $s_2=((v_5, \{v_2\}), (v_4,\emptyset))$. (2) We tokenize it by reindexing the vertices based on their first occurrence order in $s$ and adding special tokens ('/' represents breakage between segments, '$\bm{<}$' and '$\bm{>}$' indicate the start and end of a neighborhood set). (3) We perform the next token prediction on the tokenized sequences using a decoder-only transformer or any language model.
  • Figure 2: Ablation experiments. Left: the effect of top-k sampling on the Planar and SBM datasets. Right: the validation loss and VUN scores when using SET and SENT on the Planar dataset.
  • Figure 3: Comparison of AutoGraph with and without pre-training on the Planar dataset with 50000 training steps. The model with pre-training converges clearly faster than the model without pre-training.
  • Figure 4: Substructure-conditioned generation on one copy of the motif 1,4-Dihydroquinoline.
  • Figure 5: Substructure-conditioned generation on two copies of the motif 1,4-Dihydroquinoline.
  • ...and 16 more figures
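
The tokenization step described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function name `tokenize_sent`, the zero-based indices, the placement of the neighborhood set directly after its vertex, and the sorted order inside a neighborhood set are all assumptions made for the example. The repository linked in the abstract is the reference for the actual token layout.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple, Union

# One SENT entry: a vertex plus a (possibly empty) neighborhood set.
Entry = Tuple[Hashable, Set[Hashable]]
Token = Union[int, str]


def tokenize_sent(segments: Sequence[Sequence[Entry]]) -> List[Token]:
    """Flatten a SENT into tokens (hypothetical helper, assumed layout).

    Vertices are reindexed by first occurrence, '/' separates segments,
    and '<' ... '>' wrap a non-empty neighborhood set.
    """
    index: Dict[Hashable, int] = {}
    tokens: List[Token] = []

    def reindex(vertex: Hashable) -> int:
        # Assign the next free index the first time a vertex is seen.
        if vertex not in index:
            index[vertex] = len(index)
        return index[vertex]

    for position, segment in enumerate(segments):
        if position > 0:
            tokens.append("/")  # breakage between segments
        for vertex, neighborhood in segment:
            tokens.append(reindex(vertex))
            if neighborhood:
                tokens.append("<")
                # Assumption: neighbors are emitted in sorted index order.
                tokens.extend(sorted(reindex(u) for u in neighborhood))
                tokens.append(">")
    return tokens


# The SENT from the Figure 1 caption: s = (s_1, s_2).
s1 = [("v1", set()), ("v2", set()), ("v3", set())]
s2 = [("v5", {"v2"}), ("v4", set())]
print(tokenize_sent([s1, s2]))
# [0, 1, 2, '/', 3, '<', 1, '>', 4]
```

Because indices depend only on first-occurrence order, relabeling the vertices of the input graph leaves the token sequence unchanged, so the model sees the same tokens for the same traversal regardless of the original vertex names.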

Theorems & Definitions (13)

  • Definition 2.1: Walk and trail
  • Definition 2.2: Generalized trail
  • Definition 2.3: Segmented Eulerian trail (SET)
  • Definition 2.4: SET isomorphism
  • Definition 2.5: Flattening
  • Definition 2.6: Prefix of a SET
  • Definition 2.7: Neighborhood sequence
  • Definition 2.8: Neighborhood trail
  • Definition 2.9: Segmented Eulerian neighborhood trail (SENT)
  • Definition 2.10: Causal SENT
  • ...and 3 more