Graph Tokenization for Bridging Graphs and Transformers

Zeyuan Guo, Enmao Diao, Cheng Yang, Chuan Shi

Abstract

The success of large pretrained Transformers is closely tied to tokenizers, which convert raw input into discrete symbols. Extending these models to graph-structured data remains a significant challenge. In this work, we introduce a graph tokenization framework that generates sequential representations of graphs by combining reversible graph serialization, which preserves graph information, with Byte Pair Encoding (BPE), a widely adopted tokenizer in large language models (LLMs). To better capture structural information, the graph serialization process is guided by global statistics of graph substructures, ensuring that frequently occurring substructures appear more often in the sequence and can be merged by BPE into meaningful tokens. Empirical results demonstrate that the proposed tokenizer enables Transformers such as BERT to be directly applied to graph benchmarks without architectural modifications. The proposed approach achieves state-of-the-art results on 14 benchmark datasets and frequently outperforms both graph neural networks and specialized graph transformers. This work bridges the gap between graph-structured data and the ecosystem of sequence models. Our code is available at https://github.com/BUPT-GAMMA/Graph-Tokenization-for-Bridging-Graphs-and-Transformers.
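The BPE step described in the abstract can be illustrated with a minimal sketch. It assumes each graph has already been serialized into a list of string symbols (here, single-character atom labels) and that a merged token is simply the concatenation of its two parts; `train_bpe`, `encode`, and the toy corpus are hypothetical and are not taken from the authors' released code. The sketch shows how a frequently repeated substructure becomes a single token and shortens the sequence.

```python
from collections import Counter
from typing import List, Tuple

Token = str
Merge = Tuple[Token, Token]


def count_pairs(corpus: List[List[Token]]) -> Counter:
    """Count adjacent token pairs across all serialized sequences."""
    pairs: Counter = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def apply_merge(seq: List[Token], pair: Merge) -> List[Token]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            # ASSUMPTION: a merged token is the plain concatenation of its parts.
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(corpus: List[List[Token]], num_merges: int) -> List[Merge]:
    """Greedily learn up to `num_merges` merges, most frequent pair first."""
    merges: List[Merge] = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        best, freq = pairs.most_common(1)[0]  # ties: first pair encountered wins
        if freq < 2:  # no pair repeats, so further merges would not compress
            break
        merges.append(best)
        corpus = [apply_merge(seq, best) for seq in corpus]
    return merges


def encode(seq: List[Token], merges: List[Merge]) -> List[Token]:
    """Tokenize a new serialized sequence by replaying the learned merges in order."""
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


# Toy corpus of hypothetical serialized graphs.
merges = train_bpe([list("CCOCC"), list("CCO"), list("OCC")], num_merges=10)
print(merges)                          # [('C', 'C'), ('CC', 'O')]
print(encode(list("CCOCCO"), merges))  # ['CCO', 'CCO']
```

Here a six-symbol sequence encodes to two tokens. Plain string concatenation can be ambiguous when symbols span several characters, so a production tokenizer would store merged tokens as tuples or use an explicit separator.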

Paper Structure (70 sections, 12 equations, 10 figures, 17 tables, 1 algorithm)

Figures (10)

  • Figure 1: Framework of the proposed graph tokenizer. (A) Substructure frequencies are collected from the training graphs. (B) Structure-guided and reversible serialization is performed using a frequency-guided Eulerian circuit, where the next edge is selected according to a priority rule (e.g., red C: 7 → 13 → 15 → 17). (C) A BPE vocabulary is trained on the serialized corpus, and graphs are encoded into discrete tokens for use in downstream sequence models.
  • Figure 2: Efficiency analysis on the ZINC dataset. (a) BPE greatly reduces token sequence length from serialization. (b) Graph tokenization leads to substantial training speedup by enabling efficient processing with standard Transformers.
  • Figure 3: Illustration of the BPE merging process on ZINC. Each row shows how simple substructures (left) are iteratively merged to form larger, chemically meaningful tokens (middle and right).
  • Figure 4: Visualization of autoregressive graph generation on MNIST. The model generates the graph structure token-by-token (left to right). The resulting sequences are decoded back into grid graphs, forming coherent digit images. This demonstrates the framework's capability to support generative tasks using standard decoder-only Transformers.
  • Figure 5: Efficiency analysis on the QM9 dataset.
  • ...and 5 more figures
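Step (B) of Figure 1 can be sketched as follows, though this page does not state the exact priority rule or how graphs without an Eulerian circuit are handled. This is one plausible reading under three labeled assumptions: (1) every undirected edge is doubled into two directed arcs, so a connected graph always has an Eulerian circuit; (2) the priority of an arc u → v is the training-set count of its (label_u, label_v) pattern; and (3) Hierholzer's algorithm explores higher-priority arcs first. The names `pattern_counts`, `eulerian_serialize`, and `decode_edges` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# A graph is (node labels, undirected edge list); node ids are ints.
Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def pattern_counts(graphs: List[Graph]) -> Counter:
    """Global statistics: occurrences of each (label_u, label_v) arc pattern."""
    counts: Counter = Counter()
    for labels, edges in graphs:
        for u, v in edges:
            counts[(labels[u], labels[v])] += 1
            counts[(labels[v], labels[u])] += 1
    return counts


def eulerian_serialize(graph: Graph, counts: Counter, start: int) -> List[int]:
    """Hierholzer's algorithm on the doubled graph, exploring frequent patterns first.

    ASSUMPTION: the graph is connected and each undirected edge becomes two arcs,
    so every node has equal in- and out-degree and a circuit always exists.
    """
    labels, edges = graph
    adj: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for u in adj:
        # ASSUMPTION (priority rule): ascending by pattern frequency, so pop()
        # yields the most frequent arc; ties go to the smaller neighbor id.
        adj[u].sort(key=lambda v, u=u: (counts[(labels[u], labels[v])], -v))
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:
            stack.append(adj[u].pop())  # consume arc u -> next node
        else:
            circuit.append(stack.pop())  # no unused arcs left: emit node
    circuit.reverse()
    return circuit


def decode_edges(circuit: List[int]) -> Set[frozenset]:
    """Invert the serialization: consecutive node ids are exactly the graph's edges."""
    return {frozenset(pair) for pair in zip(circuit, circuit[1:])}


chain: Graph = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
circuit = eulerian_serialize(chain, pattern_counts([chain]), start=0)
print(circuit)                                 # [0, 1, 2, 1, 0]
print("".join(chain[0][n] for n in circuit))   # CCOCC
```

The priorities fix the order in which arcs are explored; because Hierholzer's algorithm splices sub-tours together, the emitted sequence is deterministic but not a pure greedy walk. Keeping node ids alongside labels is what makes `decode_edges` recover the edge set exactly in this sketch.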