Enhanced Graph Transformer with Serialized Graph Tokens

Ruixiang Wang, Yuyang Hong, Shiming Xiang, Chunhong Pan

TL;DR

The paper addresses the information bottleneck in graph-level representation by replacing a single graph token with an ordered sequence of serialized graph tokens, learned via a set of basis tokens and a graph-serialization mechanism. It combines local node embedding and message passing with a serialization module, then applies multi-head self-attention over the token sequence to capture global interactions, followed by FFN-based prediction on the token sequence. The authors report state-of-the-art results on the ZINC, ZINC-FULL, and MolHIV benchmarks, and their ablations indicate that serialization and multi-token attention are both necessary and work synergistically. The serialized-token paradigm yields a more expressive graph-level representation while remaining competitive in efficiency, and could potentially extend to larger graphs and other graph tasks.
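The TL;DR describes multi-head self-attention over the serialized token sequence followed by an FFN prediction. The NumPy sketch below illustrates that step under our own assumptions and is not the authors' implementation: it uses one attention layer with a residual connection and flattens the tokens in their fixed order before a two-layer ReLU FFN. The helper names (`multi_head_self_attention`, `predict`) and the parameter layout are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Standard scaled dot-product multi-head self-attention over (K, d) tokens."""
    K, d = tokens.shape
    dh = d // num_heads
    def split(t):  # (K, d) -> (heads, K, dh)
        return t.reshape(K, num_heads, dh).transpose(1, 0, 2)
    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, K, K)
    out = (attn @ v).transpose(1, 0, 2).reshape(K, d)       # merge heads
    return out @ w_o, attn

def predict(tokens, params, num_heads=4):
    """Hypothetical head: residual attention layer, then FFN on the flattened sequence."""
    h, _ = multi_head_self_attention(tokens, *params["attn"], num_heads)
    h = tokens + h                                  # residual connection
    flat = h.reshape(-1)                            # flattening preserves token order
    hidden = np.maximum(0.0, flat @ params["w1"])   # first FFN layer with ReLU
    return hidden @ params["w2"]                    # scalar graph-level output

# Usage: 4 serialized tokens of width 16, random (untrained) weights.
rng = np.random.default_rng(0)
K, d = 4, 16
tokens = rng.normal(size=(K, d))
params = {
    "attn": [rng.normal(size=(d, d)) * 0.1 for _ in range(4)],
    "w1": rng.normal(size=(K * d, 32)) * 0.1,
    "w2": rng.normal(size=(32,)) * 0.1,
}
_, attn = multi_head_self_attention(tokens, *params["attn"], 4)
y = predict(tokens, params)
```

Unlike a single graph token, every serialized token here attends to every other token, so the FFN sees the full set of token-to-token interactions.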

Abstract

Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single-token paradigm fails to fully leverage the inherent strength of self-attention in encoding token sequences and degenerates into a weighted sum of node signals. To address this issue, we design a novel serialized-token paradigm that encapsulates global signals more effectively. Specifically, we propose a graph serialization method that aggregates node signals into serialized graph tokens, with positional encoding incorporated automatically. Stacked self-attention layers are then applied to encode this token sequence and capture its internal dependencies. By modeling complex interactions among multiple graph tokens, our method can yield more expressive graph representations. Experimental results show that our method achieves state-of-the-art results on several graph-level benchmarks, and ablation studies verify the effectiveness of the proposed modules.
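The claim that the single-token paradigm degenerates into a weighted sum of node signals can be checked directly. When one graph token queries all nodes, the softmax weights are nonnegative and sum to 1, so the output is a convex combination of the projected node features. The sketch assumes a single attention head and, for simplicity, ignores the graph token attending to itself. `single_token_readout` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_token_readout(node_feats, graph_token, w_q, w_k, w_v):
    """One attention step in which a single graph token queries all nodes.

    node_feats: (N, d) node signals; graph_token: (d,) learnable token.
    Returns the updated graph token and the attention weights over nodes.
    """
    q = graph_token @ w_q                   # (d,)
    k = node_feats @ w_k                    # (N, d)
    v = node_feats @ w_v                    # (N, d)
    scores = k @ q / np.sqrt(q.shape[-1])   # (N,)
    alpha = softmax(scores)                 # nonnegative, sums to 1
    return alpha @ v, alpha

rng = np.random.default_rng(0)
N, d = 5, 8
x = rng.normal(size=(N, d))
g = rng.normal(size=d)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, alpha = single_token_readout(x, g, w_q, w_k, w_v)

# The readout is exactly a convex combination of the projected node signals.
assert np.allclose(out, sum(alpha[i] * (x[i] @ w_v) for i in range(N)))
```

However many nodes the graph has, everything is compressed into one d-dimensional vector, which is the over-compression that Figure 1 attributes to the single-token paradigm.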

Paper Structure (12 sections, 7 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Paradigms. The single-token paradigm (left) collapses the graph into a single token, which risks over-compressing node signals. Our serialized-token paradigm (right) models the graph as a token sequence to retain more global signals.
  • Figure 2: Model structure. Our serialized-token paradigm comprises four modules. "ES" denotes Euclidean similarity, "GS" denotes Gumbel Softmax, and "CON" denotes concatenation. The single-token paradigm is also illustrated for comparison.
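Figure 2 names three operations in the serialization module: Euclidean similarity (ES), Gumbel Softmax (GS), and concatenation (CON). The exact equations are not given in this section, so the following NumPy code is only a sketch under stated assumptions. Similarity is taken as the negative squared Euclidean distance between nodes and learnable basis tokens. GS produces a soft assignment of each node over the basis tokens, and each serialized token is the assignment-weighted mean of node features. CON (our guess) appends each basis token to its pooled signal, so the fixed slot order also serves as a positional code. `serialize_graph`, `gumbel_softmax`, and the temperature `tau` are hypothetical names.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Soft Gumbel-Softmax sample along the last axis."""
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                 # standard Gumbel noise
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def serialize_graph(node_feats, basis, tau=1.0, rng=None):
    """Hypothetical serialization: N nodes -> M ordered graph tokens.

    node_feats: (N, d); basis: (M, d) learnable basis tokens.
    Returns tokens of shape (M, 2d) and the soft assignment (N, M).
    """
    rng = np.random.default_rng() if rng is None else rng
    # ES (assumed form): negative squared Euclidean distance, node vs. basis token.
    diff = node_feats[:, None, :] - basis[None, :, :]   # (N, M, d)
    sim = -np.sum(diff ** 2, axis=-1)                   # (N, M)
    # GS: stochastic soft assignment of each node over the M basis tokens.
    assign = gumbel_softmax(sim, tau, rng)              # each row sums to 1
    # Pool node signals per basis token as an assignment-weighted mean.
    weights = assign.sum(axis=0)[:, None] + 1e-9        # (M, 1), avoids divide-by-zero
    pooled = assign.T @ node_feats / weights            # (M, d)
    # CON (assumed): attach each basis token so slot identity acts as a position code.
    tokens = np.concatenate([pooled, basis], axis=-1)   # (M, 2d)
    return tokens, assign

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 16))   # 12 nodes with 16-dim features
b = rng.normal(size=(4, 16))    # 4 basis tokens
tokens, assign = serialize_graph(x, b, tau=0.5, rng=rng)
print(tokens.shape)             # (4, 32)
```

Because the M output slots are tied to fixed basis tokens, the sequence order is the same for every graph, which is one plausible reading of positional encoding being "automatically involved".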