Tokenphormer: Structure-aware Multi-token Graph Transformer for Node Classification

Zijie Zhou, Zhaoqi Lu, Xuekai Wei, Rongqin Chen, Shenghui Zhang, Pak Lon Ip, Leong Hou U

TL;DR

Tokenphormer tackles the limitations of traditional GNNs and graph Transformers by introducing a structure-aware, multi-token representation for nodes. It constructs diverse tokens—walk-token (four walk types), SGPM-token, and hop-token—through graph serialization and a pre-training phase (SGPM) to cover local and global structure, then jointly learns them with a Transformer and attention-based readout. The authors provide theoretical analysis showing graph documents can distinguish non-isomorphic graphs and that token coverage improves with more tokens, while experiments on six homogeneous and heterogeneous benchmarks demonstrate state-of-the-art performance on node classification. The approach offers a scalable, flexible framework for structure-aware graph learning with robust generalization across graph types.

Abstract

Graph Neural Networks (GNNs) are widely used in graph data mining tasks. Traditional GNNs follow a message passing scheme that can effectively utilize local and structural information. However, the phenomena of over-smoothing and over-squashing limit the receptive field in message passing processes. Graph Transformers were introduced to address these issues, achieving a global receptive field but suffering from the noise of irrelevant nodes and loss of structural information. Therefore, drawing inspiration from fine-grained token-based representation learning in Natural Language Processing (NLP), we propose the Structure-aware Multi-token Graph Transformer (Tokenphormer), which generates multiple tokens to effectively capture local and structural information and explore global information at different levels of granularity. Specifically, we first introduce the walk-token generated by mixed walks consisting of four walk types to explore the graph and capture structure and contextual information flexibly. To ensure local and global information coverage, we also introduce the SGPM-token (obtained through the Self-supervised Graph Pre-train Model, SGPM) and the hop-token, extending the length and density limit of the walk-token, respectively. Finally, these expressive tokens are fed into the Transformer model to learn node representations collaboratively. Experimental results demonstrate that Tokenphormer achieves state-of-the-art performance on node classification tasks.
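The TL;DR mentions that the tokens are combined through an attention-based readout. The sketch below shows one generic way such a readout can work, as softmax-weighted pooling of token vectors. It is not the paper's exact module: the function names, the `query` vector, and the toy token values are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_readout(tokens, query):
    """Pool token vectors into one vector with scaled dot-product attention.

    tokens: list of equal-length float lists (one vector per token).
    query:  float list of the same length (hypothetical; e.g. the target
            node's own embedding or a learned vector).
    """
    d = len(query)
    # Score each token against the query, scaled by sqrt(d).
    scores = [sum(t[i] * query[i] for i in range(d)) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    # Weighted sum of the token vectors, one coordinate at a time.
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return pooled, weights

# Toy example: three 2-dimensional "tokens" (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
pooled, weights = attention_readout(tokens, query)
print(weights, pooled)
```

Tokens that align with the query receive larger weights, so the pooled vector leans toward the most relevant tokens instead of averaging all of them equally.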

Paper Structure

This paper contains 50 sections, 3 theorems, 22 equations, 6 figures, 4 tables.

Key Result

Lemma 1

If $G$ is a connected, non-bipartite graph, then for any initial distribution $\pi_0$ on $v \in V$, we have:

Figures (6)

  • Figure 1: Idea of Tokenphormer.
  • Figure 2: Framework. RW refers to random walk, while NJW stands for neighborhood jump walk. Tokenphormer generates diverse tokens at different levels of granularity for the target node (red node): walk-tokens (yellow area), the SGPM-token (blue area), and the hop-token (green area), which together mine essential information from the whole graph. All of these tokens are then assembled into a sequence and fed into the Transformer-based backbone to jointly learn the final node representation. Finally, an MLP-based module is employed for node classification tasks.
  • Figure 3: Transition Probability of Neighborhood Jump Walk.
  • Figure 4: Expressiveness Comparison. The grey dashed line denotes the graph diameter. For Flickr and DBLP, the orange dashed line marks a NaN result.
  • Figure 5: Coverage Analysis. Let $\sigma = \frac{\exp\left(-2\epsilon^2 n\right)}{n}$. The figure shows how $\sigma$ changes as $n$ increases for different values of $\epsilon$ in Equation \ref{equCoverage}.
  • ...and 1 more figure
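The quantity in the Figure 5 caption, $\sigma = \exp(-2\epsilon^2 n)/n$, can be evaluated directly. The sketch below tabulates it for a few values of $n$ and $\epsilon$. The specific values are illustrative assumptions, not the ones plotted in the paper, and the function name `sigma` is hypothetical.

```python
import math

def sigma(n, eps):
    """Evaluate sigma = exp(-2 * eps^2 * n) / n from the Figure 5 caption."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.exp(-2.0 * eps ** 2 * n) / n

# Illustrative grid (assumed values, chosen only to show the trend).
for eps in (0.05, 0.1, 0.2):
    row = ["%.3e" % sigma(n, eps) for n in (10, 50, 100, 500)]
    print("eps=%.2f" % eps, row)
```

Both factors, $\exp(-2\epsilon^2 n)$ and $1/n$, shrink as $n$ grows, so $\sigma$ decreases monotonically in $n$ for any fixed $\epsilon$, and it decreases faster for larger $\epsilon$.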

Theorems & Definitions (7)

  • Lemma 1
  • Definition 1: Graph sentence and document
  • Definition 2: Non-backtracking random walk
  • Definition 3: Neighborhood jump walk
  • Lemma 2
  • Lemma 3
  • proof
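Definition 2 names the non-backtracking random walk, but its statement is not reproduced in this summary. As a minimal sketch under the commonly used definition (the walk never steps straight back to the node it just left), the code below samples such a walk on a small toy graph. The adjacency dictionary, the function name, and the dead-end rule are assumptions made for illustration and may differ from the paper's formulation.

```python
import random

def non_backtracking_walk(adj, start, length, rng):
    """Sample a walk of up to `length` nodes that avoids immediate backtracking.

    adj:    dict mapping each node to a list of its neighbors.
    start:  starting node.
    length: number of nodes in the returned walk (including `start`).
    rng:    a random.Random instance, passed in for reproducibility.
    """
    walk = [start]
    prev = None
    while len(walk) < length:
        cur = walk[-1]
        # Exclude the node we just came from.
        candidates = [v for v in adj[cur] if v != prev]
        if not candidates:
            # Assumption: at a dead end (degree-1 node), allow stepping back.
            candidates = list(adj[cur])
        if not candidates:
            break  # isolated node: the walk cannot continue
        nxt = rng.choice(candidates)
        prev = cur
        walk.append(nxt)
    return walk

# Toy undirected graph in which every node has degree >= 2 (illustrative only).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
rng = random.Random(0)
walk = non_backtracking_walk(adj, 0, 8, rng)
print(walk)
```

Because every node in this toy graph has at least two neighbors, the sampled walk never contains a pattern of the form `a, b, a`, which is the property that separates it from a plain random walk.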