NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs

Jinsong Chen; Kaiyuan Gao; Gaichao Li; Kun He

TL;DR

NAGphormer addresses the scalability bottleneck of graph Transformers by tokenizing per-node multi-hop neighborhoods with Hop2Token, enabling mini-batch training on large graphs. The model processes each node as a sequence of hop-specific tokens through a Transformer and uses an adaptive, hop-aware readout to combine neighborhood information. Theoretical analysis links NAGphormer’s expressiveness to overcoming fixed-weights limitations of decoupled GCNs, and experiments show consistent improvements over both graph Transformers and mainstream GNNs across small and large datasets. This tokenized, scalable approach makes graph Transformers practical for large-scale node classification tasks.

Abstract

The graph Transformer has emerged as a new architecture and has shown superior performance on various graph mining tasks. In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens to train the Transformer model, which makes them hard to scale to large graphs because the self-attention computation has quadratic complexity in the number of nodes. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence containing a series of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations and thereby produces a sequence of token vectors as one input. In this way, NAGphormer can be trained in a mini-batch manner and thus can scale to large graphs. Moreover, we mathematically show that, compared to a category of advanced Graph Neural Networks (GNNs), the decoupled Graph Convolutional Network, NAGphormer can learn more informative node representations from the multi-hop neighborhoods. Extensive experiments on benchmark datasets from small to large demonstrate that NAGphormer consistently outperforms existing graph Transformers and mainstream GNNs. Code is available at https://github.com/JHL-HUST/NAGphormer.
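
To make the Hop2Token idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the hop-k token of a node is row v of the k-th power of a symmetrically normalized adjacency matrix (with self-loops) applied to the feature matrix; the abstract only states that features from different hops are aggregated into different representations, so the normalization choice and the names hop2token, adj, features, and num_hops are illustrative assumptions.

```python
import numpy as np

def hop2token(adj, features, num_hops):
    """Build one token sequence per node (hypothetical helper).

    adj:      (n, n) dense adjacency matrix
    features: (n, d) node feature matrix
    returns:  (n, num_hops + 1, d) array; slice [v, k] is the hop-k token of node v
    """
    n = adj.shape[0]
    # Assumption: add self-loops and use symmetric degree normalization.
    a_hat = adj + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    tokens = [features]            # hop 0: the node's own features
    current = features
    for _ in range(num_hops):
        current = a_norm @ current  # propagate one more hop
        tokens.append(current)
    return np.stack(tokens, axis=1)

# Toy path graph 0 - 1 - 2 with one-hot features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)
tokens = hop2token(adj, features, num_hops=2)

# Token sequences are precomputed once, so a mini-batch is a plain index
# into the first axis and no longer needs the graph structure.
batch = tokens[[0, 2]]
```

Because each node carries its own short sequence of num_hops + 1 tokens, self-attention runs over that sequence rather than over all nodes, which is what allows mini-batch training in this sketch.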

Paper Structure (28 sections, 16 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Background
Problem Formulation
Graph Neural Network
Transformer
The Proposed NAGphormer
Hop2Token
NAGphormer for Node Classification
Implementation details
Theoretical Analysis of NAGphormer
Experiments
Experimental Setup
Comparison on Small-scale Datasets
Comparison on Large-scale Datasets
Ablation Study
...and 13 more sections

Figures (4)

Figure 1: Model framework of NAGphormer. NAGphormer first uses a novel neighborhood aggregation module, Hop2Token, to construct a sequence for each node based on the tokens of different hops of neighbors. Then, NAGphormer learns the node representations using a Transformer backbone, and an attention-based readout function is developed to aggregate neighborhood information of different hops adaptively. An MLP-based module is used in the end for label prediction.
Figure 2: The performance of NAGphormer via different readout functions.
Figure 3: Performance of NAGphormer on different parameters.
Figure 4: The performance of different readout functions on large-scale datasets.
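
The attention-based readout named in the Figure 1 caption can be sketched as follows. This is one plausible form under stated assumptions, not a confirmed reproduction of the paper's formula: each hop token output is scored against the node's own (hop-0) output with a learned vector, the scores are normalized with a softmax, and the weighted sum of hop outputs is added to the hop-0 output. The names attention_readout, z, and w_a are hypothetical, and the weights here are random rather than trained.

```python
import numpy as np

def attention_readout(z, w_a):
    """Combine hop token outputs for a single node (hypothetical helper).

    z:   (K + 1, d) Transformer outputs; z[0] is the node itself, z[1:] are hops 1..K
    w_a: (2 * d,) scoring vector (assumed learnable; random in this sketch)
    returns: (d,) node representation and (K,) attention weights over hops
    """
    z0, zk = z[0], z[1:]
    # Pair the node's own output with each hop output: shape (K, 2d).
    pairs = np.concatenate([np.broadcast_to(z0, zk.shape), zk], axis=1)
    scores = pairs @ w_a
    scores = scores - scores.max()            # numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    # Assumption: the hop-0 output is kept and the weighted hops are added to it.
    return z0 + alpha @ zk, alpha

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))    # K = 3 hops, d = 8
w_a = rng.normal(size=16)
z_out, alpha = attention_readout(z, w_a)
```

The resulting vector z_out would then be passed to the MLP-based prediction module mentioned in the caption; the per-node weights alpha are what make the aggregation adaptive across hops.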
