NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs
Jinsong Chen, Kaiyuan Gao, Gaichao Li, Kun He
TL;DR
NAGphormer addresses the scalability bottleneck of graph Transformers by using Hop2Token to tokenize each node's multi-hop neighborhood, which enables mini-batch training on large graphs. The model feeds each node to a Transformer as a sequence of hop-specific tokens and uses an adaptive, hop-aware readout to combine the neighborhood information. A theoretical analysis shows that NAGphormer overcomes the fixed-weight limitation of decoupled GCNs and can therefore learn more expressive node representations, and experiments show consistent improvements over both graph Transformers and mainstream GNNs on small and large datasets. This tokenized, scalable approach makes graph Transformers practical for large-scale node classification.
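The adaptive, hop-aware readout can be illustrated with a small pure-Python sketch. It assumes the Transformer has already produced one output vector per hop for a node, [Z_0, Z_1, ..., Z_K]. It then scores each neighborhood token Z_k against the node's own token Z_0 with a learned vector `w` applied to their concatenation, and adds the softmax-weighted neighborhood tokens to Z_0. This attention-style form and the names `hop_readout` and `w` are illustrative assumptions, not code from the NAGphormer repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hop_readout(hop_outputs: Sequence[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Combine one node's Transformer outputs [Z_0, Z_1, ..., Z_K].

    Assumed form (illustrative): alpha = softmax over k = 1..K of
    w . concat(Z_0, Z_k), and output = Z_0 + sum_k alpha_k * Z_k.
    `w` has length 2 * d, where d is the token dimension.
    """
    z0, neighbors = list(hop_outputs[0]), hop_outputs[1:]
    # List concatenation z0 + zk plays the role of concat(Z_0, Z_k).
    scores = [sum(wi * ci for wi, ci in zip(w, z0 + list(zk))) for zk in neighbors]
    alpha = softmax(scores)  # hop weights computed from this node's own tokens
    out = list(z0)
    for a, zk in zip(alpha, neighbors):
        for i, value in enumerate(zk):
            out[i] += a * value
    return out, alpha


w = [0.0, 0.0, 1.0, -1.0]  # hypothetical learned scoring vector (d = 2)
node_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # Z_0, Z_1, Z_2
node_b = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # same tokens, hops 1 and 2 swapped
out_a, alpha_a = hop_readout(node_a, w)
out_b, alpha_b = hop_readout(node_b, w)
print([round(a, 3) for a in alpha_a])  # [0.731, 0.269]
print([round(a, 3) for a in alpha_b])  # [0.269, 0.731]
```

The two example nodes share the same `w` but receive different hop weights, because the weights are computed from each node's own tokens. A decoupled GCN, by contrast, combines hop-aggregated features with weights that are fixed and shared by all nodes. This node-level adaptivity is only the intuition behind the expressiveness claim, not a reproduction of the paper's proof.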
Abstract
The graph Transformer has emerged as a new architecture and has shown superior performance on various graph mining tasks. In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens to train the Transformer model. This makes them hard to scale to large graphs, because the self-attention computation has quadratic complexity in the number of nodes. To this end, we propose a Neighborhood Aggregation Graph Transformer (NAGphormer) that treats each node as a sequence containing a series of tokens constructed by our proposed Hop2Token module. For each node, Hop2Token aggregates the neighborhood features from different hops into different representations and thereby produces a sequence of token vectors as one input. In this way, NAGphormer can be trained in a mini-batch manner and thus can scale to large graphs. Moreover, we mathematically show that, compared with the decoupled Graph Convolutional Network, a category of advanced Graph Neural Networks (GNNs), NAGphormer can learn more informative node representations from multi-hop neighborhoods. Extensive experiments on benchmark datasets ranging from small to large demonstrate that NAGphormer consistently outperforms existing graph Transformers and mainstream GNNs. Code is available at https://github.com/JHL-HUST/NAGphormer.
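The Hop2Token idea can be sketched in plain Python. For hops k = 0..K, the sketch computes \hat{A}^k X and gives each node the sequence of its K+1 rows, so any subset of nodes can be batched without touching the rest of the graph. The symmetric normalization with self-loops, D^{-1/2}(A + I)D^{-1/2}, is an assumed (common GCN) choice, and the function names are hypothetical. The dense matrices are for illustration only; a large graph would need sparse propagation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(edges: Sequence[Tuple[int, int]], n: int) -> Matrix:
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph.

    Assumption: symmetric normalization with self-loops (a common GCN choice).
    """
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    for i in range(n):
        adj[i][i] = 1.0  # add self-loops
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def hop2token(a_hat: Matrix, x: Matrix, num_hops: int) -> List[Matrix]:
    """Return tokens where tokens[v][k] is row v of a_hat^k @ x, k = 0..num_hops."""
    hops = [x]  # hop 0: each node's own features
    for _ in range(num_hops):
        hops.append(matmul(a_hat, hops[-1]))  # propagate one more hop
    return [[hops[k][v] for k in range(num_hops + 1)] for v in range(len(x))]


def gather_batch(tokens: List[Matrix], node_ids: Sequence[int]) -> List[Matrix]:
    """Select the token sequences of a mini-batch of nodes."""
    # Each node's sequence is self-contained, so no other node is needed.
    return [tokens[v] for v in node_ids]


# Toy path graph 0-1-2-3 with 2-dimensional node features.
edges = [(0, 1), (1, 2), (2, 3)]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
a_hat = normalized_adjacency(edges, n=4)
tokens = hop2token(a_hat, x, num_hops=3)
batch = gather_batch(tokens, [0, 2])
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 4 2
```

Because each node's input has length K+1 instead of n, self-attention costs O((K+1)^2 d) per node, or O(n(K+1)^2 d) over all nodes, where d is the hidden dimension. Attention over one sequence of all n node tokens costs O(n^2 d). In this sketch the propagation has no parameters and runs once, before any training.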
