Rethinking Tokenized Graph Transformers for Node Classification
Jinsong Chen, Chenyang Li, GaiChao Li, John E. Hopcroft, Kun He
TL;DR
SwapGT addresses the limited diversity of token sequences in tokenized Graph Transformers for node classification by introducing a token swapping mechanism that expands token neighborhoods beyond 1-hop. It employs a Transformer-based backbone to learn from multiple token sequences derived from both attribute and topology views and uses a center alignment loss to harmonize representations across views. Empirical results on eight datasets show SwapGT achieving state-of-the-art accuracy, with pronounced gains in sparse-label regimes thanks to token augmentation and regularization. The approach advances graph representation learning by enriching token-level context and stabilizing multi-sequence learning for robust node classification.
Abstract
Node tokenized graph Transformers (GTs) have shown promising performance in node classification. Token sequence generation is the key module in existing tokenized GTs: it transforms the input graph into token sequences, which a Transformer then uses to learn node representations. In this paper, we observe that token sequence generation in existing GTs considers only the first-order neighbors on the constructed similarity graphs. This limits the set of nodes available for building diverse token sequences and, in turn, restricts the potential of tokenized GTs for node classification. To this end, we propose a new method termed SwapGT. SwapGT first introduces a novel token swapping operation, based on the characteristics of token sequences, that fully leverages the semantic relevance of nodes to generate more informative token sequences. Then, SwapGT leverages a Transformer-based backbone to learn node representations from the generated token sequences. Moreover, SwapGT develops a center alignment loss to constrain the representation learning from multiple token sequences, further enhancing model performance. Extensive empirical results on various datasets showcase the superiority of SwapGT for node classification.
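To make the two ideas above concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that the similarity graph is a cosine k-NN graph over node features, that a swap replaces a token with a random neighbor of that token (so a sequence can reach nodes beyond the first-order neighbors), and that the center alignment loss is the mean squared distance between each sequence representation and their mean. The function names and the `swap_prob` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_table(features, k):
    """Return an (n, k) array; row i holds the k nodes most cosine-similar to node i."""
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    normed = features / norms
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def swap_tokens(neighbors, node, rng, swap_prob=0.5):
    """Build one augmented token sequence for `node` (assumed swap rule).

    Each token u of the base sequence neighbors[node] is replaced, with
    probability swap_prob, by a random entry of u's own neighbor list.
    The result may therefore contain second-order neighbors of `node`
    (and, occasionally, `node` itself).
    """
    seq = neighbors[node].copy()
    for pos, u in enumerate(neighbors[node]):
        if rng.random() < swap_prob:
            seq[pos] = rng.choice(neighbors[u])
    return seq


def center_alignment_loss(reps):
    """Mean squared distance of each representation to their center.

    reps: (num_sequences, dim) array holding the representations of one
    node learned from its different token sequences (assumed loss form).
    """
    center = reps.mean(axis=0, keepdims=True)
    return float(((reps - center) ** 2).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(20, 8))      # toy node features
    nbrs = knn_table(feats, k=4)
    sequences = [swap_tokens(nbrs, 0, rng) for _ in range(3)]
    # Stand-in for the Transformer backbone: average the token features.
    reps = np.stack([feats[s].mean(axis=0) for s in sequences])
    print(sequences, center_alignment_loss(reps))
```

In this sketch the base sequence and each swapped sequence would be fed to the same backbone, and the alignment term would be added to the classification loss. How the paper weights the term, and how it treats the attribute and topology views, is not specified here.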
