Rethinking Tokenized Graph Transformers for Node Classification

Jinsong Chen, Chenyang Li, GaiChao Li, John E. Hopcroft, Kun He

TL;DR

SwapGT addresses the limited diversity of token sequences in tokenized Graph Transformers for node classification by introducing a token swapping mechanism that expands token neighborhoods beyond 1-hop. It employs a Transformer-based backbone to learn from multiple token sequences derived from both attribute and topology views and uses a center alignment loss to harmonize representations across views. Empirical results on eight datasets show SwapGT achieving state-of-the-art accuracy, with pronounced gains in sparse-label regimes thanks to token augmentation and regularization. The approach advances graph representation learning by enriching token-level context and stabilizing multi-sequence learning for robust node classification.

Abstract

Node tokenized graph Transformers (GTs) have shown promising performance in node classification. Token sequence generation is the key module in existing tokenized GTs: it transforms the input graph into token sequences, from which the Transformer learns node representations. In this paper, we observe that token sequence generation in existing GTs focuses only on the first-order neighbors in the constructed similarity graphs. This limits the set of nodes available for generating diverse token sequences and restricts the potential of tokenized GTs for node classification. To this end, we propose a new method termed SwapGT. SwapGT first introduces a novel token swapping operation, based on the characteristics of token sequences, that fully leverages the semantic relevance of nodes to generate more informative token sequences. Then, SwapGT leverages a Transformer-based backbone to learn node representations from the generated token sequences. Moreover, SwapGT develops a center alignment loss to constrain representation learning across multiple token sequences, further enhancing model performance. Extensive empirical results on various datasets showcase the superiority of SwapGT for node classification.

Paper Structure

This paper contains 29 sections, 15 equations, 11 figures, 3 tables, and 1 algorithm.

Figures (11)

  • Figure 1: A toy example of token generation on the $k$-NN graph. Previous methods focus only on the 1-hop neighborhood to construct a single token sequence, whereas our method can flexibly select tokens from multi-hop neighborhoods to generate diverse token sequences.
  • Figure 2: The overall framework of SwapGT. First, we generate the initial token sequences from both the attribute view and topology view. Then, we utilize the proposed token swapping operation to generate new token sequences for each target node. These generated token sequences are then fed into a Transformer-based backbone to learn node representations and generate predicted labels. Additionally, a center alignment loss is adopted to further constrain the representations extracted from different token sequences.
  • Figure 3: Illustration of token swapping, where node 1 is the target node. We first select node 3 from the token sequence of node 1 and regard the tokens in node 3's token sequence as the candidates. We then select node 6 from the candidates to replace node 3 and construct the new token sequence.
  • Figure 4: Performance of SwapGT with and without the center alignment loss.
  • Figure 5: Performance of SwapGT with different token sequence generation strategies.
  • ...and 6 more figures
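
The token swapping step described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `swap_tokens`, the `num_swaps` parameter, the uniform random selection, and the rule that the target node and already-present tokens are excluded from the candidates are all assumptions made for this example. The toy `sequences` dictionary is likewise hypothetical and only loosely mirrors the figure.

```python
import random


def swap_tokens(target, sequences, num_swaps=1, rng=None):
    """Build a new token sequence for `target` by swapping tokens.

    `sequences` maps each node to its initial token sequence (for example,
    its 1-hop neighbors on a k-NN graph). All names here are hypothetical.
    """
    rng = rng or random.Random(0)
    new_seq = list(sequences[target])  # copy, so the input is left untouched
    if not new_seq:
        return new_seq
    for _ in range(num_swaps):
        # Pick a token currently in the sequence (node 3 in Figure 3).
        pos = rng.randrange(len(new_seq))
        pivot = new_seq[pos]
        # Candidates are the tokens in the pivot's own sequence.
        # Assumption: skip the target itself and tokens already present.
        candidates = [
            t for t in sequences.get(pivot, [])
            if t != target and t not in new_seq
        ]
        if candidates:
            # Replace the pivot with a candidate (node 6 in Figure 3).
            new_seq[pos] = rng.choice(candidates)
    return new_seq


# Toy similarity graph: node -> initial token sequence.
sequences = {
    1: [2, 3, 4],
    2: [1, 3, 4],
    3: [1, 5, 6],
    4: [1, 2, 3],
    5: [3, 6, 7],
    6: [3, 5, 7],
}
new_sequence = swap_tokens(1, sequences, num_swaps=2, rng=random.Random(0))
print(new_sequence)
```

Because each swap draws from the sequence of a token that is already present, repeated swaps can reach nodes beyond the 1-hop neighborhood of the target, which is the source of the additional sequence diversity the caption describes. Calling the function several times with different random states would yield multiple token sequences per target node.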