VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention

Jingbo Zhou; Jun Xia; Siyuan Li; Yunfan Liu; Wenjun Wang; Yufei Huang; Changxi Chi; Mutian Hong; Zhuoli Ouyang; Shu Wang; Zhongqi Wang; Xingyu Wu; Chang Yu; Stan Z. Li

TL;DR

This paper proposes VecFormer, an efficient and highly generalizable model for node classification, particularly under OOD settings, that outperforms existing Graph Transformers in both performance and speed.

Abstract

Graph Transformers have demonstrated impressive capabilities in graph representation learning. However, existing approaches face two critical challenges: (1) most models suffer from exponentially increasing computational complexity, which makes them difficult to scale to large graphs; (2) attention mechanisms based on node-level operations limit the flexibility of the model and lead to poor generalization in out-of-distribution (OOD) scenarios. To address these issues, we propose VecFormer (the Vector Quantized Graph Transformer), an efficient and highly generalizable model for node classification, particularly under OOD settings. VecFormer adopts a two-stage training paradigm. In the first stage, two codebooks are used to reconstruct the node features and the graph structure, with the aim of learning semantically rich Graph Codes. In the second stage, attention is performed at the Graph Token level based on the transformed cross codebook, which reduces computational complexity while enhancing the model's generalization capability. Extensive experiments on datasets of various sizes demonstrate that VecFormer outperforms existing Graph Transformers in both performance and speed.
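The efficiency argument in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses a single random codebook, no learned projections, and no training, and the helper names (`nearest_code`, `token_attention`) are hypothetical. It only shows the two generic ingredients the abstract names, namely assigning each node to its nearest codebook vector (vector quantization) and letting each node attend over K codebook vectors rather than over all N nodes, so that the number of attention scores grows as N·K instead of N·N.

```python
import math
import random


def nearest_code(x, codebook):
    """Return the index of the codebook vector closest to x (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_attention(nodes, codebook):
    """Each node attends over the K codebook vectors instead of all N nodes.

    Assumption: queries are the raw node features and keys/values are the raw
    codebook vectors; a real model would use learned projections.
    """
    d = len(codebook[0])
    out = []
    for x in nodes:
        scores = [sum(xi * ci for xi, ci in zip(x, c)) / math.sqrt(d)
                  for c in codebook]
        w = softmax(scores)
        out.append([sum(w[k] * codebook[k][j] for k in range(len(codebook)))
                    for j in range(d)])
    return out


# Toy data (assumed sizes): N nodes, K codes, feature dimension d.
rng = random.Random(0)
N, K, d = 200, 8, 4
nodes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(N)]
codebook = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]

codes = [nearest_code(x, codebook) for x in nodes]   # one code index per node
attended = token_attention(nodes, codebook)          # N outputs of dimension d

print("attention scores computed:", N * K, "vs node-level:", N * N)
```

With K fixed and much smaller than N, the score matrix has N·K entries, which is the sense in which attention over a codebook can scale linearly in the number of nodes.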

Paper Structure (22 sections, 15 equations, 7 figures, 7 tables)

Introduction
Related Work
Graph Transformer
OOD Generalization on Graph
Vector Quantization in Graph Neural Networks
Preliminary
Graph Transformer
Vector Quantization
VecFormer
Graph Codebook Training
Graph Transformer Finetuning
Model Analysis
Experiments
Node Classification Task (RQ1)
OOD Generalization Scenario (RQ2)
...and 7 more sections

Figures (7)

Figure 1: Standard deviation of attention weights and performance evaluation of Node Level Attention and Graph Token Attention in the OOD setting.
Figure 2: Illustration of VecFormer's architecture and training process.
Figure 3: Experimental results (%) for the node classification task on the heterophily graph datasets Chameleon and Squirrel.
Figure 4: The training time per epoch and GPU memory usage of three linear-complexity graph transformers on graphs with varying numbers of nodes.
Figure 5: Reconstruction losses of different types and the total loss under varying numbers of Graph Codes.
...and 2 more figures
