Do Transformers Really Perform Bad for Graph Representation?

Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

TL;DR

This work tackles the question of whether Transformers can excel at graph representation. It introduces Graphormer, a Transformer-based model augmented with three graph-structural encodings (centrality, spatial distance, and edge features) and a special [VNode] readout node, enabling powerful graph-level representations. The authors demonstrate that Graphormer can replicate common GNN operations, surpass 1-WL expressiveness, and achieve state-of-the-art results on large-scale benchmarks such as the OGB-LSC PCQM4M-LSC dataset, MolPCBA, MolHIV, and ZINC, including strong transfer from pretraining. The results, ablations, and theoretical insights collectively suggest Transformers are viable for graph tasks when structural information is effectively encoded, and they point to future work on efficiency and domain-specific encodings.

Abstract

The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. Besides, we mathematically characterize the expressive power of Graphormer and exhibit that with our ways of encoding the structural information of graphs, many popular GNN variants could be covered as the special cases of Graphormer.
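To make the idea of encoding structure into attention concrete, the following is a minimal sketch of one such encoding: a scalar bias, indexed by the shortest-path distance between two nodes, added to the attention score before the softmax. It is an illustration under stated assumptions, not the authors' implementation. The function name `attention_with_spatial_bias`, the toy path graph, and the hand-picked `bias_table` are hypothetical; in a trained model the bias values would be learned, and the centrality and edge encodings are omitted here.

```python
import math

def attention_with_spatial_bias(q, k, spd, bias_table):
    """Single-head attention weights with a distance-indexed bias.

    q, k:        per-node query and key vectors (lists of equal-length lists)
    spd[i][j]:   shortest-path distance between nodes i and j
    bias_table:  bias_table[d] is the scalar added for distance d
                 (hand-picked here; learned in a real model)
    """
    n, dim = len(q), len(q[0])
    weights = []
    for i in range(n):
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
            + bias_table[spd[i][j]]
            for j in range(n)
        ]
        # Numerically stable softmax over row i.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy path graph 0 - 1 - 2 (an assumption for illustration).
spd = [[0, 1, 2],
       [1, 0, 1],
       [2, 1, 0]]

# Zero queries and keys, so only the structural bias shapes the attention.
q = [[0.0, 0.0] for _ in range(3)]
k = [[0.0, 0.0] for _ in range(3)]

# Distances 0 and 1 are unpenalised; distance 2 is masked out entirely.
bias_table = [0.0, 0.0, float("-inf")]

weights = attention_with_spatial_bias(q, k, spd, bias_table)
for row in weights:
    print(row)
```

With this particular bias table, each node attends uniformly to itself and its direct neighbours and ignores everything farther away, which amounts to a mean aggregation over the closed neighbourhood. That is one way to see how a distance-aware bias lets attention imitate a GNN-style aggregation step while still being free to learn softer, longer-range patterns.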

Paper Structure

This paper contains 64 sections, 7 equations, 2 figures, 13 tables.

Figures (2)

  • Figure 1: An illustration of our proposed centrality encoding, spatial encoding, and edge encoding in Graphormer.
  • Figure 2: These two graphs cannot be distinguished by the 1-WL test, but their shortest-path-distance (SPD) sets, i.e., the SPDs from each node to all nodes (including itself), differ. The two types of nodes in the left graph have SPD sets $\left\{0, 1, 1, 2, 2, 3\right\}, \left\{0, 1, 1, 1, 2, 2\right\}$, while the two types of nodes in the right graph have SPD sets $\left\{0, 1, 1, 2, 3, 3\right\}, \left\{0, 1, 1, 1, 2, 2\right\}$.
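
The claim in the Figure 2 caption can be checked on a small example. The sketch below assumes two six-node graphs chosen to be consistent with the SPD sets quoted in the caption: a hexagon with one long chord, and two triangles joined by a single edge. The figure itself is not reproduced here, so these edge lists and the helper names `build_adj`, `spd_sets`, and `wl_histogram` are illustrative assumptions rather than the authors' code.

```python
from collections import Counter, deque

def build_adj(n, edges):
    """Adjacency lists for an undirected graph on nodes 0..n-1."""
    adj = {u: [] for u in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def spd_from(adj, source):
    """Breadth-first search: shortest-path distance from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def spd_sets(adj):
    """Count how many nodes share each sorted tuple of SPDs."""
    return Counter(tuple(sorted(spd_from(adj, u).values())) for u in adj)

def wl_histogram(adj):
    """1-WL colour refinement; returns the histogram of final colours.

    Colours are nested tuples, so they are directly comparable across graphs.
    """
    colors = {u: 0 for u in adj}
    for _ in range(len(adj)):
        colors = {
            u: (colors[u], tuple(sorted(colors[v] for v in adj[u])))
            for u in adj
        }
    return Counter(colors.values())

# Assumed graphs, consistent with the SPD sets quoted in the caption.
left = build_adj(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)])
right = build_adj(6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)])

print("1-WL histograms equal:", wl_histogram(left) == wl_histogram(right))
print("left SPD sets: ", dict(spd_sets(left)))
print("right SPD sets:", dict(spd_sets(right)))
```

In both graphs every degree-3 node has one degree-3 and two degree-2 neighbours, and every degree-2 node has one neighbour of each kind, so colour refinement never separates them. The SPD tuples of the degree-2 nodes differ, however, which is the extra signal a distance-aware encoding can exploit.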