
A Survey of Graph Transformers: Architectures, Theories and Applications

Chaohao Yuan, Kangfei Zhao, Ercan Engin Kuruoglu, Liang Wang, Tingyang Xu, Wenbing Huang, Deli Zhao, Hong Cheng, Yu Rong

TL;DR

Graph Transformers systematically integrate graph structure into Transformer architectures to overcome limitations of traditional GNNs, enabling long-range relational modeling. The survey develops a fourfold architectural taxonomy (multi-level graph tokenization, structural positional encoding, structure-aware attention, and GNN-Transformer ensembles) and discusses scalability and geometric equivariance, linking these designs to expressivity via WL-type tests. It surveys diverse applications across molecules, proteins, textual graphs, social networks, traffic, vision, brain, and materials, highlighting datasets, tasks, and pretraining strategies. Theoretical discussions reveal how tokenization and encoding affect expressivity and relate GTs to MPNNs and graph structure learning, providing guidance for future GT design and cross-domain deployment. Overall, GTs hold promise for scientifically grounded graph modeling, with scalable and equivariant variants enabling practical, domain-agnostic applications.
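
The TL;DR ties GT expressivity to WL-type tests. As a reference point, here is a minimal sketch of 1-WL color refinement, the test that bounds standard message-passing GNNs. It assumes unlabeled, undirected graphs given as adjacency dicts, and the function name `wl_histogram` is hypothetical. It reproduces the classic failure case: two disjoint triangles and a single 6-cycle receive identical color histograms, so 1-WL (and any MPNN bounded by it) cannot tell them apart.

```python
from collections import Counter


def wl_histogram(adj, iterations=3):
    """1-WL color refinement on an unlabeled, undirected graph.

    adj maps each node to a list of its neighbors. Returns a Counter
    (multiset) of final node colors: different Counters prove the graphs
    are non-isomorphic, while equal Counters are inconclusive.
    """
    # Every node starts with the same color because no features are assumed.
    colors = {v: () for v in adj}
    for _ in range(iterations):
        # New color = (own color, sorted multiset of neighbor colors).
        # Colors stay nested tuples instead of being hashed to integers,
        # so they remain comparable across different graphs.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


# Two disjoint triangles vs. one 6-cycle: both are 2-regular on 6 nodes.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(wl_histogram(two_triangles) == wl_histogram(hexagon))  # True
```

Structural tokenization and encodings of the kind the survey catalogs are one route to separating such pairs.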

Abstract

Graph Transformers (GTs) have demonstrated a strong capability in modeling graph structures by addressing intrinsic limitations of graph neural networks (GNNs), such as over-smoothing and over-squashing. Recent studies have proposed diverse architectures, enhanced explainability, and practical applications for Graph Transformers. In light of these rapid developments, this survey provides a comprehensive review of Graph Transformers, covering their architectures, theoretical foundations, and applications. We categorize GT architectures according to their strategies for processing structural information: graph tokenization, positional encoding, structure-aware attention, and model ensembling with GNNs. From the theoretical perspective, we examine the expressivity of Graph Transformers across the discussed architectures and contrast them with other advanced graph learning algorithms to identify the connections between them. We also summarize the practical applications in which Graph Transformers have been used, including molecule, protein, language, vision, traffic, brain, and material data. Finally, we discuss current challenges and prospective directions for future research on Graph Transformers.
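
Positional encoding is one of the four strategies named above for injecting structure. As an illustration, here is a minimal sketch of one widely used structural encoding, the random-walk encoding. It gives each node its return probabilities diag(P^t) for t = 1..k, where P = D^-1 A is the random-walk transition matrix. This particular encoding is an illustrative choice, not necessarily the variant the survey emphasizes. The helper names are hypothetical. The sketch assumes an undirected graph with no isolated nodes, given as an adjacency dict.

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_walk_encoding(adj, k=4):
    """Random-walk structural encoding (hypothetical helper name).

    For each node v returns [P[v][v], P^2[v][v], ..., P^k[v][v]], where
    P = D^-1 A. Assumes an undirected graph given as node -> neighbor
    list, with no isolated nodes (degree 0 would divide by zero).
    """
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Row-normalized transition matrix: each neighbor gets 1 / degree.
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in adj[v]:
            P[index[v]][index[u]] += 1.0 / len(adj[v])
    features = {v: [] for v in nodes}
    walk = P  # walk holds P^t, starting from t = 1
    for _ in range(k):
        for v in nodes:
            # Probability of being back at v after t steps.
            features[v].append(walk[index[v]][index[v]])
        walk = matmul(walk, P)
    return features


two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(random_walk_encoding(two_triangles, k=3)[0])  # [0.0, 0.5, 0.25]
print(random_walk_encoding(hexagon, k=3)[0])        # [0.0, 0.5, 0.0]
```

The step-3 return probability is nonzero for a node on a triangle but zero on a 6-cycle, so this encoding separates two graphs that 1-WL color refinement cannot distinguish.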

Paper Structure

This paper contains 52 sections, 48 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the architecture of Graph Transformers. The general GT part outlines the methodologies used to incorporate structural priors into GTs: multi-level tokenization, positional encoding, modification of the attention matrix, and ensembling with GNNs. The other four parts show how each of these methodologies is applied in GTs. Methods in parentheses are representative implementations of the corresponding category.
  • Figure 2: Overview of the applications of Graph Transformers.
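
Figure 1 lists modifying the attention matrix as one way to inject structural priors into GTs. Below is a minimal sketch of one common instance: an additive attention bias indexed by shortest-path distance, in the style of Graphormer's spatial encoding. This is an illustrative choice, not necessarily the variant the survey highlights. The sketch simplifies in several ways. It is pure Python and single-head. It uses raw features directly as queries, keys and values, whereas real layers apply learned projections. The helper names, the clipping of long distances to the last bias entry, and the masking of disconnected pairs are assumptions made for illustration.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """BFS shortest-path (hop) distances from source; unreachable nodes omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def spd_biased_attention(x, adj, spd_bias):
    """Single-head attention with an additive shortest-path-distance bias.

    x: node -> feature list, used directly as query, key and value.
    spd_bias: scalar bias per hop distance (learned in practice); longer
       distances are clipped to the last entry, and disconnected pairs
       are masked out entirely.
    Returns (updated features, attention weights per node).
    """
    nodes = sorted(adj)
    d = len(x[nodes[0]])
    out, weights = {}, {}
    for i in nodes:
        dist = hop_distances(adj, i)
        scores = {}
        for j, hops in dist.items():  # only reachable nodes are scored
            bias = spd_bias[min(hops, len(spd_bias) - 1)]
            dot = sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
            scores[j] = dot + bias
        # Numerically stable softmax over the reachable nodes.
        top = max(scores.values())
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        weights[i] = {j: e / total for j, e in exps.items()}
        # Weighted sum of value vectors.
        out[i] = [sum(w * x[j][k] for j, w in weights[i].items())
                  for k in range(d)]
    return out, weights


# Path 0-1-2 plus an isolated node 3; identical features isolate the bias effect.
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
x = {v: [1.0, 0.0] for v in adj}
out, w = spd_biased_attention(x, adj, spd_bias=[0.0, 0.0, -1e9])
print(w[0])  # {0: 0.5, 1: 0.5, 2: 0.0}
```

Because all features are equal here, the attention pattern is driven entirely by the structural bias: node 0 ignores the node two hops away and never sees the disconnected node 3.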