A Survey of Graph Transformers: Architectures, Theories and Applications
Chaohao Yuan, Kangfei Zhao, Ercan Engin Kuruoglu, Liang Wang, Tingyang Xu, Wenbing Huang, Deli Zhao, Hong Cheng, Yu Rong
TL;DR
Graph Transformers (GTs) integrate graph structure into Transformer architectures to overcome limitations of traditional GNNs, such as over-smoothing and over-squashing, and to model long-range relations. The survey develops a fourfold architectural taxonomy (multi-level graph tokenization, structural positional encoding, structure-aware attention, and GNN-Transformer ensembles), discusses scalability and geometric equivariance, and links these designs to expressivity via WL-type tests. It surveys applications across molecules, proteins, textual graphs, social networks, traffic, vision, brain, and materials, highlighting the datasets, tasks, and pretraining strategies used in each. The theoretical discussion shows how tokenization and encoding choices affect expressivity and relates GTs to MPNNs and graph structure learning, offering guidance for future GT design and cross-domain deployment. Overall, GTs are a promising framework for graph modeling, with scalable and equivariant variants making them practical across domains.
Abstract
Graph Transformers (GTs) have demonstrated a strong capability in modeling graph structures by addressing intrinsic limitations of graph neural networks (GNNs), such as over-smoothing and over-squashing. Recent studies have proposed diverse architectures, improved explainability, and practical applications for Graph Transformers. In light of these rapid developments, this survey provides a comprehensive review of Graph Transformers, covering their architectures, theoretical foundations, and applications. We categorize Graph Transformer architectures according to their strategies for processing structural information: graph tokenization, positional encoding, structure-aware attention, and model ensembles. From a theoretical perspective, we examine the expressivity of the discussed architectures and compare them with other advanced graph learning algorithms to reveal their connections. We also summarize practical applications of Graph Transformers to molecule, protein, language, vision, traffic, brain, and material data. Finally, we discuss current challenges and promising directions for future research on Graph Transformers.
