Attending to Graph Transformers

Luis Müller, Mikhail Galkin, Christopher Morris, Ladislav Rampášek

TL;DR

Graph transformers (GTs) offer an alternative to traditional message-passing graph neural networks by using attention over graph-structured tokens. The paper presents a taxonomy of GT architectures across encodings, input features, tokenization, and propagation, and connects these design choices to theoretical notions such as $1$-WL, $k$-IGNs, and graph isomorphism. Empirically, the authors show that structural and positional encodings significantly influence GT expressivity and performance, with GTs outperforming certain GNN baselines on heterophilic graphs while still facing challenges in scalability and long-range information propagation. The work provides a practical handbook for GT design and highlights open challenges, including principled encoding design, scalable architectures, and interpretability, to guide future research in graph learning.

Abstract

Recently, transformer architectures for graphs emerged as an alternative to established techniques for machine learning with graphs, such as (message-passing) graph neural networks. So far, they have shown promising empirical results, e.g., on molecular prediction datasets, often attributed to their ability to circumvent graph neural networks' shortcomings, such as over-smoothing and over-squashing. Here, we derive a taxonomy of graph transformer architectures, bringing some order to this emerging field. We overview their theoretical properties, survey structural and positional encodings, and discuss extensions for important graph classes, e.g., 3D molecular graphs. Empirically, we probe how well graph transformers can recover various graph properties, how well they can deal with heterophilic graphs, and to what extent they prevent over-squashing. Further, we outline open challenges and research directions to stimulate future work. Our code is available at https://github.com/luis-mueller/probing-graph-transformers.

Paper Structure

This paper contains 40 sections, 7 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Categorization of graph transformers along four main categories, with representative architectures: encodings (see \ref{sec:pos_enc}), input features (see \ref{sec:feat}), tokens (see \ref{sec:tok}), and propagation (see \ref{sec:propagation}). See also \ref{fig:architecture_overview}, which details how the different branches translate into changes to the original transformer.
  • Figure 2: Average train accuracy over ten random seeds of a GT on the NeighborsMatch dataset, compared to models from Alon and Yahav (2021).
  • Figure 3: Overview of how the different branches of our taxonomy affect the original transformer architecture.