Comparing Graph Transformers via Positional Encodings

Mitchell Black; Zhengchao Wan; Gal Mishne; Amir Nayyeri; Yusu Wang

TL;DR

This work develops a theoretical framework to compare graph transformers with absolute and relative positional encodings (APE-GTs and RPE-GTs). It proves that APEs and RPEs can be exchanged without loss of distinguishing power for unfeatured graphs, while revealing that RPEs may outperform APEs when node features are present. The authors introduce RPE-augWL and connect RPEs to WL variants and 2-EGNs, enabling rigorous comparisons across encodings and guiding future PE design. They also provide extensive case studies of SPE, resistance distance, and other common encodings, showing practical implications for graph learning tasks and highlighting when converting between encodings is beneficial or detrimental. Overall, the results offer principled guidance for selecting and designing positional encodings in graph transformers, with implications for both theory and practice.

Abstract

The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each pair of nodes, e.g., graph distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in terms of distinguishing power. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. Based on our theoretical results, we provide a study on several APEs and RPEs (including the resistance distance and the recently introduced stable and expressive positional encoding (SPE)) and compare their distinguishing power in terms of transformers. We believe our work will help navigate the huge number of choices of positional encoding and will provide guidance on the future design of positional encodings for graph transformers.
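The difference in shape between the two kinds of encoding can be made concrete with a minimal sketch. It assumes a graph stored as an adjacency list and uses the shortest-path distance as the RPE, the "graph distance" example named in the abstract. The degree-based APE is only an illustrative stand-in, and the function names `shortest_path_rpe` and `degree_ape` are hypothetical, not taken from the paper.

```python
from collections import deque


def shortest_path_rpe(adj):
    """Return an n x n matrix of shortest-path distances: one value per node pair (an RPE)."""
    n = len(adj)
    dist = [[float("inf")] * n for _ in range(n)]
    for source in range(n):
        # Breadth-first search from each source node on an unweighted graph.
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] == float("inf"):
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def degree_ape(adj):
    """Return an n x 1 feature matrix: one feature vector per node (an APE).

    Node degree is an assumption chosen for illustration, not an encoding
    studied in the paper.
    """
    return [[len(neighbors)] for neighbors in adj]


# Path graph on four nodes: 0 - 1 - 2 - 3
path_graph = [[1], [0, 2], [1, 3], [2]]
ape = degree_ape(path_graph)         # node-level features, fed to the transformer as input
rpe = shortest_path_rpe(path_graph)  # pair-level features, used to augment attention
```

In this sketch `ape` has one row per node and `rpe` has one entry per ordered pair of nodes, which is the structural distinction the comparison between APE-GTs and RPE-GTs rests on.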

Paper Structure (53 sections, 68 theorems, 71 equations, 9 figures, 5 tables)

Introduction
Positional Encodings and Graph Transformers
Positional Encodings
Graph Transformers
Properties of RPEs
Diagonally-Aware RPEs.
Asymmetric RPEs.
Comparison of APE-GT and RPE-GT
Distinguishing Power: PEs vs. Graph Transformers
APE vs. APE-GT.
RPE vs. RPE-GT.
Main Results: APE vs RPE Transformers
Restrictions and Implications.
Comparing Graph Transformers with Different Positional Encodings
Resistance Distance, Spectral Kernels, and SPE
...and 38 more sections

Key Result

Proposition 2.5

Let $\psi$ be an RPE. The diagonal augmentation $D^{\psi}$ is diagonally-aware.
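
The following is a short sketch of why the statement is plausible. The definitions are not reproduced in this summary, so both readings below are assumptions: the diagonal augmentation is taken to pair each entry of $\psi$ with an indicator of the diagonal, and an RPE is taken to be diagonally-aware if a diagonal value can only coincide with another diagonal value.

```latex
% Assumed definitions (not stated in this summary):
%   D^{\psi}_G(u,v) = \bigl(\mathbf{1}[u=v],\ \psi_G(u,v)\bigr)
%   \psi is diagonally-aware if \psi_G(v,v) = \psi_H(x,y) implies x = y
\begin{align*}
D^{\psi}_G(v,v) = D^{\psi}_H(x,y)
  &\implies \bigl(1,\ \psi_G(v,v)\bigr) = \bigl(\mathbf{1}[x=y],\ \psi_H(x,y)\bigr) \\
  &\implies \mathbf{1}[x=y] = 1 \\
  &\implies x = y .
\end{align*}
```

Under these assumed definitions the first coordinate alone separates diagonal from off-diagonal entries, so the argument places no requirement on $\psi$ itself.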

Figures (9)

Figure 1: Illustration of the main results. An arrow indicates that distinguishing power does not decrease along it. Our main results are the two red arrows on the left; the other parts of the diagram illustrate the proofs of the two theorems. Our contributions are in bold. The two-headed arrow from RPE-2-WL to APEs indicates that the non-decreasing property holds only for unfeatured graphs.
Figure 2: Hierarchy of PEs. An arrow indicates that the positional encoding at its tail is weaker, in terms of distinguishing power, than the one it points to. The two-headed arrow indicates that the non-decreasing property holds only for unfeatured graphs. The dotted arrow between SPD and RD reflects partial evidence (cf. \ref{thm:cut edge}) that RD is stronger than SPD in some respects; how the two compare as RPEs remains an open question.
Figure 3: Left: $G$. Right: $H$
Figure 4: Decomposition of $V_G\times V_G$.
Figure 5: Left: $(G,X_G)$. Right: $(H,X_H)$.
...and 4 more figures

Theorems & Definitions (129)

Definition 2.1
Definition 2.2: Graph Transformers
Definition 2.3
Definition 2.4
Proposition 2.5
Definition 2.6: Pseudo-symmetric RPEs
Lemma 2.7: Pseudo-symmetric augmentation
Lemma 3.0: Equivalence of APEs and APE-GT
Definition 3.1
Definition 3.2
...and 119 more
