Benchmarking Positional Encodings for GNNs and Graph Transformers

Florian Grötschla, Jiaqing Xie, Roger Wattenhofer

TL;DR

This work addresses how to evaluate Positional Encodings (PEs) for Graph Neural Networks and Graph Transformers independently of architectural innovations. It proposes a unified benchmarking framework and conducts an expansive study across 8 architectures, 9 PEs, and 10 datasets, totaling over 500 configurations. The key finding is that higher theoretical expressiveness, as captured by WL-based metrics, does not consistently translate into improved downstream performance; some expressive encodings can even degrade results on real-world tasks. The results highlight task-dependent PE effectiveness, reveal simple configurations that rival state-of-the-art methods, and demonstrate that sparse attention can match full attention when paired with suitable PEs. The authors also provide an open-source benchmark to facilitate reproducible future evaluation of PEs in graph learning.

Abstract

Positional Encodings (PEs) are essential for injecting structural information into Graph Neural Networks (GNNs), particularly Graph Transformers, yet their empirical impact remains insufficiently understood. We introduce a unified benchmarking framework that decouples PEs from architectural choices, enabling a fair comparison across 8 GNN and Transformer models, 9 PEs, and 10 synthetic and real-world datasets. Across more than 500 model-PE-dataset configurations, we find that commonly used expressiveness proxies, including Weisfeiler-Lehman distinguishability, do not reliably predict downstream performance. In particular, highly expressive PEs frequently fail to improve, and can even degrade, performance on real-world tasks. At the same time, we identify several simple and previously overlooked model-PE combinations that match or outperform recent state-of-the-art methods. Our results demonstrate the strong task-dependence of PEs and underscore the need for empirical validation beyond theoretical expressiveness. To support reproducible research, we release an open-source benchmarking framework for evaluating PEs for graph learning tasks.
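
To make the notion of Weisfeiler-Lehman distinguishability concrete, here is a minimal sketch. It is not taken from the paper or its benchmark code: it assumes plain adjacency-list graphs, and the helpers wl_histogram and return_probabilities are hypothetical names introduced only for illustration. The sketch shows that 1-WL color refinement cannot separate a six-cycle from two disjoint triangles, whereas a three-step random-walk return probability, the kind of signal that random-walk-based encodings build on, can.

```python
from collections import Counter


def wl_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the histogram of final colors.

    Hypothetical helper: `adj` maps each node to a list of its neighbors.
    """
    colors = {v: 0 for v in adj}  # every node starts with the same color
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def return_probabilities(adj, steps):
    """Per-node probability that a uniform random walk is back at its
    start node after exactly `steps` steps (diagonal of P**steps)."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Row-stochastic transition matrix P = D^{-1} A.
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in adj[v]:
            P[index[v]][index[u]] += 1.0 / len(adj[v])
    # Start from the identity and multiply by P `steps` times.
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        M = [
            [sum(M[i][k] * P[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return [M[i][i] for i in range(n)]


# Two 2-regular graphs on six nodes.
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}

# 1-WL assigns identical color histograms, so it cannot tell them apart.
same_under_wl = wl_histogram(six_cycle) == wl_histogram(two_triangles)

# The 3-step return probability differs: 0 on the bipartite cycle,
# 1/4 on each triangle node.
rw_cycle = return_probabilities(six_cycle, 3)
rw_triangles = return_probabilities(two_triangles, 3)
```

A check like this only establishes that an encoding can separate two graphs. As the abstract stresses, it does not by itself indicate whether the encoding helps on a downstream task.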

Paper Structure

This paper contains 25 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Conceptual overview of our benchmarking framework for PEs in GNNs and Graph Transformers. Our empirical results show that practical performance does not always align with theoretical expressiveness, challenging conventional assumptions in the literature. We identify new best-performing configurations by systematically exploring combinations across real and synthetic benchmarks on a wide variety of PEs, models and datasets.
  • Figure 2: Performance comparison of target metrics across selected datasets from BenchmarkingGNNs. The boxplots illustrate the performance range for all models included in the study, with whiskers representing the minimum and maximum performance observed. Notably, RRWP consistently achieves the best results, whereas certain PEs, such as SignNet on CIFAR10, can sometimes decrease performance relative to the baseline without PEs.
  • Figure 3: Mean performance of different Positional Encodings on additional datasets from BenchmarkingGNNs.
  • Figure 4: Performance comparison of target metrics across selected datasets from the Long-Range Graph Benchmark. The boxplots illustrate the performance range of all models included in the study, with whiskers indicating the minimum and maximum performance observed. Plots for the remaining datasets are provided in Figure \ref{fig:improvement_plot_complete_mean_lrgb}.
  • Figure 5: Mean performance of different Positional Encodings on additional datasets from the Long-Range Graph Benchmark.
  • ...and 4 more figures