When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning
Arindam Chowdhury, Massimiliano Lupo Pasini
TL;DR
The paper tackles the question of when global attention helps atomistic graph learning and provides a unified, reproducible benchmark to dissect the roles of local message passing, encoders, and global attention. By evaluating four configurations within a HydraGNN-based framework across seven diverse datasets, it shows that encoder-based augmentations robustly improve local-property predictions while fused local–global models yield the clearest benefits for long-range interaction regimes, all under explicit compute-cost considerations. The study delivers practical guidelines: use encoder-augmented MPNNs by default, add moderate global attention for nonlocal tasks, and prefer modest attention budgets to maintain parameter efficiency. This work establishes a principled, replicable benchmark, enabling fair comparisons and informing future method development in atomistic graph learning.
Abstract
Graph neural networks (GNNs) are widely used as surrogates for costly experiments and first-principles simulations to study the behavior of compounds at the atomistic scale, and their architectural complexity keeps increasing to enable the modeling of complex physics. Most recent GNNs combine traditional message passing neural network (MPNN) layers, which model short-range interactions, with graph transformers (GTs), whose global attention mechanisms model long-range interactions. However, it remains unclear when global attention provides real benefits over well-tuned MPNN layers, because implementations, features, and hyperparameter tuning are inconsistent across studies. We introduce the first unified, reproducible benchmarking framework, built on HydraGNN, that enables seamless switching among four controlled model classes: MPNN, MPNN with chemistry/topology encoders, GPS-style hybrids of MPNN with global attention, and fully fused local-global models with encoders. Using seven diverse open-source datasets that span regression and classification tasks, we systematically isolate the contributions of message passing, global attention, and encoder-based feature augmentation. Our study shows that encoder-augmented MPNNs form a robust baseline, while fused local-global models yield the clearest benefits for properties governed by long-range interaction effects. We further quantify the accuracy-compute trade-offs of attention, reporting its memory overhead. Together, these results establish the first controlled evaluation of global attention in atomistic graph learning and provide a reproducible testbed for future model development.
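The four controlled model classes named in the abstract can be read as the combinations of two independent switches: whether chemistry/topology encoders are used and whether global attention is used. The sketch below illustrates that reading only. The names `ModelVariant`, `use_encoders`, `use_global_attention`, and `attention_score_entries` are hypothetical and are not taken from HydraGNN or from the paper, and the fused model may differ from the GPS-style hybrid in more than the encoder switch. The memory helper assumes dense all-pairs attention, whose score matrix grows quadratically with the number of atoms; it does not reproduce the overhead measurements reported in the study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical description of one benchmark configuration."""

    use_encoders: bool          # chemistry/topology feature encoders (assumed switch)
    use_global_attention: bool  # global attention alongside message passing (assumed switch)

    @property
    def name(self) -> str:
        if self.use_encoders and self.use_global_attention:
            return "fused local-global + encoders"
        if self.use_global_attention:
            return "GPS-style MPNN + global attention"
        if self.use_encoders:
            return "MPNN + encoders"
        return "MPNN"


def attention_score_entries(num_atoms: int, num_heads: int) -> int:
    """Entries in the attention score matrices of one dense attention layer.

    Assumes every atom attends to every other atom, so each head stores
    a num_atoms x num_atoms matrix.
    """
    return num_heads * num_atoms * num_atoms


# Enumerate the four variants: attention switch varies slowest, encoder switch fastest.
VARIANTS = [
    ModelVariant(use_encoders=enc, use_global_attention=attn)
    for attn, enc in product([False, True], repeat=2)
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant.name)
    # Doubling the number of atoms quadruples the dense attention score storage.
    print(attention_score_entries(100, 4), attention_score_entries(200, 4))
```

Treating the configurations as a grid over two switches makes the ablation explicit: comparing variants that differ in exactly one switch isolates the contribution of encoders or of global attention, which is the kind of controlled comparison the abstract describes.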
