Short-Range Oversquashing
Yaaqov Mishayev, Yonatan Sverdlov, Tal Amir, Nadav Dym
TL;DR
The paper shows that oversquashing in GNNs is not limited to long-range tasks. It introduces the Two-Radius problem, in which a bottleneck emerges at short range, with only two MPNN iterations. The authors prove that solving such tasks requires intermediate feature dimensions that grow with graph size. Empirically, Transformers solve the problem robustly, whereas MPNNs with virtual nodes do not, which highlights a gap between MPNN bottlenecks and Transformer expressivity. The authors also disentangle the bottleneck and vanishing-gradient mechanisms, demonstrate that existing oversquashing measures fail to predict the Two-Radius bottleneck, and propose Graph Transformers as a more reliable solution for these scenarios. Overall, the Two-Radius framework provides a precise benchmark for studying oversquashing and steers architectural choices toward attention-based models that can propagate information efficiently in both short- and long-range settings.
Abstract
Message Passing Neural Networks (MPNNs) are widely used for learning on graphs, but their ability to process long-range information is limited by the phenomenon of oversquashing. This limitation has led some researchers to advocate Graph Transformers as a better alternative, whereas others suggest that it can be mitigated within the MPNN framework by using virtual nodes or other rewiring techniques. In this work, we demonstrate that oversquashing is not limited to long-range tasks but can also arise in short-range problems. This observation allows us to disentangle two distinct mechanisms underlying oversquashing: (1) the bottleneck phenomenon, which can arise even in short-range settings, and (2) the vanishing gradient phenomenon, which is closely associated with long-range tasks. We further show that the short-range bottleneck effect is not captured by existing explanations for oversquashing, and that adding virtual nodes does not resolve it. In contrast, Transformers do succeed in such tasks, which positions them as a more compelling solution to oversquashing than specialized MPNNs.
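To make the bottleneck idea concrete, the following is a minimal toy sketch in plain Python. It is not the paper's Two-Radius construction. It assumes a star graph, sum aggregation, and a hypothetical power-sum embedding `embed`, all chosen only for illustration. It shows one message-passing round squeezing several leaf values into a single fixed-width hub vector: with width 1, two different leaf multisets become indistinguishable at the hub, while width 2 happens to separate this particular pair.

```python
def embed(value, width):
    """Hypothetical embedding for illustration: [x, x^2, ..., x^width]."""
    return [value ** k for k in range(1, width + 1)]


def star_graph(num_leaves):
    """Adjacency lists for a star: node 0 is the hub, nodes 1..num_leaves are leaves."""
    neighbors = {0: list(range(1, num_leaves + 1))}
    for leaf in range(1, num_leaves + 1):
        neighbors[leaf] = [0]
    return neighbors


def mpnn_round(features, neighbors):
    """One message-passing round: every node sums its neighbors' feature vectors."""
    width = len(features[0])
    updated = {}
    for node, nbrs in neighbors.items():
        aggregated = [0.0] * width
        for nbr in nbrs:
            for k in range(width):
                aggregated[k] += features[nbr][k]
        updated[node] = aggregated
    return updated


def hub_state(leaf_values, width):
    """Hub feature after one round, given scalar leaf values and a feature width."""
    neighbors = star_graph(len(leaf_values))
    features = {0: [0.0] * width}
    for i, value in enumerate(leaf_values, start=1):
        features[i] = embed(value, width)
    return mpnn_round(features, neighbors)[0]


# Two different multisets of leaf values with the same sum.
leaves_a = [1.0, 4.0, 2.0, 3.0]
leaves_b = [2.0, 2.0, 3.0, 3.0]

hub_a1, hub_b1 = hub_state(leaves_a, 1), hub_state(leaves_b, 1)
hub_a2, hub_b2 = hub_state(leaves_a, 2), hub_state(leaves_b, 2)

print("width 1:", hub_a1, hub_b1, "collide:", hub_a1 == hub_b1)
print("width 2:", hub_a2, hub_b2, "collide:", hub_a2 == hub_b2)
```

In this toy setting, anything computed downstream of the hub cannot tell the two inputs apart at width 1, no matter how many layers follow. The sketch only gives intuition for why a bottleneck can appear after very few iterations; it does not reproduce the paper's bound on how the required dimension grows with graph size.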
