
Short-Range Oversquashing

Yaaqov Mishayev, Yonatan Sverdlov, Tal Amir, Nadav Dym

TL;DR

The paper shows that oversquashing in GNNs is not limited to long-range tasks by introducing the Two-Radius problem, where a bottleneck emerges even at short range with only two MPNN iterations. It proves that solving such tasks requires intermediate feature dimensions that grow with graph size, while empirical results reveal that Transformers solve the problem robustly and MPNNs with virtual nodes do not, highlighting a gap between MPNN bottlenecks and transformer expressivity. The authors disentangle bottleneck and vanishing-gradient mechanisms, demonstrate that existing oversquashing measures fail to predict the Two-Radius bottleneck, and propose Graph Transformers as a more reliable solution for these scenarios. Overall, the Two-Radius framework provides a precise benchmark to study oversquashing and guides architectural choices toward attention-based models that can efficiently propagate information in both short- and long-range settings.
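
The dimension claim can be made intuitive with a deliberately simplified linear toy model. This is our assumption and not the paper's proof. Suppose the single central node stores a linear encoding h = Wx in R^d of the n source values x in R^n, and the targets decode linearly from h. Recovering all n values exactly then requires W to be injective, which forces d >= n. The sketch below measures the best linear reconstruction error numerically. The function name `linear_bottleneck_error` and all of its parameters are hypothetical.

```python
import numpy as np

def linear_bottleneck_error(n: int, d: int, samples: int = 1000, seed: int = 0) -> float:
    """Toy linear model of a single-node bottleneck (illustrative assumption).

    The n source values x are compressed into a d-dimensional central-node
    state h = W @ x; the best linear decoder then tries to recover all of x
    from h. Returns the mean squared reconstruction error over random inputs.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, n))           # hypothetical linear encoder into the central node
    X = rng.standard_normal((n, samples))     # random source values, one column per sample
    H = W @ X                                 # central-node states, shape (d, samples)
    # Least-squares fit of a decoder D such that D @ H approximates X (solved in transposed form).
    D_T, *_ = np.linalg.lstsq(H.T, X.T, rcond=None)
    X_hat = (H.T @ D_T).T                     # reconstructed source values, shape (n, samples)
    return float(np.mean((X_hat - X) ** 2))

# Fixed width d = 16: recovery is exact while n <= d and fails once n > d.
for n in (10, 16, 50, 150):
    print(n, linear_bottleneck_error(n, d=16))
```

In this linear toy, the width required grows linearly with n. The paper's theorem concerns general MPNNs, and its precise bound is not reproduced here.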

Abstract

Message Passing Neural Networks (MPNNs) are widely used for learning on graphs, but their ability to process long-range information is limited by the phenomenon of oversquashing. This limitation has led some researchers to advocate Graph Transformers as a better alternative, whereas others suggest that it can be mitigated within the MPNN framework, using virtual nodes or other rewiring techniques. In this work, we demonstrate that oversquashing is not limited to long-range tasks, but can also arise in short-range problems. This observation allows us to disentangle two distinct mechanisms underlying oversquashing: (1) the bottleneck phenomenon, which can arise even in short-range settings, and (2) the vanishing gradient phenomenon, which is closely associated with long-range tasks. We further show that the short-range bottleneck effect is not captured by existing explanations for oversquashing, and that adding virtual nodes does not resolve it. In contrast, transformers do succeed in such tasks, positioning them as the more compelling solution to oversquashing, compared to specialized MPNNs.

Paper Structure

This paper contains 26 sections, 3 theorems, 18 equations, 6 figures, 2 tables.

Key Result

Theorem 1

For any $r \geq 1$, the Ring Transfer task with radius $r$ requires at least $r$ iterations of an MPNN. However, there exists an MPNN that solves the task exactly whose node feature dimension is independent of $r$. This also holds if the ring topology is replaced with any other graph.
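
The lower bound in Theorem 1 follows the standard receptive-field argument: after k message-passing iterations, a node's state can depend only on nodes within k hops. The following minimal sketch assumes that the ring in Figure 1(b) is a cycle on 2r nodes, with the source and the target at opposite positions. The helper names `ring_graph`, `receptive_field`, and `min_iterations` are hypothetical.

```python
from collections import deque

def ring_graph(r: int) -> dict[int, list[int]]:
    """Cycle on 2r nodes: nodes 0 and r are joined by two disjoint paths of length r."""
    m = 2 * r
    return {v: [(v - 1) % m, (v + 1) % m] for v in range(m)}

def receptive_field(adj: dict[int, list[int]], node: int, k: int) -> set[int]:
    """Nodes whose input features can influence `node` after k MPNN iterations (BFS up to depth k)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                frontier.append((u, depth + 1))
    return seen

def min_iterations(adj: dict[int, list[int]], source: int, target: int) -> int:
    """Smallest k such that the source lies in the target's k-hop receptive field."""
    k = 0
    while source not in receptive_field(adj, target, k):
        k += 1
    return k

for r in (1, 2, 5):
    print(r, min_iterations(ring_graph(r), source=0, target=r))
```

This sketch covers only the iteration lower bound. The theorem's second part, the existence of an MPNN with feature dimension independent of r, is not illustrated.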

Figures (6)

  • Figure 1: Illustration of synthetic graph-transfer problems. (a) Tree Neighbors-Match: information is transferred from leaves to a target node through a tree of depth $r$. (b) Ring Transfer: a source and target are connected by two disjoint paths of length $r$. (c) Two-Radius: $n$ sources, $n$ targets, and a single central node. (d) Generalized Two-Radius: $k$ central nodes. Node colors represent source and target identifiers; gray denotes central nodes.
  • Figure 2: Test accuracy comparison across different models on the Two-Radius problem. Performance is evaluated for varying numbers of nodes $n \in \{10, 50, 150, 200\}$ with hidden dimensions of 256 and 1024. Transformer consistently achieves 100% accuracy while MPNN performance degrades as $n$ increases. Error bars indicate the standard error of the mean.
  • Figure 3: (a) Training efficiency comparison between GAT and Transformer on the Two-Radius problem, measured in the number of epochs required to achieve 92% accuracy. (b) Effect of the number of central nodes $k$ on GCN performance for the Two-Radius problem with $n=100$. Accuracy remains poor regardless of $k$, demonstrating that increasing graph connectivity via additional central nodes does not resolve the bottleneck phenomenon. Error bars in (b) (barely visible due to low variance) indicate the standard error of the mean.
  • Figure 4: GCN's accuracy deteriorates as the problem radius increases, which is correlated with vanishing gradients. On the other hand, problems with bottlenecks are also difficult for MPNNs, but they do not suffer from vanishing gradients or oversquashing.
  • Figure 5: Comparison of GCN performance with and without virtual nodes (VN) on the Two-Radius problem. While virtual nodes provide modest improvements, performance still degrades significantly as $n$ increases, indicating that VNs do not fully address the bottleneck in short-range oversquashing. Error bars indicate the standard error of the mean.
  • ...and 1 more figure
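
The short-range but bottlenecked structure shown in Figure 1(c) and (d) can be checked directly on a small construction. This sketch assumes that every source and every target is connected to each of the k central nodes and to nothing else, and that source i and target i share an identifier. The exact wiring is not stated in the captions above, and the names `two_radius_graph` and `distance` are hypothetical.

```python
from collections import deque

def two_radius_graph(n: int, k: int = 1) -> dict[str, set[str]]:
    """Hypothetical (generalized) Two-Radius graph: sources s_i and targets t_i
    are each connected to all k central nodes c_j, and to nothing else."""
    centrals = [f"c{j}" for j in range(k)]
    adj: dict[str, set[str]] = {c: set() for c in centrals}
    for i in range(n):
        for leaf in (f"s{i}", f"t{i}"):
            adj[leaf] = set(centrals)
            for c in centrals:
                adj[c].add(leaf)
    return adj

def distance(adj: dict[str, set[str]], a: str, b: str) -> int:
    """Unweighted shortest-path length from a to b via BFS; -1 if unreachable."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return dist[v]
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return -1

for n, k in [(10, 1), (100, 4)]:
    g = two_radius_graph(n, k)
    radius = max(distance(g, f"s{i}", f"t{i}") for i in range(n))
    # Every source-target path passes through a central node, which sees all 2n leaves.
    print(n, k, radius, len(g["c0"]))
```

Under this assumption, every source is two hops from its matching target, so two MPNN iterations suffice in terms of range. However, all 2n leaves route their information through the same k central nodes, which is the structural bottleneck that Figure 3(b) varies.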

Theorems & Definitions (7)

  • Theorem 1
  • Proof (proof idea)
  • Theorem 2
  • Proof
  • Theorem 3
  • Proof of the theorem labeled thm:constant_dim
  • Proof of the theorem labeled thm:cheeger