On the Bottleneck of Graph Neural Networks and its Practical Implications

Uri Alon, Eran Yahav

TL;DR

This work identifies over-squashing as a fundamental bottleneck in graph neural networks. A node's receptive field grows exponentially with depth, so fixed-size node vectors end up discarding crucial long-range information. Using a controlled synthetic benchmark, NeighborsMatch, and several real-world datasets (QM9, ENZYMES, NCI1, VarMisuse), the authors show that breaking the bottleneck with a simple fully-adjacent (FA) layer substantially improves performance without additional tuning, especially on long-range tasks. They give lower bounds on the required hidden size and empirical analyses showing two things. First, GNNs that aggregate incoming edges uniformly (GCN, GIN) are more prone to over-squashing than attention-based (GAT) or gated (GGNN) variants. Second, increasing the hidden size alone does not overcome the bottleneck. The findings have practical implications for chemistry, biology, and program analysis. They point to FA-like mechanisms as a robust, low-cost mitigation and motivate further work on propagating long-range information in GNNs.
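
The FA fix described above can be illustrated with a toy model. This is our own sketch, not the authors' code. It assumes scalar node features, a hypothetical mean-aggregation layer `mp_layer` with made-up fixed weights, and an FA variant that reuses the same layer but, in the last layer only, lets every node aggregate over all other nodes. On a 6-node path, a 2-layer GNN cannot deliver anything from node 5 to node 0, while the FA variant gives node 0 a direct route. The toy shows only this reachability effect and does not measure over-squashing itself.

```python
import math
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: node -> neighbors
Features = Dict[int, float]   # toy scalar feature per node (assumption)


def mp_layer(h: Features, adj: Graph, w_self: float = 0.5, w_nbr: float = 1.0) -> Features:
    """Hypothetical mean-aggregation layer: h_v' = tanh(w_self*h_v + w_nbr*mean(h_u))."""
    out = {}
    for v, nbrs in adj.items():
        agg = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        out[v] = math.tanh(w_self * h[v] + w_nbr * agg)
    return out


def fully_adjacent(nodes) -> Graph:
    """Complete graph over the given nodes: each node sees every other node."""
    nodes = list(nodes)
    return {v: [u for u in nodes if u != v] for v in nodes}


def gnn(h: Features, adj: Graph, num_layers: int, fa_last: bool = False) -> Features:
    """Stack num_layers layers; if fa_last, the last layer runs on the complete graph.

    The same layer and weights are reused, so the FA variant adds no parameters here.
    """
    for i in range(num_layers):
        use_fa = fa_last and i == num_layers - 1
        h = mp_layer(h, fully_adjacent(adj) if use_fa else adj)
    return h


def path_graph(n: int) -> Graph:
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def node0_output(x: float, fa_last: bool) -> float:
    """Put signal x on the far end (node 5) of a 6-node path; read node 0 after 2 layers."""
    adj = path_graph(6)
    h = {v: 0.0 for v in adj}
    h[5] = x
    return gnn(h, adj, num_layers=2, fa_last=fa_last)[0]


print(node0_output(1.0, fa_last=False), node0_output(-1.0, fa_last=False))  # 0.0 0.0
print(node0_output(1.0, fa_last=True), node0_output(-1.0, fa_last=True))    # opposite signs
```

Without FA, node 0's output is identical for both signals because node 5 lies outside its 2-hop neighborhood. With the FA last layer, node 0 responds to the sign of the distant signal.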

Abstract

Since the proposal of the graph neural network (GNN) by Gori et al. (2005) and Scarselli et al. (2008), one of the major problems in training GNNs was their struggle to propagate information between distant nodes in the graph. We propose a new explanation for this problem: GNNs are susceptible to a bottleneck when aggregating messages across a long path. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interaction. In this paper, we highlight the inherent problem of over-squashing in GNNs: we demonstrate that the bottleneck hinders popular GNNs from fitting long-range signals in the training data; we further show that GNNs that absorb incoming edges equally, such as GCN and GIN, are more susceptible to over-squashing than GAT and GGNN; finally, we show that prior work, which extensively tuned GNN models of long-range problems, suffers from over-squashing, and that breaking the bottleneck improves their state-of-the-art results without any tuning or additional weights. Our code is available at https://github.com/tech-srl/bottleneck/ .
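
To make "exponentially growing information" concrete, the sketch below uses breadth-first search to count how many nodes an r-layer GNN at a given node can draw on, i.e. its r-hop receptive field. The complete binary tree is an assumption chosen for illustration, and `binary_tree` and `receptive_field` are hypothetical helper names. For the root of such a tree the count is 2^(r+1) - 1, so it roughly doubles with every added layer while the root's hidden vector keeps the same fixed size.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def binary_tree(depth: int) -> Graph:
    """Undirected complete binary tree in heap order (children of i: 2i+1, 2i+2)."""
    n = 2 ** (depth + 1) - 1
    adj: Graph = {i: [] for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def receptive_field(adj: Graph, v: int, r: int) -> int:
    """Number of nodes within r hops of v, found by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond r hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist)


tree = binary_tree(depth=8)
for r in range(9):
    print(r, receptive_field(tree, 0, r))  # second column: 1, 3, 7, 15, ... = 2**(r+1) - 1
```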

Paper Structure

This paper contains 25 sections, 5 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: The bottleneck that existed in RNN seq2seq models (before attention) is strictly more harmful in GNNs: information from a node's exponentially-growing receptive field is compressed into a fixed-size vector. Black arrows are graph edges; red curved arrows illustrate information flow.
  • Figure 2: The NeighborsMatch problem: green nodes have blue neighbors and an alphabetical label. The goal is to predict the label (A, B, or C) of the green node that has the same number of blue neighbors as the target node in the same graph. In this example, the correct label is C, because the target node has two blue neighbors, like the node marked with C.
  • Figure 2: Average accuracy (30 runs $\pm$ stdev) on the biological datasets. $\dagger$: previously reported by Errica et al. (2020).
  • Figure 3: Accuracy across problem radius (tree depth) in the NeighborsMatch problem. Over-squashing starts to affect GCN and GIN even at $r=4$.
  • Figure 4: Combinatorial and empirical lower bounds on the model dimension as a function of the problem radius.
  • ...and 1 more figure
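
The NeighborsMatch setup (Figure 2) and the dimension lower bounds (Figure 4) can be tied together in a small sketch. This is a hypothetical illustration, not the authors' generator or their bound. It assumes a tree-shaped instance of depth r whose root is the target node, and it assumes each of the 2^r leaves carries a distinct blue-neighbor count and a distinct label. Nodes below the root never see the target's count, so the information reaching the root must encode the whole count-to-label mapping. There are (2^r)! such bijections, which gives a rough counting estimate of log2((2^r)!) bits. Dividing by an assumed 32 bits per float gives a minimum number of floats.

```python
import math
import random
from typing import Dict, List, Tuple

Leaves = Dict[int, Tuple[int, int]]  # leaf id -> (blue_count, label)


def make_instance(r: int, seed: int = 0) -> Tuple[List[Tuple[int, int]], Leaves, int, int]:
    """Hypothetical tree NeighborsMatch instance of depth r (heap-indexed, root 0 = target).

    Returns (edges, leaves, target_count, answer). Edges point child -> parent,
    i.e. toward the target, which is the direction information must flow.
    """
    rng = random.Random(seed)
    n_leaves = 2 ** r
    n_nodes = 2 * n_leaves - 1
    edges = [(child, (child - 1) // 2) for child in range(1, n_nodes)]
    counts = rng.sample(range(1, n_leaves + 1), n_leaves)  # distinct blue counts (assumption)
    labels = rng.sample(range(n_leaves), n_leaves)         # distinct labels (assumption)
    first_leaf = n_leaves - 1  # in heap order, the leaves occupy the last n_leaves ids
    leaves = {first_leaf + i: (counts[i], labels[i]) for i in range(n_leaves)}
    target_count = rng.choice(counts)
    return edges, leaves, target_count, oracle(leaves, target_count)


def oracle(leaves: Leaves, target_count: int) -> int:
    """Ground truth: the label of the leaf whose blue count equals the target's."""
    for blue_count, label in leaves.values():
        if blue_count == target_count:
            return label
    raise ValueError("no leaf matches the target count")


def min_dim_estimate(r: int, bits_per_float: int = 32) -> int:
    """Rough counting estimate: ceil(log2((2**r)!) / bits_per_float)."""
    bits = math.lgamma(2 ** r + 1) / math.log(2)  # log2 of (2**r)!
    return math.ceil(bits / bits_per_float)


edges, leaves, target, answer = make_instance(r=3, seed=1)
print(len(edges), len(leaves), target, answer)
for r in range(1, 9):
    print(r, min_dim_estimate(r))
```

Under these assumptions the estimate stays at one or two floats for shallow trees but grows quickly with r. The paper's own combinatorial and empirical bounds may be derived differently.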