On the Bottleneck of Graph Neural Networks and its Practical Implications

Uri Alon; Eran Yahav

TL;DR

This work identifies over-squashing as a fundamental bottleneck in graph neural networks: a node's receptive field grows exponentially with depth, so fixed-size vectors end up discarding crucial long-range information. Using a controlled synthetic benchmark (NeighborsMatch) and real-world datasets (QM9, ENZYMES, NCI1, VarMisuse), the authors show that breaking the bottleneck with a simple fully-adjacent (FA) layer improves performance without additional tuning, especially on long-range tasks. They provide a theoretical lower bound on the hidden size along with empirical analyses, showing that GNNs that absorb incoming edges equally (GCN, GIN) are more prone to over-squashing than attention-based or gating-based variants (GAT, GGNN), and that increasing the hidden size alone is insufficient to overcome the bottleneck. The findings have practical implications across chemistry, biology, and program analysis, pointing to FA-like mechanisms as a robust, low-cost mitigation and motivating further work on long-range information propagation in GNNs.
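To make the fully-adjacent idea concrete, here is a minimal sketch of how the edge set could change per layer. It assumes the FA layer is applied as the final layer and connects every ordered pair of distinct nodes (no self-loops); the function names `fully_adjacent_edges` and `edges_for_layer` are hypothetical and are not taken from the authors' code.

```python
def fully_adjacent_edges(num_nodes):
    """All ordered pairs (i, j) with i != j: every node can message every other node."""
    return [(i, j) for i in range(num_nodes) for j in range(num_nodes) if i != j]


def edges_for_layer(edges, num_nodes, layer, num_layers):
    """Edge list a message-passing layer would use.

    Assumption: only the last layer (index num_layers - 1) is fully adjacent;
    all earlier layers keep the original graph topology.
    """
    if layer == num_layers - 1:
        return fully_adjacent_edges(num_nodes)
    return list(edges)


# A 4-node path graph 0 - 1 - 2 - 3, stored as directed edges in both directions.
path = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
per_layer = [edges_for_layer(path, num_nodes=4, layer=k, num_layers=3) for k in range(3)]
```

In this sketch the first two layers see only the path edges, while the last layer lets node 0 receive a message from node 3 directly instead of through a chain of intermediate fixed-size vectors. No weights are added; only the edge list of one layer changes.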

Abstract

Since the proposal of the graph neural network (GNN) by Gori et al. (2005) and Scarselli et al. (2008), one of the major problems in training GNNs was their struggle to propagate information between distant nodes in the graph. We propose a new explanation for this problem: GNNs are susceptible to a bottleneck when aggregating messages across a long path. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interaction. In this paper, we highlight the inherent problem of over-squashing in GNNs: we demonstrate that the bottleneck hinders popular GNNs from fitting long-range signals in the training data; we further show that GNNs that absorb incoming edges equally, such as GCN and GIN, are more susceptible to over-squashing than GAT and GGNN; finally, we show that prior work, which extensively tuned GNN models of long-range problems, suffers from over-squashing, and that breaking the bottleneck improves their state-of-the-art results without any tuning or additional weights. Our code is available at https://github.com/tech-srl/bottleneck/.
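The "exponentially growing information" can be illustrated with a small sketch that counts how many nodes fall within K hops of a target node, which is the set a K-layer message-passing GNN would have to compress into that node's single fixed-size vector. The complete binary tree and the helper names `binary_tree_adjacency` and `receptive_field_size` are illustrative assumptions, not part of the paper's code.

```python
from collections import deque


def binary_tree_adjacency(depth):
    """Undirected adjacency lists for a complete binary tree (root is node 0)."""
    num_nodes = 2 ** (depth + 1) - 1
    adj = {v: [] for v in range(num_nodes)}
    for child in range(1, num_nodes):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def receptive_field_size(adj, source, num_layers):
    """Number of nodes within num_layers hops of source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == num_layers:
            continue  # nodes farther away are not reached by num_layers rounds of messages
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return len(dist)


adj = binary_tree_adjacency(depth=4)
sizes = [receptive_field_size(adj, source=0, num_layers=k) for k in range(5)]
print(sizes)  # [1, 3, 7, 15, 31]
```

In this toy graph the receptive field of the root roughly doubles with every added layer, while the vector that must summarize it stays the same size.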
