How does over-squashing affect the power of GNNs?
Francesco Di Giovanni, T. Konstantin Rusch, Michael M. Bronstein, Andreea Deac, Marc Lackenby, Siddhartha Mishra, Petar Veličković
TL;DR
This work analyzes how over-squashing limits the expressive power of MPNNs. It introduces a Hessian-based pairwise mixing measure that quantifies how strongly the features of two nodes can interact under message passing. The authors derive a general upper bound on mixing that depends on network capacity, through the depth $m$ and the weight norm $\mathsf{w}$, and on graph topology, through the operator $S$ and its higher-order corrections, highlighting the role of commute times. They define over-squashing as the inverse of the maximal mixing and introduce a computable proxy $\widetilde{\mathsf{OSQ}}$, which yields necessary conditions on capacity for learning functions with a prescribed level of mixing. They then prove that, in bounded-depth or bounded-weight regimes, achieving high mixing becomes impractical on graphs with large commute times. Experiments on synthetic tasks built on ZINC graphs show that increasing commute time degrades performance and increases the measured over-squashing, while deeper architectures can mitigate these effects. These results point to remedies such as graph rewiring or more expressive architectures like Graph Transformers. Overall, the paper provides a rigorous framework linking over-squashing, graph topology, and GNN expressive power, with concrete bounds and empirical confirmation that inform the design of scalable GNNs for long-range relational tasks.
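The mixing measure is built from mixed second derivatives. A readout $y$ that splits into separate terms in $x_v$ and $x_u$ has $\partial^2 y / \partial x_v \partial x_u = 0$, meaning no interaction, while $y = x_v x_u$ does not. The following minimal sketch assumes scalar node features and a toy sum-aggregation MPNN with all weights fixed to 1, a $\tanh$ nonlinearity, and a sum readout. It estimates this mixed partial at a single input with central finite differences. The helpers `pairwise_mixing_at` and `toy_mpnn_readout` are hypothetical illustrations, not the paper's code. The paper's measure takes a maximum over inputs and feature channels, so a single-point estimate like this one is at most a lower bound on it.

```python
import math


def pairwise_mixing_at(f, x, u, v, h=1e-3):
    """Central-difference estimate of d^2 f / (dx_u dx_v) at x, assuming u != v.

    Hypothetical helper: evaluates the mixed partial at one input only.
    """
    def shifted(du, dv):
        z = list(x)
        z[u] += du
        z[v] += dv
        return f(z)

    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)


def toy_mpnn_readout(x, adj, layers):
    """Hypothetical toy MPNN: h_i <- tanh(h_i + sum of neighbours' h_j), weights 1, sum readout."""
    h = list(x)
    for _ in range(layers):
        h = [math.tanh(h[i] + sum(h[j] for j in adj[i])) for i in range(len(h))]
    return sum(h)


# Path graph 0-1-2-3-4 as adjacency lists, with arbitrary scalar features.
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
x = [0.1, 0.2, 0.3, 0.4, 0.5]

# Mixing between the two endpoints for one and two message-passing layers.
for layers in (1, 2):
    f = lambda z, m=layers: toy_mpnn_readout(z, adj, m)
    print(layers, pairwise_mixing_at(f, x, 0, 4))
```

With one layer, the endpoint mixing is zero up to rounding, because no node aggregates both $x_0$ and $x_4$. With two layers it becomes nonzero through node 2. This is a small-scale version of the depth dependence that the bound captures.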
Abstract
Graph Neural Networks (GNNs) are the state-of-the-art model for machine learning on graph-structured data. The most popular class of GNNs operates by exchanging information between adjacent nodes, and these models are known as Message Passing Neural Networks (MPNNs). Given their widespread use, understanding the expressive power of MPNNs is a key question. However, existing results typically consider settings with uninformative node features. In this paper, we provide a rigorous analysis to determine which function classes of node features can be learned by an MPNN of a given capacity. We do so by measuring the level of pairwise interactions between nodes that MPNNs allow for. This measure provides a novel quantitative characterization of the so-called over-squashing effect, which is observed to occur when a large volume of messages is aggregated into fixed-size vectors. Using our measure, we prove that, to guarantee sufficient communication between pairs of nodes, the capacity of the MPNN must be large enough, depending on properties of the input graph structure, such as commute times. For many relevant scenarios, our analysis results in impossibility statements in practice, showing that over-squashing hinders the expressive power of MPNNs. We validate our theoretical findings through extensive controlled experiments and ablation studies.
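Commute time $\tau(u,v)$ is the expected number of steps a random walk takes to go from $u$ to $v$ and back. For a connected, undirected graph it equals $2|E|$ times the effective resistance between $u$ and $v$. This is a standard identity and not specific to this paper, and the paper's bounds may be stated for a normalized operator $S$. The following minimal sketch uses that identity and assumes an unweighted, connected graph given as an edge list on nodes $0, \dots, n-1$. It grounds node $v$, injects a unit current at $u$, and solves the reduced Laplacian system exactly with rational arithmetic. The function name `commute_time` is hypothetical.

```python
from fractions import Fraction


def commute_time(edges, n, u, v):
    """Commute time between u and v via tau(u, v) = 2|E| * R_eff(u, v).

    Hypothetical helper. Assumes an undirected, unweighted, connected graph
    on nodes 0..n-1 without self-loops or duplicate edges.
    """
    if u == v:
        return Fraction(0)
    # Graph Laplacian L = D - A with exact rational entries.
    L = [[Fraction(0)] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1
        L[b][b] += 1
        L[a][b] -= 1
        L[b][a] -= 1
    # Ground node v: dropping its row and column leaves an invertible matrix
    # when the graph is connected. Inject a unit current at u.
    keep = [i for i in range(n) if i != v]
    M = [[L[i][j] for j in keep] for i in keep]
    rhs = [Fraction(1 if i == u else 0) for i in keep]
    # Gauss-Jordan elimination on M phi = rhs.
    k = len(keep)
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [p - factor * q for p, q in zip(M[r], M[col])]
                rhs[r] -= factor * rhs[col]
    # Potential at u (with v held at 0) equals the effective resistance R_eff(u, v).
    iu = keep.index(u)
    r_eff = rhs[iu] / M[iu][iu]
    return 2 * len(edges) * r_eff


# Commute time between the endpoints of paths of growing length.
for n in (3, 5, 9):
    path = [(i, i + 1) for i in range(n - 1)]
    print(n, commute_time(path, n, 0, n - 1))
```

On a path with $n$ nodes, the endpoints have commute time $2(n-1)^2$, so it grows quadratically with the path length. By contrast, any two nodes of the complete graph $K_n$ have commute time $2(n-1)$. Exact fractions avoid floating-point noise at the cost of speed, and a sparse linear solver would be the natural choice for large graphs.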
