On Oversquashing in Graph Neural Networks Through the Lens of Dynamical Systems

Alessio Gravina; Moshe Eliasof; Claudio Gallicchio; Davide Bacciu; Carola-Bibiane Schönlieb

TL;DR

Oversquashing limits how much information MPNNs can transfer between distant nodes. The paper reframes the problem in terms of non-dissipative dynamical systems and introduces SWAN, a space-weight antisymmetric DE-GNN that is both globally and locally non-dissipative and therefore maintains a constant information flow rate. Theoretical analysis shows that the eigenvalues of SWAN's Jacobian have zero real part, which yields non-dissipative propagation, while experiments on graph transfer, graph property prediction, and long-range benchmarks demonstrate strong long-range performance at linear computational complexity. The approach offers a principled, scalable way to mitigate oversquashing, with competitive performance and no reliance on dense or multi-hop architectures.
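The link between antisymmetry and purely imaginary eigenvalues can be checked on a toy case. The sketch below is not SWAN's Jacobian; it only assumes a generic real 2x2 matrix W and compares its eigenvalues with those of W - W^T. The helper names (`eigenvalues_2x2`, `antisymmetrize`) are hypothetical and introduced for illustration.

```python
import cmath
import random


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def antisymmetrize(w):
    """Return W - W^T for a 2x2 matrix W (illustrative helper)."""
    return [[w[i][j] - w[j][i] for j in range(2)] for i in range(2)]


random.seed(0)
# Assumption: an arbitrary random weight matrix stands in for learned weights.
w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

eig_w = eigenvalues_2x2(w)
eig_anti = eigenvalues_2x2(antisymmetrize(w))

print("eigenvalues of W:      ", eig_w)
print("eigenvalues of W - W^T:", eig_anti)
```

A generic W usually has eigenvalues with nonzero real part, which amplify or damp the state over time. The antisymmetrized matrix has zero trace and a non-negative determinant, so its eigenvalues form a purely imaginary conjugate pair and the corresponding linear dynamics neither grow nor decay.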

Abstract

A common problem in Message-Passing Neural Networks is oversquashing: the limited ability to facilitate effective information flow between distant nodes. Oversquashing is attributed to the exponential decay in information transmission as node distances increase. This paper introduces a novel perspective to address oversquashing, leveraging the dynamical-systems properties of global and local non-dissipativity, which enable the maintenance of a constant information flow rate. We present SWAN, a uniquely parameterized GNN model with antisymmetry in both the space and weight domains, as a means to obtain non-dissipativity. Our theoretical analysis asserts that, by implementing these properties, SWAN offers an enhanced ability to transmit information over extended distances. Empirical evaluations on synthetic and real-world benchmarks that emphasize long-range interactions validate the theoretical understanding of SWAN and its ability to mitigate oversquashing.
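The contrast between decaying and constant information flow can be illustrated with a toy linear analogue, which is not the SWAN model itself. The sketch assumes a path graph on 8 nodes and compares diffusion, dx/dt = -Lx, against an antisymmetric operator, dx/dt = Ax with A = -A^T, measuring the update magnitude ||dx/dt|| at the start and end of the integration. The operators, step size, and function names are assumptions chosen for illustration.

```python
import math


def matvec(m, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def rk4_step(m, x, h):
    """One classical Runge-Kutta step for the linear ODE dx/dt = M x."""
    k1 = matvec(m, x)
    k2 = matvec(m, [xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = matvec(m, [xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = matvec(m, [xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def line_graph_operators(n):
    """Return (-L, A) for a path graph: negative Laplacian and an
    antisymmetric operator with +1/-1 on each edge (illustrative choice)."""
    neg_lap = [[0.0] * n for _ in range(n)]
    anti = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        j = i + 1
        neg_lap[i][j] = neg_lap[j][i] = 1.0
        neg_lap[i][i] -= 1.0
        neg_lap[j][j] -= 1.0
        anti[i][j], anti[j][i] = 1.0, -1.0
    return neg_lap, anti


def update_rates(m, x0, h=0.01, steps=500):
    """Return ||dx/dt|| at t = 0 and at t = h * steps."""
    x = list(x0)
    start = norm(matvec(m, x))
    for _ in range(steps):
        x = rk4_step(m, x, h)
    return start, norm(matvec(m, x))


n = 8
x0 = [1.0] + [0.0] * (n - 1)  # all signal starts on the first node
neg_lap, anti = line_graph_operators(n)

diff_start, diff_end = update_rates(neg_lap, x0)
anti_start, anti_end = update_rates(anti, x0)

print(f"diffusion:     {diff_start:.4f} -> {diff_end:.4f}")
print(f"antisymmetric: {anti_start:.4f} -> {anti_end:.4f}")
```

Under diffusion the update magnitude shrinks as the features converge toward a uniform value, whereas the antisymmetric operator keeps it essentially unchanged up to integration error. This mirrors, in a much simpler setting, the dissipative versus non-dissipative behavior the abstract describes.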

Paper Structure (38 sections, 3 theorems, 31 equations, 7 figures, 9 tables)

Introduction
Preliminaries
Oversquashing
GNNs Inspired by Differential-Equations
Mathematical Background
SWAN: Space-Weight Antisymmetric GNN
Node-wise Analysis of SWAN
Graph-wise Analysis of SWAN
The Benefit of Spatial Antisymmetry
Experiments
Graph Transfer
Graph Property Prediction
Long-Range Graph Benchmark
Ablation Study
Related Work
...and 23 more sections

Key Result

Theorem 3.1

The information propagation rate among the graph nodes $\mathcal{V}$ is constant, $c$, independently of time $t$:

Figures (7)

Figure 1: An illustration of the ability of Global and Local Non-Dissipativity in SWAN (e) to propagate information to distant nodes, from the source (a) to the target (b). Other dynamics, such as diffusion (c), cannot achieve this behavior, while Local Non-Dissipativity (d) offers a limited effect.
Figure 2: The difference between non-dissipative and dissipative behaviors. With global (i.e., graph-wise) and local (i.e., node-wise) non-dissipative behavior (a), information is propagated between any pair of nodes with a viable path in the graph. Therefore, such a behavior increases the long-range effectiveness of the model. A model exhibiting local non-dissipative behavior (b) enhances only the long-term memory capacity of individual nodes. A model demonstrating dissipative behavior (c) exhibits a convergence of node features toward non-informative values.
Figure 3: Information transfer performance on (a) Line, (b) Ring, and (c) Crossed-Ring graphs. Non-dissipative methods like ADGN and SWAN allow for the accurate transfer of information.
Figure 4: Line, ring, and crossed-ring graphs where the distance between source and target nodes is equal to 5. Nodes marked with "S" are source nodes, while the nodes with a "T" are target nodes.
Figure 5: Erdős–Rényi, Barabási–Albert, grid, caveman, tree, ladder, line, star, caterpillar, and lobster graphs where the number of nodes is equal to 35.
...and 2 more figures

Theorems & Definitions (6)

Theorem 3.1: SWAN has a constant global information propagation rate
proof
Theorem 3.2: Time Decaying Propagation in Diffusion GNNs
Theorem 3.3: SWAN sensitivity upper bound
proof
proof
