Over-squashing in Spatiotemporal Graph Neural Networks
Ivan Marisca, Jacob Bamberger, Cesare Alippi, Michael M. Bronstein
TL;DR
This paper formalizes spatiotemporal over-squashing in STGNNs, showing that the temporal axis introduces new bottlenecks and that causal convolutions can bias information toward temporally distant inputs. It develops a theoretical framework that factorizes propagation into independent spatial and temporal components, and provides bounds that separate model parameters from topological structure. The authors introduce spatiotemporal designs (notably MPTCN) and analyze time-and-space versus time-then-space budgets, proving that both paradigms are equally susceptible to over-squashing and that it must be mitigated along both the spatial and the temporal dimension. They propose temporal graph rewiring and row-normalization as practical mitigation strategies, and validate their theory with synthetic tasks and real-world forecasting benchmarks, offering principled guidance for robust and scalable STGNNs.
Abstract
Graph Neural Networks (GNNs) have achieved remarkable success across various domains. However, recent theoretical advances have identified fundamental limitations in their information propagation capabilities, such as over-squashing, where distant nodes fail to effectively exchange information. While extensively studied in static contexts, this issue remains unexplored in Spatiotemporal GNNs (STGNNs), which process sequences associated with graph nodes. Yet the temporal dimension amplifies this challenge by increasing the information that must be propagated. In this work, we formalize the spatiotemporal over-squashing problem and demonstrate its distinct characteristics compared to the static case. Our analysis reveals that, counterintuitively, convolutional STGNNs favor information propagation from points temporally distant rather than close in time. Moreover, we prove that architectures that follow either time-and-space or time-then-space processing paradigms are equally affected by this phenomenon, providing theoretical justification for computationally efficient implementations. We validate our findings on synthetic and real-world datasets, providing deeper insights into the operational dynamics of STGNNs and principled guidance for more effective designs.
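The claim that stacked causal convolutions favor temporally distant inputs can be illustrated with a small path-counting sketch. This is not the authors' analysis or code: it assumes unit kernel weights, no nonlinearities, a kernel size of 2, and uses the number of computational paths from each input step to the last output step as a rough proxy for sensitivity. The function name `causal_path_counts` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def causal_path_counts(num_steps: int, kernel_size: int, num_layers: int) -> np.ndarray:
    """Count paths from each input step to each output step.

    Assumption: every layer is a causal convolution with unit weights, so
    one layer is the lower-triangular band matrix R, and num_layers stacked
    layers correspond to the matrix power R ** num_layers.
    """
    idx = np.arange(num_steps)
    lag = idx[:, None] - idx[None, :]  # lag[i, j] = i - j
    # R[i, j] = 1 when output step i reads input step j within one layer.
    R = ((lag >= 0) & (lag < kernel_size)).astype(np.int64)
    return np.linalg.matrix_power(R, num_layers)


# Hypothetical setting: 6 time steps, kernel size 2, 10 stacked layers.
counts = causal_path_counts(num_steps=6, kernel_size=2, num_layers=10)

# Paths reaching the most recent output step, ordered oldest -> newest input.
last_row = counts[-1]
print(last_row)  # [252 210 120  45  10   1]
```

In this toy setting the oldest input step reaches the final output through 252 paths, while the most recent step reaches it through only one. With fewer layers relative to the sequence length the peak sits at an intermediate lag instead, so the effect shown here depends on the chosen depth.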
