How Wide and How Deep? Mitigating Over-Squashing of GNNs via Channel Capacity Constrained Estimation
Zinuo You, Jin Zheng, John Cartlidge
TL;DR
The paper addresses over-squashing in GNNs by reframing spectral GNNs as communication channels and deriving an information-theoretic framework, C$^3$E, to estimate optimal hidden dimensions and propagation depth before training. It leverages maximum entropy to bound the information that can be propagated and defines an architecture-aware channel capacity $\phi_0$ to account for layerwise bottlenecks, formulating a constrained nonlinear program that balances capacity against information compression. The authors provide closed-form expressions for the theoretical channel capacity $\phi$ and the effective capacity $\phi_0$, as well as the representation compression ratio $\theta$, and demonstrate through experiments on nine public datasets that C$^3$E-selected architectures consistently mitigate over-squashing and improve representation learning, with solutions obtainable in seconds. The key contributions include an information-theoretic view of information flow in spectral GNNs, a principled method to select architecture via nonlinear optimization, and empirical evidence that optimal widths and depths stabilize information propagation and enhance performance. This approach offers a principled alternative to heuristic architectural choices and provides a foundation for extending to spatial GNNs and more complex propagators in the future.
Abstract
Existing graph neural networks typically rely on heuristic choices for hidden dimensions and propagation depths, which often lead to severe information loss during propagation, known as over-squashing. To address this issue, we propose Channel Capacity Constrained Estimation (C3E), a novel framework that formulates the selection of hidden dimensions and depth as a nonlinear programming problem grounded in information theory. By modeling spectral graph neural networks as communication channels, our approach directly connects channel capacity to hidden dimensions, propagation depth, propagation mechanism, and graph structure. Extensive experiments on nine public datasets demonstrate that hidden dimensions and depths estimated by C3E can mitigate over-squashing and consistently improve representation learning. Experimental results show that over-squashing occurs due to the cumulative compression of information in representation matrices. Furthermore, our findings show that increasing hidden dimensions does mitigate information compression, while the role of propagation depth is more nuanced, uncovering a fundamental balance between information compression and representation complexity.
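To make the notion of "compression of information in representation matrices" more concrete, the sketch below computes a generic entropy-based proxy: the Shannon entropy of a matrix's normalized singular-value spectrum and the corresponding effective rank. This is an illustrative assumption, not the paper's definition of $\phi$, $\phi_0$, or $\theta$; the function names and the example spectra are hypothetical.

```python
import math


def spectral_entropy(singular_values):
    """Shannon entropy (in nats) of a normalized singular-value spectrum.

    Assumption: this is a generic proxy for how evenly information is
    spread across dimensions, not the capacity measure defined by C3E.
    """
    positive = [s for s in singular_values if s > 0]
    if not positive:
        raise ValueError("at least one positive singular value is required")
    total = sum(positive)
    probs = [s / total for s in positive]
    return -sum(p * math.log(p) for p in probs)


def effective_rank(singular_values):
    """Effective rank: the exponential of the spectral entropy."""
    return math.exp(spectral_entropy(singular_values))


# Hypothetical spectra of a 4-dimensional representation matrix.
flat = [1.0, 1.0, 1.0, 1.0]       # information spread evenly
skewed = [1.0, 0.5, 0.1, 0.01]    # partially compressed
collapsed = [1.0, 0.0, 0.0, 0.0]  # fully squashed onto one direction

for name, spectrum in [("flat", flat), ("skewed", skewed), ("collapsed", collapsed)]:
    print(name, round(effective_rank(spectrum), 3))
```

Under this proxy, a representation whose spectrum collapses toward a single dominant direction has an effective rank near 1 regardless of its nominal hidden dimension, which is one simple way to picture the compression that the abstract associates with over-squashing.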
