How Wide and How Deep? Mitigating Over-Squashing of GNNs via Channel Capacity Constrained Estimation

Zinuo You, Jin Zheng, John Cartlidge

TL;DR

The paper addresses over-squashing in GNNs by reframing spectral GNNs as communication channels and deriving an information-theoretic framework, C$^3$E, to estimate optimal hidden dimensions and propagation depth before training. It leverages maximum entropy to bound the information that can be propagated and defines an architecture-aware channel capacity $\phi_0$ to account for layerwise bottlenecks, formulating a constrained nonlinear program that balances capacity against information compression. The authors provide closed-form expressions for the theoretical channel capacity $\phi$ and the effective capacity $\phi_0$, as well as the representation compression ratio $\theta$, and demonstrate through experiments on nine public datasets that C$^3$E-selected architectures consistently mitigate over-squashing and improve representation learning, with solutions obtainable in seconds. The key contributions include an information-theoretic view of information flow in spectral GNNs, a principled method to select architecture via nonlinear optimization, and empirical evidence that optimal widths and depths stabilize information propagation and enhance performance. This approach offers a principled alternative to heuristic architectural choices and provides a foundation for extending to spatial GNNs and more complex propagators in the future.

Abstract

Existing graph neural networks typically rely on heuristic choices for hidden dimensions and propagation depths, which often lead to severe information loss during propagation, known as over-squashing. To address this issue, we propose Channel Capacity Constrained Estimation (C$^3$E), a novel framework that formulates the selection of hidden dimensions and depth as a nonlinear programming problem grounded in information theory. By modeling spectral graph neural networks as communication channels, our approach directly connects channel capacity to hidden dimensions, propagation depth, propagation mechanism, and graph structure. Extensive experiments on nine public datasets demonstrate that hidden dimensions and depths estimated by C$^3$E can mitigate over-squashing and consistently improve representation learning. Experimental results show that over-squashing occurs due to the cumulative compression of information in representation matrices. Furthermore, our findings show that increasing hidden dimensions does mitigate information compression, while the role of propagation depth is more nuanced, uncovering a fundamental balance between information compression and representation complexity.
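The claim that over-squashing arises from cumulative compression of information in representation matrices can be illustrated with a small numerical sketch. The code below is not the paper's method: it assumes a plain linear propagation $\mathbf{H}_{l+1} = \hat{\mathbf{A}}\mathbf{H}_l$ with a symmetrically normalized adjacency, no learned weights or nonlinearities, and it uses the Shannon entropy of the normalized squared singular values as a stand-in proxy for the information content of $\mathbf{H}_l$. The function names, the random graph, and the proxy itself are all hypothetical choices made for illustration.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)  # >= 1 everywhere because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spectral_entropy(h, eps=1e-12):
    """Entropy (in nats) of the normalized squared singular values of h.

    Assumption: this is only a proxy for how evenly information is spread
    across directions of the representation, not the paper's H(H_l).
    """
    s = np.linalg.svd(h, compute_uv=False)
    p = s**2 / max(np.sum(s**2), eps)
    p = p[p > eps]
    return float(-np.sum(p * np.log(p)))


def entropy_per_layer(adj, features, depth):
    """Propagate features `depth` times and record the proxy entropy."""
    prop = normalized_adjacency(adj)
    h = features
    entropies = [spectral_entropy(h)]
    for _ in range(depth):
        h = prop @ h  # linear propagation only (illustrative assumption)
        entropies.append(spectral_entropy(h))
    return entropies


# Hypothetical toy setup: a random undirected graph with Gaussian features.
rng = np.random.default_rng(0)
n, d = 60, 16
upper = np.triu(rng.random((n, n)) < 0.1, k=1)
adj = (upper | upper.T).astype(float)
features = rng.standard_normal((n, d))

entropies = entropy_per_layer(adj, features, depth=8)
for layer, value in enumerate(entropies):
    print(f"layer {layer}: proxy entropy = {value:.3f} (max {np.log(d):.3f})")
```

Under these assumptions the proxy entropy is bounded above by $\log d$ and falls as depth grows, because repeated propagation concentrates the representation in a few dominant directions. A wider $\mathbf{H}_l$ raises the $\log d$ ceiling, which matches the abstract's observation about hidden dimensions only in spirit.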

Paper Structure

This paper contains 52 sections, 5 theorems, 80 equations, 11 figures, 4 tables.

Key Result

Theorem 1

The channel capacity of a spectral GNN is defined by maximizing the entropy of the encoded representation $\mathbf{H}_L$, which is expressed as,

Figures (11)

  • Figure 1: The green axis (left) denotes performance, and the blue axis (right) denotes parameter counts (in millions). C$^3$E-estimated solutions (red points, starred for optimal) consistently land in high-performance regions within the dashed intervals defined in Eq. (\ref{lbcons}), outperforming baselines with heuristic dimensions (green points, e.g., 16, 32, 64, 128, 256, 512, 1024).
  • Figure 2: C$^3$E-estimated models (red lines) avoid information loss by maintaining high $H(\mathbf{H}_l)$ across layers. In contrast, naively stacked baselines (green lines) suffer from over-squashing, with entropy collapsing to near-zero long before the final layer. Dotted lines indicate the maximum entropy for initial feature dimensions (brown) and fixed hidden dimensions (blue).
  • Figure 3: Average performance of C$^3$E-estimated models versus $\eta$ across nine benchmark datasets. Each curve shows the averaged C$^3$E-estimated model performance per dataset.
  • Figure 4: Examples of C$^3$E solutions (hidden dimensions) on five benchmark datasets.
  • Figure 5: The relationship between propagation depth ($L$), representation compression ($\theta$), and accuracy on the Cora dataset. (Left panels) For standard baseline models, increasing depth leads to a monotonic rise in the compression ratio $\theta$ and a consistent drop in accuracy. (Right panels) For C$^3$E-estimated models, accuracy improves with depth to an optimal point before declining, even as $\theta$ continues to rise. This highlights C$^3$E's ability to find and operate within a beneficial compression regime.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Corollary 2
  • Lemma A.1
  • Lemma A.2
  • Lemma B.1