Adaptive Initial Residual Connections for GNNs with Theoretical Guarantees
Mohammad Shirzadi, Ali Safarpoor Dehkordi, Ahad N. Zehmakan
TL;DR
This work tackles oversmoothing in deep graph neural networks by introducing Adaptive Initial Residual Connection (AIRC), in which each node has its own residual strength. The authors prove that AIRC preserves embedding rank and keeps the Dirichlet energy $\mathcal{E}$ bounded away from zero even when activation functions are present, which extends results for static initial residual connections (IRC) to nonlinear settings; they also show a closed-form limiting behavior under suitable spectral conditions. To enhance practicality, they propose a PageRank-based heuristic that assigns residual strengths without learning, achieving comparable performance with reduced complexity. Empirically, AIRC outperforms standard message passing mechanisms and state-of-the-art GNNs, especially on heterophilic graphs, and remains robust as network depth increases. The combination of theoretical guarantees and scalable, empirically strong results makes AIRC a compelling approach for deep, adaptive graph representations.
Abstract
Message passing is the core operation in graph neural networks, where each node updates its embeddings by aggregating information from its neighbors. However, in deep architectures, this process often leads to diminished expressiveness. A popular solution is to use residual connections, where the input from the current (or initial) layer is added to aggregated neighbor information to preserve embeddings across layers. Following a recent line of research, we investigate an adaptive residual scheme in which different nodes have varying residual strengths. We prove that this approach prevents oversmoothing; particularly, we show that the Dirichlet energy of the embeddings remains bounded away from zero. This is the first theoretical guarantee not only for the adaptive setting, but also for static residual connections (where residual strengths are shared across nodes) with activation functions. Furthermore, extensive experiments show that this adaptive approach outperforms standard and state-of-the-art message passing mechanisms, especially on heterophilic graphs. To improve the time complexity of our approach, we introduce a variant in which residual strengths are not learned but instead set heuristically, a choice that performs as well as the learnable version.
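To make the oversmoothing claim concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes an update of the form $H^{(l+1)} = \mathrm{ReLU}\big((I - \Lambda)\hat{A}H^{(l)} + \Lambda H^{(0)}\big)$, where $\Lambda$ is a diagonal matrix of per-node residual strengths and $\hat{A}$ is the symmetrically normalized adjacency with self-loops. The exact rule in the paper may differ, and learnable weight matrices are omitted for brevity. The helper names, the toy 6-node ring graph, and the hand-picked strengths are all illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dirichlet_energy(H, A_hat):
    """trace(H^T (I - A_hat) H); it approaches zero as embeddings oversmooth."""
    L = np.eye(A_hat.shape[0]) - A_hat
    return float(np.trace(H.T @ L @ H))

def propagate(H0, A_hat, lam, num_layers):
    """Assumed update (hypothetical, weights omitted):
    H <- relu((1 - lam_i) * (A_hat @ H)_i + lam_i * H0_i) for each node i."""
    lam = np.asarray(lam, dtype=float)[:, None]  # one strength per node
    H = H0.copy()
    for _ in range(num_layers):
        H = np.maximum((1.0 - lam) * (A_hat @ H) + lam * H0, 0.0)
    return H

# Toy graph: an undirected ring on 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0
    A[(i + 1) % n, i] = 1.0

A_hat = normalized_adjacency(A)
H0 = np.eye(n)  # one-hot initial features, chosen for reproducibility

# No residual connection: plain message passing repeated 64 times.
energy_plain = dirichlet_energy(propagate(H0, A_hat, np.zeros(n), 64), A_hat)

# Per-node residual strengths, picked by hand purely for illustration.
lam = np.linspace(0.2, 0.8, n)
energy_airc = dirichlet_energy(propagate(H0, A_hat, lam, 64), A_hat)

print(f"initial energy:          {dirichlet_energy(H0, A_hat):.4f}")
print(f"depth 64, no residual:   {energy_plain:.3e}")
print(f"depth 64, per-node lam:  {energy_airc:.3e}")
```

On this toy graph the energy without a residual term collapses to numerical zero, while the per-node residual version stays clearly positive. This only illustrates the qualitative behavior the abstract describes; it does not reproduce the paper's bounds or experiments.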
