Adaptive Initial Residual Connections for GNNs with Theoretical Guarantees

Mohammad Shirzadi, Ali Safarpoor Dehkordi, Ahad N. Zehmakan

TL;DR

This work tackles oversmoothing in deep graph neural networks by introducing Adaptive Initial Residual Connection (AIRC), in which each node has its own residual strength. The authors prove that AIRC preserves embedding rank and keeps the Dirichlet energy $\mathcal{E}$ bounded away from zero even when activation functions are present, which extends earlier results for static IRC to nonlinear settings; they also derive a closed-form limiting behavior under suitable spectral conditions. To improve practicality, they propose a PageRank-based heuristic that assigns residual strengths without learning them, achieving comparable performance at lower complexity. Empirically, AIRC outperforms standard message passing mechanisms and state-of-the-art GNNs, especially on heterophilic graphs, and remains robust as network depth increases. The combination of theoretical guarantees and scalable, empirically strong results makes AIRC a compelling approach for deep, adaptive graph representations.
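To make the idea of a learning-free, PageRank-based assignment concrete, here is a minimal sketch in NumPy. It computes PageRank by power iteration and then rescales the scores into per-node residual strengths. The summary does not state how the paper maps PageRank scores to strengths, so the min-max rescaling, the function name `strengths_from_pagerank`, and the bounds `low` and `high` are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def pagerank(A, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a dense adjacency matrix.

    Assumes every node has at least one outgoing edge (no isolated nodes).
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    P = (A / out_deg[:, None]).T          # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)               # start from the uniform distribution
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def strengths_from_pagerank(scores, low=0.1, high=0.9):
    """Hypothetical mapping: min-max rescale scores into [low, high]."""
    span = scores.max() - scores.min()
    if span < 1e-12:                      # all nodes equally central
        return np.full_like(scores, 0.5 * (low + high))
    return low + (high - low) * (scores - scores.min()) / span

# Star graph: node 0 is the hub, nodes 1..4 are leaves.
A_star = np.zeros((5, 5))
A_star[0, 1:] = 1.0
A_star[1:, 0] = 1.0

scores = pagerank(A_star)
gamma = strengths_from_pagerank(scores)
print(scores, gamma)
```

Under this particular mapping the hub receives the largest strength and the leaves the smallest; whether central nodes should get larger or smaller strengths is a design choice that this summary does not settle.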

Abstract

Message passing is the core operation in graph neural networks, where each node updates its embeddings by aggregating information from its neighbors. However, in deep architectures, this process often leads to diminished expressiveness. A popular solution is to use residual connections, where the input from the current (or initial) layer is added to aggregated neighbor information to preserve embeddings across layers. Following a recent line of research, we investigate an adaptive residual scheme in which different nodes have varying residual strengths. We prove that this approach prevents oversmoothing; particularly, we show that the Dirichlet energy of the embeddings remains bounded away from zero. This is the first theoretical guarantee not only for the adaptive setting, but also for static residual connections (where residual strengths are shared across nodes) with activation functions. Furthermore, extensive experiments show that this adaptive approach outperforms standard and state-of-the-art message passing mechanisms, especially on heterophilic graphs. To improve the time complexity of our approach, we introduce a variant in which residual strengths are not learned but instead set heuristically, a choice that performs as well as the learnable version.
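The contrast between plain message passing and an adaptive initial residual can be sketched numerically. The code below assumes a simplified linear update of the form $H^{(\ell+1)} = (I - \Gamma)\,\hat{A}\,H^{(\ell)} + \Gamma\,H^{(0)}$, where $\Gamma$ is a diagonal matrix of per-node strengths and $\hat{A}$ is the symmetrically normalized adjacency with self-loops, and it measures the Dirichlet energy as $\operatorname{tr}(H^\top (I - \hat{A}) H)$. Both the update rule and the energy normalization are assumptions made for illustration; the paper's exact formulation (which also covers activations and weight matrices) is not reproduced in this summary.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dirichlet_energy(H, A_hat):
    """tr(H^T (I - A_hat) H): small values mean neighboring embeddings look alike."""
    L = np.eye(A_hat.shape[0]) - A_hat
    return float(np.trace(H.T @ L @ H))

def propagate(H0, A_hat, gamma, num_layers):
    """Assumed linear update: H <- (1 - gamma_i) * (A_hat H)_i + gamma_i * H0_i per node i."""
    H = H0.copy()
    for _ in range(num_layers):
        H = (1.0 - gamma)[:, None] * (A_hat @ H) + gamma[:, None] * H0
    return H

# Toy graph: a cycle on 6 nodes with random 3-dimensional input features.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(0)
H0 = rng.normal(size=(n, 3))
gamma = np.linspace(0.2, 0.8, n)          # a different residual strength per node

# gamma = 0 everywhere recovers plain propagation without any residual.
energy_plain = dirichlet_energy(propagate(H0, A_hat, np.zeros(n), 200), A_hat)
energy_residual = dirichlet_energy(propagate(H0, A_hat, gamma, 200), A_hat)
print(energy_plain, energy_residual)
```

On this toy graph the energy of plain propagation decays to numerical zero after many layers, while the residual variant settles at a strictly positive value. That is the qualitative behavior the abstract describes, shown here only for the assumed linear model.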

Paper Structure

This paper contains 24 sections, 6 theorems, 46 equations, 4 figures, 3 tables.

Key Result

Theorem 1

Consider the simplified version of the message passing GRC given by fj_normalized_simplified, with no activation function and no linear transformation. The system stabilizes and maintains full-rank embeddings for all $\ell \in \mathbb{N}$. More precisely, the limiting behavior is governed by ... and the embedding space never collapses, as ...
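A small numerical check can illustrate what "stabilizes" and "full rank" mean in the setting without activations or linear transformations. The sketch below assumes the linear recurrence $H^{(\ell+1)} = (I - \Gamma)\,\hat{A}\,H^{(\ell)} + \Gamma\,H^{(0)}$ with diagonal $\Gamma$ whose entries lie in $(0, 1)$. For that assumed recurrence the fixed point solves $(I - (I - \Gamma)\hat{A})\,H^{*} = \Gamma\,H^{(0)}$. This is a derivation for the assumed model only and is not claimed to be the limit formula stated in the paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def iterate(H0, A_hat, gamma, num_layers):
    """Return [H^(1), ..., H^(num_layers)] for the assumed linear recurrence."""
    layers, H = [], H0.copy()
    for _ in range(num_layers):
        H = (1.0 - gamma)[:, None] * (A_hat @ H) + gamma[:, None] * H0
        layers.append(H)
    return layers

def fixed_point(H0, A_hat, gamma):
    """Solve (I - (I - Gamma) A_hat) H* = Gamma H0 for the assumed recurrence."""
    n = A_hat.shape[0]
    M = np.eye(n) - (1.0 - gamma)[:, None] * A_hat
    return np.linalg.solve(M, gamma[:, None] * H0)

# Toy graph: a path on 5 nodes with random 2-dimensional input features.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(1)
H0 = rng.normal(size=(n, 2))
gamma = np.linspace(0.3, 0.7, n)          # per-node residual strengths in (0, 1)

layers = iterate(H0, A_hat, gamma, 100)
ranks = [np.linalg.matrix_rank(H) for H in layers]
H_star = fixed_point(H0, A_hat, gamma)
print(min(ranks), np.abs(layers[-1] - H_star).max())
```

In this example the embeddings keep rank 2 (the number of input features) at every layer, and the iterates converge to the solved fixed point, so depth does not collapse the embedding space in the assumed model.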

Figures (4)

  • Figure 1: Embedding evolution in GCN vs adaptive IRC. GCN leads to embedding collapse, while adaptive IRC preserves distinct clusters.
  • Figure 2: Feature evolution in GAT, GraphSAGE (mean and max), and adaptive IRC. GAT and GraphSAGE suffer from embedding collapse due to oversmoothing, while adaptive IRC preserves distinct clusters.
  • Figure 3: Dirichlet energy (log scale) for output of GCN, GAT, GraphSAGE (mean and max), and learnable adaptive IRC with varying numbers of layers.
  • Figure 4: Performance across depths. Adaptive IRC (learnable and PageRank-based) remains accurate as the number of layers increases and outperforms other methods, except on Actor in shallow settings.

Theorems & Definitions (14)

  • Theorem 1
  • proof
  • Definition 1: Dirichlet Energy
  • Lemma 1
  • proof
  • Remark 1
  • Lemma 2
  • proof
  • Corollary 1
  • proof
  • ...and 4 more