Residual connections provably mitigate oversmoothing in graph neural networks

Ziang Chen, Zhengjiang Lin, Shi Chen, Yury Polyanskiy, Philippe Rigollet

TL;DR

The paper tackles oversmoothing in deep graph neural networks by introducing a rigorous framework, based on the multiplicative ergodic theorem (MET), for studying the asymptotic separation of vertex features. It defines a normalized vertex similarity measure $\mu(x)$ and derives exact rates for non-residual and residual GNNs under broad spectral and distributional assumptions, including non-symmetric $P$ and i.i.d. weight ensembles. The main contributions are two theorems. (i) For non-residual GNNs, $\mu(x^{(t)})$ decays exponentially at a rate given by the second-largest eigenvalue of $P$. (ii) For residual GNNs, the corresponding rate admits a computable lower bound that is often strictly larger than the non-residual rate and can even equal 1, indicating that oversmoothing is mitigated or avoided altogether. Several families of parameter distributions are treated explicitly as special cases (deterministic, Ginibre, bounded-norm, and simultaneously diagonalizable). The findings are validated with numerical experiments on standard citation graphs. These show that residual connections preserve vertex distinctiveness and improve the performance of deep models, which offers practical guidance for designing deeper GNNs with provable resilience to oversmoothing.
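The non-residual rate in (i) can be checked numerically in a toy setting. The sketch below rests on several assumptions that are not taken from the paper. First, the layer is linear, $x^{(t+1)} = P x^{(t)} W^{(t)}$, with i.i.d. Gaussian $W^{(t)}$. Second, $P$ is a lazy random walk on an 8-cycle, which is symmetric and doubly stochastic. Third, $\mu(x)$ is taken to be the Frobenius distance of $x$ from the consensus subspace $\{\mathbf{1}a^\top\}$ divided by $\|x\|_F$. The helper names `similarity_measure`, `cycle_aggregation`, and `empirical_rate` are hypothetical.

```python
import numpy as np


def similarity_measure(x):
    # Assumed form of mu(x): distance from the consensus subspace {1 a^T},
    # normalized by ||x||_F so that mu is scale-invariant and lies in [0, 1].
    deviation = x - x.mean(axis=0, keepdims=True)  # remove per-feature vertex average
    return np.linalg.norm(deviation) / np.linalg.norm(x)


def cycle_aggregation(n):
    # Lazy random walk on an n-cycle: symmetric, doubly stochastic,
    # eigenvalues 0.5 + 0.5 cos(2 pi k / n), all in [0, 1].
    shift = np.roll(np.eye(n), 1, axis=1)
    return 0.5 * np.eye(n) + 0.25 * (shift + shift.T)


def empirical_rate(n=8, d=4, t0=50, t1=150, seed=0):
    # Estimate lim mu(x_t)^(1/t) from the slope of log mu between depths t0 and t1.
    rng = np.random.default_rng(seed)
    P = cycle_aggregation(n)
    x = rng.standard_normal((n, d))
    log_mu = {}
    for t in range(1, t1 + 1):
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # i.i.d. Gaussian weight matrix
        x = P @ x @ W            # assumed linear, non-residual layer
        x /= np.linalg.norm(x)   # rescale to avoid under/overflow; mu is unchanged
        if t in (t0, t1):
            log_mu[t] = np.log(similarity_measure(x))
    return np.exp((log_mu[t1] - log_mu[t0]) / (t1 - t0))


P = cycle_aggregation(8)
lam2 = np.sort(np.linalg.eigvalsh(P))[-2]  # second-largest eigenvalue, about 0.854
rate = empirical_rate()
print(f"empirical rate {rate:.4f}, lambda_2 {lam2:.4f}")
```

Because $P$ multiplies from the left and the weights multiply from the right in this linear model, every spectral component of $x$ is hit by the same weight product. That is why the estimated rate lands on $\lambda_2$ regardless of the random weights.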

Abstract

Graph neural networks (GNNs) have achieved remarkable empirical success in processing and representing graph-structured data across various domains. However, a significant challenge known as "oversmoothing" persists, where vertex features become nearly indistinguishable in deep GNNs, severely restricting their expressive power and practical utility. In this work, we analyze the asymptotic oversmoothing rates of deep GNNs with and without residual connections by deriving explicit convergence rates for a normalized vertex similarity measure. Our analytical framework is grounded in the multiplicative ergodic theorem. Furthermore, we demonstrate that adding residual connections effectively mitigates or prevents oversmoothing across several broad families of parameter distributions. The theoretical findings are strongly supported by numerical experiments.
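To contrast the non-residual and residual cases in the simplest possible setting, the sketch below fixes every weight to a scalar multiple of the identity, $W^{(t)} = wI$. It then compares an assumed non-residual layer $x \mapsto PxW$ with an assumed residual layer $x \mapsto x + PxW$; the paper's exact residual parameterization may differ. With these weights both layers are polynomials in $P$, so an eigenvalue calculation gives the rates $\lambda_2$ and $(1 + w\lambda_2)/(1 + w)$ for the nonnegative-spectrum $P$ used here. The second rate is strictly larger whenever $w > 0$ and $\lambda_2 < 1$, because $1 + w\lambda_2 > \lambda_2 + w\lambda_2$. Several choices are illustrative assumptions: the measure $\mu$ (distance from the consensus subspace divided by $\|x\|_F$), the 8-cycle aggregation matrix, and the helper names.

```python
import numpy as np


def similarity_measure(x):
    # Assumed mu(x): distance from the consensus subspace, normalized by ||x||_F.
    deviation = x - x.mean(axis=0, keepdims=True)
    return np.linalg.norm(deviation) / np.linalg.norm(x)


def cycle_aggregation(n):
    # Lazy random walk on an n-cycle (symmetric, eigenvalues in [0, 1]).
    shift = np.roll(np.eye(n), 1, axis=1)
    return 0.5 * np.eye(n) + 0.25 * (shift + shift.T)


def plain_layer(P, x, W):
    return P @ x @ W  # assumed non-residual layer


def residual_layer(P, x, W):
    return x + P @ x @ W  # assumed residual layer: the skip connection adds x back


def decay_rate(layer, P, x0, W, t0=50, t1=150):
    # Exponentiated slope of log mu between depths t0 and t1.
    x, log_mu = x0.copy(), {}
    for t in range(1, t1 + 1):
        x = layer(P, x, W)
        x /= np.linalg.norm(x)  # mu is scale-invariant, so rescaling is harmless
        if t in (t0, t1):
            log_mu[t] = np.log(similarity_measure(x))
    return np.exp((log_mu[t1] - log_mu[t0]) / (t1 - t0))


n, d, w = 8, 4, 1.0
P = cycle_aggregation(n)
lam2 = np.sort(np.linalg.eigvalsh(P))[-2]
W = w * np.eye(d)  # deterministic weights W^(t) = w I at every layer
x0 = np.random.default_rng(0).standard_normal((n, d))

plain = decay_rate(plain_layer, P, x0, W)
residual = decay_rate(residual_layer, P, x0, W)
print(f"plain    {plain:.4f} (theory {lam2:.4f})")
print(f"residual {residual:.4f} (theory {(1 + w * lam2) / (1 + w):.4f})")
```

With $w = 1$ the residual layer is $(I + P)$, so vertex features still converge toward consensus, but more slowly than under $P$ alone. This toy case shows only the direction of the effect, not the general bounds.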

Paper Structure (30 sections, 21 theorems, 114 equations, 4 figures, 1 table)

Key Result

Proposition 2.3

Suppose that Assumption asp:W holds. Then the following are true.

Figures (4)

  • Figure 1: Vertex similarity measure of initialized GCNs and residual GCNs on the largest connected component
  • Figure 2: Vertex similarity measure of trained GCNs and residual GCNs on the largest connected component
  • Figure 3: Training loss and training classification accuracy of GCNs and residual GCNs
  • Figure 4: Validation and test classification accuracy of GCNs and residual GCNs

Theorems & Definitions (42)

  • Definition 2.1: Vertex similarity measure
  • Remark 2.2
  • Proposition 2.3
  • Theorem 2.4: Asymptotic oversmoothing rate of deep non-residual GNNs
  • Theorem 2.5: Asymptotic oversmoothing rate of deep residual GNNs
  • Theorem 2.6
  • Theorem 2.7
  • Theorem 2.8
  • Theorem 2.9
  • Theorem 2.10
  • ...and 32 more
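The following heuristic calculation suggests why the non-residual rate in Theorem 2.4 is a spectral quantity of $P$ alone, and why a skip connection can change it. It is a sketch of the idea, not the paper's proof, and it assumes the following, none of which is taken from the paper:

  • $P = \sum_i \lambda_i v_i v_i^\top$ is symmetric, with orthonormal eigenvectors, $v_1 = \mathbf{1}/\sqrt{n}$, and $1 = \lambda_1 > \lambda_2 \ge |\lambda_i|$ for $i \ge 3$.
  • The layers are linear.
  • $\mu(x) = \|x - v_1 v_1^\top x\|_F / \|x\|_F$.
  • The weights are i.i.d. Gaussian and the initial features are generic, so that by the multiplicative ergodic theorem $\tfrac{1}{t}\log\|u\,A_0 \cdots A_{t-1}\| \to \gamma$, the top Lyapunov exponent of the i.i.d. matrices $A_s$.

```latex
% Write u_i = v_i^\top x^{(0)} (a row vector) for the i-th spectral component of the input.

% Non-residual layer x^{(t+1)} = P x^{(t)} W^{(t)}: P acts on the left, weights on the right.
x^{(t)} = P^{t} x^{(0)} M_t = \sum_{i} \lambda_i^{t}\, v_i\, u_i M_t,
\qquad M_t = W^{(0)} W^{(1)} \cdots W^{(t-1)},

\mu\bigl(x^{(t)}\bigr)^2
  = \frac{\sum_{i \ge 2} \lambda_i^{2t} \,\|u_i M_t\|^2}{\sum_{i \ge 1} \lambda_i^{2t} \,\|u_i M_t\|^2}.

% Every component is multiplied by the same M_t, and \|u_i M_t\|^{1/t} \to e^{\gamma}
% for each nonzero u_i, so the weight product cancels in the ratio:
\lim_{t \to \infty} \mu\bigl(x^{(t)}\bigr)^{1/t} = \lambda_2 .

% Residual layer x^{(t+1)} = x^{(t)} + P x^{(t)} W^{(t)}: component i now evolves as
u_i^{(t+1)} = u_i^{(t)} \bigl(I + \lambda_i W^{(t)}\bigr),

% so each component sees its own matrix product, with top Lyapunov exponent
% \gamma(\lambda_i) of the i.i.d. matrices I + \lambda_i W. The weights no longer cancel:
\lim_{t \to \infty} \mu\bigl(x^{(t)}\bigr)^{1/t}
  = \exp\Bigl(\min\bigl\{0,\ \max_{i \ge 2} \gamma(\lambda_i) - \gamma(1)\bigr\}\Bigr).
```

In this simplified model the residual rate depends on the weight distribution through $\gamma(\cdot)$. It can therefore exceed $\lambda_2$, or equal 1 when some non-consensus component grows at least as fast as the consensus component. This is in line with the qualitative statement of Theorem 2.5 given in the TL;DR.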