Residual connections provably mitigate oversmoothing in graph neural networks

Ziang Chen; Zhengjiang Lin; Shi Chen; Yury Polyanskiy; Philippe Rigollet

TL;DR

The paper tackles oversmoothing in deep graph neural networks (GNNs) by introducing a rigorous framework, based on the multiplicative ergodic theorem (MET), for studying how quickly vertex features become indistinguishable as depth grows. It defines a normalized vertex similarity measure $\mu(x)$ and derives explicit asymptotic rates for non-residual and residual GNNs under broad spectral and distributional assumptions, including non-symmetric $P$ and i.i.d. weight ensembles. The main contributions are two theorems: (i) non-residual GNNs exhibit exponential decay of $\mu(x^{(t)})$ at a rate given by the second-largest eigenvalue of $P$, and (ii) residual GNNs admit a computable lower bound on the same rate that is often strictly larger, or even equal to 1, indicating that oversmoothing is mitigated or avoided. Special cases (deterministic, Ginibre, bounded-norm, simultaneously diagonalizable) are treated explicitly. The findings are validated with numerical experiments on standard citation graphs, which show that residual connections preserve vertex distinctiveness and improve deep-model performance, offering practical guidance for designing deeper GNNs with provable resilience to oversmoothing.

Abstract

Graph neural networks (GNNs) have achieved remarkable empirical success in processing and representing graph-structured data across various domains. However, a significant challenge known as "oversmoothing" persists, where vertex features become nearly indistinguishable in deep GNNs, severely restricting their expressive power and practical utility. In this work, we analyze the asymptotic oversmoothing rates of deep GNNs with and without residual connections by deriving explicit convergence rates for a normalized vertex similarity measure. Our analytical framework is grounded in the multiplicative ergodic theorem. Furthermore, we demonstrate that adding residual connections effectively mitigates or prevents oversmoothing across several broad families of parameter distributions. The theoretical findings are strongly supported by numerical experiments.
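The contrast described in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration rather than the paper's setup: the `similarity` function is a hypothetical stand-in for $\mu(x)$ (the fraction of feature energy left after subtracting the vertex-wise mean, not the paper's Definition 2.1), the layers are purely linear with no activation, the weights are i.i.d. Gaussian with an arbitrary scale of 0.1, and the graph is an 8-vertex cycle with self-loops.

```python
import numpy as np

def similarity(x):
    """Hypothetical stand-in for mu(x): share of the squared Frobenius norm
    that remains after removing the mean feature vector across vertices.
    Values near 0 mean the vertex features are nearly identical."""
    centered = x - x.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(centered) ** 2 / np.linalg.norm(x) ** 2)

def simulate(P, x0, depth, residual, seed, scale=0.1):
    """Run a linear GNN (assumed form) and record the similarity per layer.
    Non-residual: x <- P x W.   Residual: x <- x + P x W."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    d = x.shape[1]
    history = []
    for _ in range(depth):
        W = scale * rng.standard_normal((d, d))  # i.i.d. Gaussian weights
        update = P @ x @ W
        x = x + update if residual else update
        x = x / np.linalg.norm(x)  # rescaling only; similarity is scale-invariant
        history.append(similarity(x))
    return history

# Toy graph: 8-vertex cycle with self-loops, row-normalized to get P.
n, d, depth = 8, 4, 60
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A_hat = A + np.eye(n)
P = A_hat / A_hat.sum(axis=1, keepdims=True)

x0 = np.random.default_rng(1).standard_normal((n, d))
plain = simulate(P, x0, depth, residual=False, seed=0)  # same weights in both runs
skip = simulate(P, x0, depth, residual=True, seed=0)

print(f"non-residual, layer {depth}: {plain[-1]:.3e}")
print(f"residual,     layer {depth}: {skip[-1]:.3e}")
```

Under these assumptions the non-residual similarity should collapse toward zero with depth, while the residual one should stay orders of magnitude larger. The sketch only illustrates the qualitative claim and does not reproduce the paper's rates or experiments.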

Paper Structure (30 sections, 21 theorems, 114 equations, 4 figures, 1 table)

Introduction
Related work
Our contribution
Organization
Main Results
Asymptotic oversmoothing rate
Weight matrices
Deterministic
Ginibre ensemble
Bounded norm
Simultaneously diagonalizable
Proofs
Preliminary on linear random dynamical system
Preliminary on tensor product space
Proof of Theorem 2.4
...and 15 more sections

Key Result

Proposition 2.3

Suppose that the assumption on the weight matrices (label asp:W) holds. Then the following statements are true.

Figures (4)

Figure 1: Vertex similarity measure of initialized GCNs and residual GCNs on the largest connected component
Figure 2: Vertex similarity measure of trained GCNs and residual GCNs on the largest connected component
Figure 3: Training loss and training classification accuracy of GCNs and residual GCNs
Figure 4: Validation and test classification accuracy of GCNs and residual GCNs

Theorems & Definitions (42)

Definition 2.1: Vertex similarity measure
Remark 2.2
Proposition 2.3
Theorem 2.4: Asymptotic oversmoothing rate of deep non-residual GNNs
Theorem 2.5: Asymptotic oversmoothing rate of deep residual GNNs
Theorem 2.6
Theorem 2.7
Theorem 2.8
Theorem 2.9
Theorem 2.10
...and 32 more
