A Note on Over-Smoothing for Graph Neural Networks
Chen Cai, Yusu Wang
TL;DR
Over-smoothing limits the depth of GNNs. The authors use the Dirichlet energy of node embeddings, defined with respect to the augmented normalized Laplacian, to quantify expressiveness across layers. They prove a per-layer bound, $E(\mathbf{f}_l(X)) \leq s_l \bar{\lambda} E(X)$ with $\bar{\lambda}=(1-\lambda)^2$, which implies the exponential decay $E(X^{(L)})=O((s\bar{\lambda})^L)$ when $s\bar{\lambda}<1$, and extend the analysis to nonlinear activations such as Leaky ReLU. Empirically, edge dropping and extreme edge weighting counteract over-smoothing and alter the Dirichlet energy, with implications for architecture design and training.
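To see why a per-layer bound yields exponential decay, the inequality can be chained over layers. The sketch below assumes $X^{(l)} = \mathbf{f}_l(X^{(l-1)})$ with $X^{(0)} = X$, and takes $s$ to be any uniform upper bound on the $s_l$ (for instance $s = \sup_l s_l$). This notation is an assumption of the sketch, since the summary does not define $s$ explicitly.

$$
E(X^{(L)}) \;\leq\; s_L \bar{\lambda}\, E(X^{(L-1)}) \;\leq\; \cdots \;\leq\; \Big(\prod_{l=1}^{L} s_l\Big) \bar{\lambda}^{L}\, E(X^{(0)}) \;\leq\; (s\bar{\lambda})^{L}\, E(X)
$$

When $s\bar{\lambda}<1$ the right-hand side shrinks geometrically in $L$, which is the $O((s\bar{\lambda})^L)$ rate quoted above.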
Abstract
Graph Neural Networks (GNNs) have achieved considerable success on graph-structured data. However, their performance has been observed not to improve as the number of layers increases. This effect, known as over-smoothing, has been analyzed mostly in linear cases. In this paper, we build upon previous results \cite{oono2019graph} to further analyze the over-smoothing effect in the general graph neural network architecture. We show that when the weight matrix satisfies conditions determined by the spectrum of the augmented normalized Laplacian, the Dirichlet energy of the embeddings converges to zero, resulting in a loss of discriminative power. Using Dirichlet energy to measure the "expressiveness" of embeddings is conceptually clean; it leads to simpler proofs than \cite{oono2019graph} and can handle more non-linearities.
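The energy decay described above can be checked numerically on a toy graph. The following is a minimal sketch under stated assumptions, not the authors' code: it takes the Dirichlet energy to be $\mathrm{tr}(X^\top \tilde{\Delta} X)$ with $\tilde{\Delta}$ the augmented normalized Laplacian, assumes a GCN-style layer $\sigma(P X W)$ with $P = I - \tilde{\Delta}$, takes $s_l$ to be the squared largest singular value of $W$, and takes $\bar{\lambda}$ to be the largest value of $(1-\mu)^2$ over the non-zero eigenvalues $\mu$ of $\tilde{\Delta}$. All function names and the example graph are hypothetical.

```python
import numpy as np

def augmented_normalized_laplacian(adj):
    """Return I - D~^{-1/2} (A + I) D~^{-1/2} for a symmetric adjacency matrix."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    p = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.eye(n) - p

def dirichlet_energy(x, lap):
    """Dirichlet energy tr(X^T L X) of node embeddings X (assumed definition)."""
    return float(np.trace(x.T @ lap @ x))

def gcn_layer(x, lap, w, negative_slope=0.0):
    """Assumed GCN-style layer sigma(P X W); the slope selects ReLU or Leaky ReLU."""
    p = np.eye(lap.shape[0]) - lap
    h = p @ x @ w
    return np.where(h > 0, h, negative_slope * h)

def layer_bound_factor(lap, w, tol=1e-9):
    """Contraction factor s * lambda_bar under the assumptions of this sketch."""
    eigvals = np.linalg.eigvalsh(lap)
    nonzero = eigvals[eigvals > tol]               # drop the zero eigenvalue(s)
    lam_bar = float(np.max((1.0 - nonzero) ** 2))
    s = float(np.linalg.norm(w, 2) ** 2)           # squared largest singular value
    return s * lam_bar

# Hypothetical example: a 6-node cycle with one chord.
rng = np.random.default_rng(0)
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2)]:
    adj[i, j] = adj[j, i] = 1.0
lap = augmented_normalized_laplacian(adj)

x = rng.standard_normal((6, 4))
energies = [dirichlet_energy(x, lap)]
factors = []
for _ in range(8):
    w = rng.standard_normal((4, 4))
    w /= np.linalg.norm(w, 2)                      # rescale so that s = 1
    factors.append(layer_bound_factor(lap, w))
    x = gcn_layer(x, lap, w, negative_slope=0.2)   # Leaky ReLU
    energies.append(dirichlet_energy(x, lap))

for depth, (before, after, factor) in enumerate(zip(energies, energies[1:], factors), 1):
    print(f"layer {depth}: E = {after:.3e}, bound = {factor * before:.3e}")
```

Rescaling each weight matrix to unit spectral norm keeps the contraction factor below one here, so the printed energies shrink with depth. Dropping that rescaling is one way to see how larger weights can keep the energy from collapsing.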
