Analyzing the Effect of Embedding Norms and Singular Values to Oversmoothing in Graph Neural Networks
Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras
TL;DR
The paper addresses oversmoothing in deep graph neural networks by introducing MASED (Mean Average Squared Euclidean Distance), a metric that quantifies embedding dispersion across layers. It derives layer-wise and network-wide bounds linking MASED to embedding norms and the singular values of weight matrices, providing a principled explanation for depth-related collapse and suggesting targeted remedies. The authors propose G-Reg, a regularizer that increases the smallest singular value of the weight matrices, and advocate decoupling the number of trainable weight matrices from the total graph-hop count to reduce redundancy and oversmoothing. Empirical results across seven datasets show that MASED correlates with performance and that G-Reg, along with decoupled-hop architectures, enables deeper, more robust GNNs, including in cold-start scenarios. This work offers a concrete, quantitative framework for understanding and mitigating oversmoothing, with practical guidelines for model design and regularization.
Abstract
In this paper, we study the factors that contribute to the effect of oversmoothing in deep Graph Neural Networks (GNNs). Specifically, our analysis is based on a new metric (Mean Average Squared Distance, $MASED$) to quantify the extent of oversmoothing. We derive layer-wise bounds on $MASED$, which aggregate to yield global upper and lower distance bounds. Based on this quantification of oversmoothing, we further analyze the importance of two properties of the model: the norms of the generated node embeddings, and the largest and smallest singular values of the weight matrices. Building on the insights drawn from the theoretical analysis, we show that oversmoothing increases as the number of trainable weight matrices and the number of adjacency matrices increase. We also use the derived layer-wise bounds on $MASED$ to form a proposal for decoupling the number of hops (i.e., adjacency depth) from the number of weight matrices. In particular, we introduce G-Reg, a regularization scheme that increases the bounds, and demonstrate through extensive experiments that doing so increases node classification accuracy and achieves robustness at large depths. We further show that by reducing oversmoothing in deep networks, we can achieve better results in some tasks than with shallow ones. Specifically, we experiment with a "cold start" scenario, i.e., when there is no feature information for the unlabeled nodes. Finally, we show empirically the trade-off between receptive field size (i.e., number of weight matrices) and performance, using the $MASED$ bounds. This is achieved by distributing adjacency hops across a small number of trainable layers, avoiding the extremes of under- or over-parameterization of the GNN.
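To make the quantity behind the analysis concrete, the following is a minimal sketch of how a MASED-style dispersion score could be computed and how it shrinks under repeated neighborhood averaging. The definition used here is an assumption read off the metric's name (for each node, average the squared Euclidean distance to every other node, then take the mean over nodes); the exact normalization in the paper may differ. The helper names `mased`, `mean_aggregate`, and `propagate`, the toy path graph, and the weight-free mean aggregation are all hypothetical illustrations, not the architecture or code used in the paper.

```python
from itertools import combinations


def mased(embeddings):
    """Assumed MASED: per node, average the squared Euclidean distance to
    every other node, then take the mean over all nodes."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    total = 0.0
    for u, v in combinations(embeddings, 2):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    # Each unordered pair counts for both of its nodes, and every node
    # averages over its n - 1 counterparts.
    return 2.0 * total / (n * (n - 1))


def mean_aggregate(adjacency, embeddings):
    """One weight-free hop: each node averages itself and its neighbours.
    `adjacency` is a list of neighbour-index lists (hypothetical format)."""
    out = []
    for i, neighbours in enumerate(adjacency):
        group = [embeddings[i]] + [embeddings[j] for j in neighbours]
        dim = len(embeddings[i])
        out.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return out


def propagate(adjacency, embeddings, hops):
    """Apply `hops` rounds of mean aggregation with no trainable weights."""
    for _ in range(hops):
        embeddings = mean_aggregate(adjacency, embeddings)
    return embeddings


if __name__ == "__main__":
    # Toy example: a 4-node path graph 0-1-2-3 with 2-dimensional features.
    path_graph = [[1], [0, 2], [1, 3], [2]]
    features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    for hops in (0, 1, 4, 16, 64):
        print(hops, mased(propagate(path_graph, features, hops)))
```

On this connected toy graph the score tends toward zero as the number of hops grows, which is the collapse that oversmoothing refers to. The sketch deliberately omits weight matrices, so it does not show the role of singular values or of G-Reg discussed in the abstract.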
