Graph Neural Networks Do Not Always Oversmooth
Bastian Epping, Alexandre René, Moritz Helias, Michael T. Schaub
TL;DR
This paper identifies a non-oversmoothing phase for graph convolutional networks (GCNs) by leveraging their Gaussian-process (GP) equivalence in the limit of infinitely many hidden features. By linearizing the GP dynamics around the fixed point and analyzing each eigendirection, it shows that deep GCNs avoid oversmoothing if the weight variance $\sigma_w^2$ is large enough, with a transition at $\max_i |\lambda_i^{(p)}| = 1$ at which the propagation depth $\xi_i$ diverges. The authors verify the predictions on toy complete graphs and a contextual stochastic block model (CSBM), and demonstrate that networks near the transition and in the chaotic phase remain informative at large depth, including on the Cora dataset, where GP-based results match established benchmarks for hundreds of layers. The findings offer a principled initialization strategy for building very deep GCNs and provide insight into how graph topology is encoded in the equilibrium GP state, potentially guiding future GNN design and training.
Abstract
Graph neural networks (GNNs) have emerged as powerful tools for processing relational data in applications. However, GNNs suffer from the problem of oversmoothing, the property that the features of all nodes exponentially converge to the same vector over layers, prohibiting the design of deep GNNs. In this work we study oversmoothing in graph convolutional networks (GCNs) by using their Gaussian process (GP) equivalence in the limit of infinitely many hidden features. By generalizing methods from conventional deep neural networks (DNNs), we can describe the distribution of features at the output layer of deep GCNs in terms of a GP: as expected, we find that typical parameter choices from the literature lead to oversmoothing. The theory, however, allows us to identify a new, non-oversmoothing phase: if the initial weights of the network have sufficiently large variance, GCNs do not oversmooth, and node features remain informative even at large depth. We demonstrate the validity of this prediction in finite-size GCNs by training a linear classifier on their output. Moreover, using the linearization of the GCN GP, we generalize the concept of propagation depth of information from DNNs to GCNs. This propagation depth diverges at the transition between the oversmoothing and non-oversmoothing phase. We test the predictions of our approach and find good agreement with finite-size GCNs. Initializing GCNs near the transition to the non-oversmoothing phase, we obtain networks which are both deep and expressive.
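The qualitative claim of the abstract, that a larger initial weight variance keeps node features distinct at depth, can be illustrated with a small numerical sketch. The code below is not the authors' implementation. It assumes a random, untrained GCN of the form $H^{(l+1)} = A\,\tanh(H^{(l)})\,W^{(l)} + b^{(l)}$ on a complete graph, with a row-stochastic shift operator $A = (1-g)I + (g/N)\mathbf{1}\mathbf{1}^\top$. The function names, the tanh nonlinearity, and all default values (`g`, `sigma_b2`, `width`, `depth`) are hypothetical choices made for illustration only.

```python
import numpy as np


def complete_graph_shift(n_nodes, g):
    """Row-stochastic shift operator A = (1 - g) I + (g / n) 1 1^T (assumed form)."""
    return (1.0 - g) * np.eye(n_nodes) + (g / n_nodes) * np.ones((n_nodes, n_nodes))


def node_feature_spread(sigma_w2, sigma_b2=0.1, g=0.1, n_nodes=6,
                        width=200, depth=100, seed=0):
    """Propagate random inputs through a random tanh GCN and measure how
    different the node features still are at the output.

    Returns the fraction of the squared feature norm that varies across
    nodes: close to 0 when all nodes carry the same vector (oversmoothing),
    clearly positive when node features remain distinct.
    """
    rng = np.random.default_rng(seed)
    A = complete_graph_shift(n_nodes, g)
    H = rng.standard_normal((n_nodes, width))  # random initial pre-activations
    for _ in range(depth):
        # Fresh Gaussian weights with variance sigma_w2 / width in every layer.
        W = rng.standard_normal((width, width)) * np.sqrt(sigma_w2 / width)
        # Bias is shared across nodes, with variance sigma_b2 per feature.
        b = rng.standard_normal(width) * np.sqrt(sigma_b2)
        H = A @ np.tanh(H) @ W + b
    centered = H - H.mean(axis=0, keepdims=True)  # remove the node-average feature
    return float((centered ** 2).sum() / (H ** 2).sum())


if __name__ == "__main__":
    for sigma_w2 in (1.0, 16.0):
        spread = node_feature_spread(sigma_w2)
        print(f"sigma_w^2 = {sigma_w2:5.1f}  ->  spread across nodes = {spread:.3e}")
```

Under these assumptions, the small-variance network collapses all node features onto a single vector, while the large-variance network retains a finite spread across nodes after 100 layers. The sketch only contrasts two points well inside each regime. It does not locate the transition, which in the paper follows from the eigenvalues of the linearized GP dynamics.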
