Analysis of Corrected Graph Convolutions
Robert Wang, Aseem Baranwal, Kimon Fountoulakis
TL;DR
This work addresses oversmoothing in graph convolutional networks by analyzing corrected graph convolutions, in which the principal eigenvector is removed from the convolution operator so that the limiting behavior shifts from the top eigenvector to the second. Using a spectral analysis within the two-class contextual stochastic block model (CSBM), it proves that each round of corrected convolution can reduce the misclassification error exponentially up to a saturation level, after which performance does not worsen. For exact classification, it shows that the separability threshold can be improved exponentially for up to $O\left(\frac{\log n}{\log\log n}\right)$ corrected convolutions, under suitable density and signal strength. The partial-classification results extend from the two-class CSBM to multi-class Gaussian mixtures, showing that a softmax on the contracted features yields accurate classification when the centers are sufficiently separated relative to the graph noise. The paper also provides concentration bounds, reductions to a 1-D analysis, and empirical validation on synthetic and real graphs. Overall, it offers rigorous guarantees for using corrected graph convolutions to improve node classification on graphs with block-structured signals.
Abstract
Machine learning for node classification on graphs is a prominent area driven by applications such as recommendation systems. State-of-the-art models often use multiple graph convolutions on the data, as empirical evidence suggests they can enhance performance. However, it has been shown, both empirically and theoretically, that too many graph convolutions can degrade performance significantly, a phenomenon known as oversmoothing. In this paper, we provide a rigorous theoretical analysis, based on the two-class contextual stochastic block model (CSBM), of the performance of vanilla graph convolution from which we remove the principal eigenvector to avoid oversmoothing. We perform a spectral analysis for $k$ rounds of corrected graph convolutions, and we provide results for partial and exact classification. For partial classification, we show that each round of convolution can reduce the misclassification error exponentially up to a saturation level, after which performance does not worsen. We also extend this analysis to the multi-class setting with features distributed according to a Gaussian mixture model. For exact classification, we show that the separability threshold can be improved exponentially up to $O({\log{n}}/{\log\log{n}})$ corrected convolutions.
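
The idea of removing the principal eigenvector can be illustrated with a small numerical sketch. The code below is not the paper's implementation. It assumes the symmetrically normalized adjacency $D^{-1/2} A D^{-1/2}$ as the "vanilla" convolution, whose principal eigenvector is proportional to $D^{1/2}\mathbf{1}$, and it uses 1-D features on a synthetic two-class CSBM. The function names (`sample_csbm`, `corrected_convolution`, `misclassification`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def sample_csbm(n, p, q, mu, sigma, rng):
    """Sample a two-class CSBM with 1-D features (illustrative setup).

    Nodes in the first half get label +1, the rest -1. Edges appear with
    probability p inside a class and q across classes. Features are
    mu * label plus Gaussian noise with standard deviation sigma.
    """
    y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
    same_class = y[:, None] == y[None, :]
    probs = np.where(same_class, p, q)
    # Sample the strict upper triangle, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)
    x = mu * y + sigma * rng.standard_normal(n)
    return A, x, y


def corrected_convolution(A, x, k):
    """Apply k rounds of (A_norm - v v^T) to x.

    A_norm = D^{-1/2} A D^{-1/2} and v is the unit vector proportional to
    D^{1/2} 1, the principal eigenvector of A_norm (assumed normalization).
    """
    deg = np.maximum(A.sum(axis=1), 1.0)  # guard against isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    v = np.sqrt(deg)
    v /= np.linalg.norm(v)
    for _ in range(k):
        # One corrected round: convolve, then subtract the top-eigenvector part.
        x = A_norm @ x - v * (v @ x)
    return x


def misclassification(x, y):
    """Fraction of nodes whose feature sign disagrees with the label."""
    return float(np.mean(np.sign(x) != y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, x, y = sample_csbm(n=400, p=0.5, q=0.1, mu=0.5, sigma=1.0, rng=rng)
    for k in range(4):
        err = misclassification(corrected_convolution(A, x, k), y)
        print(f"k={k}: misclassification rate = {err:.3f}")
```

Classifying by the sign of the convolved feature is a simplification that relies on the two classes being balanced and on $p > q$, so that the second eigenvector aligns with the labels. In a dense, well-separated instance such as this one, the error would be expected to drop sharply after the first round and then stay low, which is consistent with the saturation behavior described in the abstract but is not a substitute for the paper's analysis.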
