Statistical physics analysis of graph neural networks: Approaching optimality in the contextual stochastic block model

O. Duranthon, L. Zdeborová

TL;DR

The paper studies how depth and architecture affect the generalization of graph neural networks on data generated by the contextual stochastic block model. It develops a replica-based, high-dimensional analysis that yields asymptotic predictions for the training and test performance of a simple, linear GCN with $K$ aggregation steps and residuals, and connects it to a continuous-depth limit that resembles a neural ODE on graphs. A key finding is that increasing the depth, with the residuals scaled appropriately, brings the performance close to Bayes-optimal, especially when the graph is symmetrized; the continuous GCN with optimal diffusion time $t^*$ can outperform any finite-$K$ counterpart. The work also provides a dynamical mean-field theory interpretation, showing how order parameters play the role of correlation and response functions in a diffusion-on-graph process. Overall, the framework offers sharp, quantitative guidance on how depth, residuals, and regularization shape the generalization of deep GCNs and suggests a path to analyzing other deep architectures.

Abstract

Graph neural networks (GNNs) are designed to process data associated with graphs. They are finding an increasing range of applications; however, as with other modern machine learning techniques, their theoretical understanding is limited. GNNs can encounter difficulties in gathering information from nodes that are far apart by iterated aggregation steps. This situation is partly caused by so-called oversmoothing; and overcoming it is one of the practically motivated challenges. We consider the situation where information is aggregated by multiple steps of convolution, leading to graph convolutional networks (GCNs). We analyze the generalization performance of a basic GCN, trained for node classification on data generated by the contextual stochastic block model. We predict its asymptotic performance by deriving the free energy of the problem, using the replica method, in the high-dimensional limit. Calling depth the number of convolutional steps, we show the importance of going to large depth to approach the Bayes-optimality. We detail how the architecture of the GCN has to scale with the depth to avoid oversmoothing. The resulting large depth limit can be close to the Bayes-optimality and leads to a continuous GCN. Technically, we tackle this continuous limit via an approach that resembles dynamical mean-field theory (DMFT) with constraints at the initial and final times. An expansion around large regularization allows us to solve the corresponding equations for the performance of the deep GCN. This promising tool may contribute to the analysis of further deep neural networks.
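The setting described in the abstract can be illustrated with a small numerical experiment. The following is only a minimal sketch under stated assumptions, not the authors' code: the symmetric sparse parameterization of the contextual stochastic block model, the centering and rescaling of the adjacency matrix, and all function and argument names (`sample_csbm`, `gcn_features`, `train_and_test`, `avg_degree`, and so on) are hypothetical choices made for illustration. Here `K` is read as the number of aggregation steps, `c` as a single residual strength shared by all steps, `r` as an ℓ2 regularization strength, and `rho` as the fraction of labeled nodes; ridge regression stands in for training with quadratic loss.

```python
import numpy as np


def sample_csbm(n_nodes, n_features, lam, mu, avg_degree, rng):
    """Sample a two-community contextual SBM (assumed symmetric parameterization)."""
    y = rng.choice([-1.0, 1.0], size=n_nodes)   # hidden node labels
    u = rng.standard_normal(n_features)         # hidden feature direction
    # Features: rank-one signal of strength mu plus Gaussian noise.
    noise = rng.standard_normal((n_nodes, n_features))
    X = np.sqrt(mu / n_nodes) * np.outer(y, u) + noise
    # Edges: more likely inside a community when lam > 0.
    p = (avg_degree + lam * np.sqrt(avg_degree) * np.outer(y, y)) / n_nodes
    upper = np.triu(rng.random((n_nodes, n_nodes)) < p, k=1)
    A = (upper | upper.T).astype(float)
    # Center and rescale so that the noise part of the spectrum is of order one.
    A_tilde = (A - avg_degree / n_nodes) / np.sqrt(avg_degree)
    return A_tilde, X, y


def gcn_features(A_tilde, X, K, c):
    """Apply K linear aggregation steps, each with a residual term of strength c."""
    H = X
    for _ in range(K):
        H = A_tilde @ H + c * H
    return H


def train_and_test(A_tilde, X, y, K, c, r, rho, rng):
    """Ridge-regress labels on a fraction rho of nodes; return accuracy on the rest."""
    n_nodes, n_features = X.shape
    train = rng.random(n_nodes) < rho           # random mask of labeled nodes
    H = gcn_features(A_tilde, X, K, c)
    H_tr, y_tr = H[train], y[train]
    # Closed-form minimizer of the quadratic loss with l2 penalty r.
    w = np.linalg.solve(H_tr.T @ H_tr + r * np.eye(n_features), H_tr.T @ y_tr)
    pred = np.sign(H[~train] @ w)
    return float(np.mean(pred == y[~train]))
```

Calling `train_and_test` over a range of `K` and `c` on a sample from `sample_csbm` gives an empirical counterpart to the kind of accuracy-versus-depth curves the paper predicts, although at small sizes the numbers will deviate from any asymptotic prediction.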

Paper Structure

This paper contains 45 sections, 144 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Test accuracy of the graph neural network on data generated by the contextual stochastic block model vs the signal strength. We define the model and the network in section \ref{sec:setup}. The test accuracy is maximized over all the hyperparameters of the network. The Bayes-optimal performance is from \cite{dz23csbm}. The line $K=1$ has been studied by \cite{shi2022statistical,dz24gcn}; we extend it to $K>1$, $K=\infty$ and symmetrized graphs. All the curves are theoretical predictions we derive in this work.
  • Figure 2: Predicted test accuracy $\mathrm{Acc}_\mathrm{test}$ for different values of $K$. Top: for $\lambda=1.5$, $\mu=3$ and logistic loss; bottom: for $\lambda=1$, $\mu=2$ and quadratic loss; $\alpha=4$ and $\rho=0.1$. We take $c_k=c$ for all $k$. Inset: $\mathrm{Acc}_\mathrm{test}$ vs $c_1$ and $c_2$ at $K=2$ and at large $r$. Dots: numerical simulation of the GCN for $N=10^4$ and $d=30$, averaged over ten experiments.
  • Figure 3: Predicted misclassification error $1-\mathrm{Acc}_\mathrm{test}$ at large $\lambda$ for two strengths of the feature signal. $r=\infty$, $c=c^*$ is optimized by grid search and $\rho=0.1$. The dots are theoretical predictions given by numerically solving the self-consistent equations (\ref{eq:pointFixeDébut}-\ref{eq:pointFixeFin}) simplified in the limit $r\to\infty$. For the symmetrized graph the self-consistent equations are eqs. (\ref{eq:pointFixeCont_Vqh}-\ref{eq:pointFixeCont_Qh}) in the next part.
  • Figure 4: Predicted test accuracy $\mathrm{Acc}_\mathrm{test}$ vs $K$ for different scalings of $c$, at $r=\infty$. Top: for $\lambda=1.5$, $\mu=3$; bottom: for $\lambda=0.7$, $\mu=1$; $\alpha=4$, $\rho=0.1$. The predictions are given either by the explicit expression eqs. (\ref{eq:accC0Rinf}-\ref{eq:qC0Rinf}) for $c=0$, or by solving the self-consistent equations (\ref{eq:pointFixeDébut}-\ref{eq:pointFixeFin}) simplified in the limit $r\to\infty$. The performance in the continuous limit is derived and given in the next section \ref{sec:continuousGCN}, while the performance of PCA on the graph is given by eqs. (\ref{eq:qPCA}-\ref{eq:accPCA}).
  • Figure 5: Predicted test accuracy $\mathrm{Acc}_\mathrm{test}$ of the continuous GCN on the asymmetric graph, at $r=\infty$. $\alpha=4$ and $\rho=0.1$. The performance of the continuous GCN is given by eq. \ref{eq:precisionContinue}. Dots: numerical simulation of the continuous GCN for $N=10^4$ and $d=30$, trained with quadratic loss, averaged over ten experiments.
  • ...and 10 more figures