
Exact Generalisation Error Exposes Benchmarks Skew Graph Neural Networks Success (or Failure)

Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

Abstract

Graph Neural Networks (GNNs) have become the standard method for learning from networks across fields ranging from biology to social systems, yet a principled understanding of what enables them to extract meaningful representations, or why performance varies drastically between similar models, remains elusive. These questions can be answered through the generalisation error, which measures the discrepancy between a model's predictions and the true values it is meant to recover. Although several works have derived generalisation error bounds, learning-theoretic bounds are typically loose, restricted to a single architecture, and offer limited insight into what governs generalisation in practice. In this work, we take a fundamentally different approach by deriving the exact generalisation error for a broad range of linear GNNs, including convolutional, PageRank-based, and attention-based models, through the lens of signal processing. Our exact generalisation error exposes a strong benchmark bias in the existing literature: commonly used datasets exhibit high alignment between node features and the graph structure, inherently favouring architectures that rely on it. We further show that the similarity between connected nodes (homophily) decisively governs which architectures are best suited for a given graph, thereby explaining how specific benchmark properties systematically shape the reported performance in the literature. Together, these results explain when and why GNNs can effectively leverage structure and feature information, supporting the reliable application of GNNs.
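Homophily, the similarity between connected nodes, is the quantity the abstract singles out as governing which architecture suits a graph. The exact homophily score used in the paper is not stated here, so the following is only a minimal sketch assuming the common *edge homophily* definition: the fraction of edges whose two endpoints share a class label, so that $h = 1.0$ means every edge is intra-class. The function name `edge_homophily` and the toy graph are hypothetical illustrations.

```python
import numpy as np

def edge_homophily(edges: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of edges whose endpoints share a label (edge homophily, an assumed definition).

    edges:  integer array of shape (m, 2), one row per undirected edge
    labels: integer array of shape (n,), the class of each node
    """
    if len(edges) == 0:
        raise ValueError("graph has no edges")
    same = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(same.mean())

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
edges = np.array([[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(edge_homophily(edges, labels))  # 6 of the 7 edges are intra-class
```

Other homophily measures (node-level or class-adjusted variants) weight edges differently and can rank graphs differently, so this sketch should not be read as the paper's definition.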

Paper Structure

This paper contains 35 sections, 7 theorems, 67 equations, 10 figures, 2 tables.

Key Result

Theorem 3.3

Under the framework in Section Ch:pre and Assumptions As:general and As:aniso_prior, the generalisation error $R_{H}$, as defined in Eq. def:gen_err, is given by: where $\tilde{\lambda}_{i}=g(\lambda_i)^{2\ell} (\Lambda_f)_i$ (Eq. Eq:H), and $\lambda^{\star}_{i}=(\Lambda^\star)_i$ (Eq. eq:XZ), with $g(\Lambda)$, $\Lambda_f$, and $\Lambda^\star$ governing the influence of the graph filter, the obs
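Theorem 3.3 expresses the error through the filtered spectrum $\tilde{\lambda}_{i}=g(\lambda_i)^{2\ell} (\Lambda_f)_i$. The expression for $R_H$ itself is not reproduced here, so the sketch below only computes $\tilde{\lambda}_i$, under loudly stated assumptions: the $\lambda_i$ are taken to be eigenvalues of the symmetric normalised Laplacian $L = I - D^{-1/2} A D^{-1/2}$ (no self-loops), the filter is a hypothetical GCN-style low-pass response $g(\lambda) = 1 - \lambda$, and the feature spectrum $\Lambda_f$ is isotropic. The helper names `normalized_laplacian`, `filtered_spectrum`, and `gcn_filter` are illustrative, not taken from the paper.

```python
import numpy as np

def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalised Laplacian I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def filtered_spectrum(lam, lam_f, g, ell):
    """Elementwise tilde_lambda_i = g(lambda_i)^(2 * ell) * (Lambda_f)_i."""
    lam = np.asarray(lam, dtype=float)
    lam_f = np.asarray(lam_f, dtype=float)
    return g(lam) ** (2 * ell) * lam_f

# Hypothetical GCN-style low-pass filter acting on Laplacian eigenvalues in [0, 2].
def gcn_filter(lam):
    return 1.0 - lam

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
lam = np.linalg.eigvalsh(normalized_laplacian(adj))  # eigenvalues in ascending order
lam_f = np.ones_like(lam)                            # isotropic feature spectrum (assumption)
print(np.round(lam, 3))
print(np.round(filtered_spectrum(lam, lam_f, gcn_filter, ell=2), 3))
```

Because $g$ enters with exponent $2\ell$, deeper propagation sharpens the filter: frequencies with $|g(\lambda_i)| < 1$ are suppressed geometrically in $\ell$, which is one way to see how the filter and depth jointly shape $\tilde{\lambda}_i$.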

Figures (10)

  • Figure 1: GCN performance on two common benchmarks (Cora 10.5555/295240.295725 and Squirrel DBLP:journals/compnet/RozemberczkiAS21). Experiments using synthetic variants of the benchmarks demonstrate that homophily alone does not explain performance and that benchmarks do not fully capture GNN behaviour. $h$ = homophily score ($h=1.0$ indicates perfect homophily); Acc = test accuracy.
  • Figure 2: Misalignment vs. Accuracy. Test accuracy of convolution ($SZ$) and concatenation ($[S \ Z]$) models. The x-axis shows the misalignment score as defined in Definition 4.2 for $H = SZ$.
  • Figure 3: Frequency response and theoretical error of Chebyshev and PageRank-based GNNs.
  • Figure 4: Spectral Symmetry and Theoretical Error. Left: $\lambda$ of Cora and Squirrel, showing approximate spectral symmetry. Right: $R_{\text{GCN}}$, averaged over the number of $\lambda$, as a function of $q$ with $c=0.01$.
  • Figure 5: Effect of Laplacian Spectrum on GCN Accuracy under Varying Homophily. GCN test accuracy (left) and the $\lambda_{\max}$ (right).
  • ...and 5 more figures
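Figure 3 compares the frequency responses of Chebyshev and PageRank-based GNNs. The exact parameterisations used in the paper are not given here, so this is a minimal sketch under two standard assumptions: a personalised-PageRank (APPNP-style) propagation $\alpha\,(I - (1-\alpha)(I - L))^{-1}$, whose response at Laplacian eigenvalue $\lambda$ is $\alpha / (\alpha + (1-\alpha)\lambda)$, and a ChebNet-style polynomial $\sum_k \theta_k T_k(2\lambda/\lambda_{\max} - 1)$. The coefficients `theta` and the teleport probability `alpha` below are arbitrary illustrative values.

```python
import numpy as np
from numpy.polynomial import chebyshev

def ppr_response(lam, alpha=0.1):
    """Response of alpha * (I - (1 - alpha) * (I - L))^{-1} at Laplacian eigenvalue lam,
    i.e. alpha / (alpha + (1 - alpha) * lam): equal to 1 at lam = 0 and decaying (low-pass)."""
    lam = np.asarray(lam, dtype=float)
    return alpha / (alpha + (1.0 - alpha) * lam)

def chebyshev_response(lam, theta, lam_max=2.0):
    """Response sum_k theta_k T_k(x) with x = 2 * lam / lam_max - 1 rescaled into [-1, 1]."""
    x = 2.0 * np.asarray(lam, dtype=float) / lam_max - 1.0
    return chebyshev.chebval(x, theta)

lam = np.linspace(0.0, 2.0, 5)
print(np.round(ppr_response(lam, alpha=0.1), 3))
print(np.round(chebyshev_response(lam, theta=[0.5, -0.5, 0.25]), 3))
```

The contrast is structural: the PageRank response is fixed up to $\alpha$ and always low-pass, whereas the Chebyshev coefficients can in principle be learned to emphasise high frequencies, which is the kind of flexibility that matters on heterophilous graphs.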

Theorems & Definitions (12)

  • Theorem 3.3: Generalisation Error of GNNs
  • Corollary 4.1: Generalisation Error with Isotropic Parameter Prior
  • Definition 4.2: Misalignment of GNN
  • Lemma 4.3: Misalignment as a Component of Generalisation Error
  • Corollary 4.4: Derivative of Generalisation Error with Respect to Homophily
  • Remark 4.5
  • Definition 4.6: Spectral View of Graph Attention Networks and Specformer
  • Corollary 4.7: Generalisation Error between GAT and Specformer
  • proof
  • Lemma D.1: Concentration of empirical covariance koltchinskii2017concentration, DBLP:journals/corr/abs-1906-11300
  • ...and 2 more
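Lemma D.1 concerns the concentration of the empirical covariance. Its exact statement is not reproduced here; for Gaussian data, results of the cited type (Koltchinskii and Lounici) bound $\mathbb{E}\|\hat{\Sigma} - \Sigma\|$ by roughly $\|\Sigma\| \max\big(\sqrt{r(\Sigma)/n},\, r(\Sigma)/n\big)$ with effective rank $r(\Sigma) = \operatorname{tr}(\Sigma)/\|\Sigma\|$. The simulation below is only an illustrative sketch of that behaviour, assuming zero-mean Gaussian samples with a hypothetical diagonal covariance; `covariance_error` is an illustrative helper, not the paper's code.

```python
import numpy as np

def covariance_error(n, cov, rng):
    """Operator-norm error ||Sigma_hat - Sigma|| from n zero-mean Gaussian samples."""
    d = cov.shape[0]
    x = rng.multivariate_normal(np.zeros(d), cov, size=n)  # shape (n, d)
    sigma_hat = x.T @ x / n  # uncentred estimate, valid because the mean is known to be zero
    return np.linalg.norm(sigma_hat - cov, ord=2)  # largest singular value of the error

rng = np.random.default_rng(0)
d = 20
cov = np.diag(1.0 / np.arange(1, d + 1))  # decaying spectrum: effective rank about 3.6, well below d
for n in [100, 1000, 10000]:
    print(n, round(covariance_error(n, cov, rng), 4))
```

The error should shrink roughly like $1/\sqrt{n}$ once $n \gg r(\Sigma)$, and it is the effective rank rather than the ambient dimension $d$ that sets the scale.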