Statistical Guarantees for Link Prediction using Graph Neural Networks
Alan Chung, Amin Saberi, Morgane Austern
TL;DR
This work provides statistical guarantees for link prediction on graphs generated by graphons, using a linear Graphon GNN (LG-GNN). The authors introduce a two-stage pipeline: LG-GNN embeddings yield estimators of graphon path-moment features, which are then combined via constrained regression to recover the edge probabilities $W_{n,i,j}=\rho_n W(\omega_i,\omega_j)$, where $\omega_i$ is the latent variable of node $i$ and $\rho_n$ is the sparsity factor. They prove consistency of the moment estimators, derive finite-sample rates, and show that edge ranking can be recovered at a faster rate than full graphon estimation, with specific guarantees for stochastic block models. They also demonstrate limitations of naive GCN architectures with random initialization, highlight the identifiability problems of single-layer dot-product embeddings, and report experiments on a real dataset (Cora) and on synthetic graph models, showing that LG-GNN is competitive and fast without extensive hyperparameter tuning. Overall, the paper clarifies when and how GNNs can provably recover the underlying edge probabilities and preserve the correct edge ranking for graphon-generated graphs, in both sparse and dense regimes.
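To make the idea of a path-moment feature concrete, the sketch below samples a graph with edge probabilities $\rho_n W(\omega_i,\omega_j)$ and estimates the two-step moment $W^{(2)}(\omega_i,\omega_j)=\int_0^1 W(\omega_i,z)\,W(z,\omega_j)\,dz$ from common-neighbour counts. It is a simplified illustration, not the paper's LG-GNN: the rank-one graphon $W(x,y)=xy$, the helper names, and the values of $n$ and $\rho$ are assumptions chosen for the example. For this particular graphon $W^{(2)}=W/3$, so ranking node pairs by the estimated moment should also rank them by edge probability; that monotone relationship does not hold for general graphons.

```python
import random


def sample_graphon_graph(n, rho, graphon, seed=0):
    """Sample omega_i ~ Uniform[0, 1] and, independently for each pair i < j,
    add an edge with probability rho * graphon(omega_i, omega_j)."""
    rng = random.Random(seed)
    omega = [rng.random() for _ in range(n)]
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < rho * graphon(omega[i], omega[j]):
                neighbours[i].add(j)
                neighbours[j].add(i)
    return omega, neighbours


def two_path_moment(neighbours, i, j, rho):
    """Estimate W2(omega_i, omega_j) = integral of W(omega_i, z) W(z, omega_j) dz
    for i != j (hypothetical helper, not the paper's estimator).

    Each of the n - 2 other vertices k is a common neighbour with probability
    rho**2 * W(omega_i, omega_k) * W(omega_k, omega_j), so the expected number
    of common neighbours, given omega_i and omega_j, is
    (n - 2) * rho**2 * W2(omega_i, omega_j).
    """
    n = len(neighbours)
    return len(neighbours[i] & neighbours[j]) / ((n - 2) * rho ** 2)


def ranking_concordance(scores, targets, n_comparisons, seed=1):
    """Fraction of randomly drawn index pairs (a, b) whose order under `scores`
    strictly matches their order under `targets` (ties count as misses)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_comparisons):
        a, b = rng.sample(range(len(scores)), 2)
        if (scores[a] - scores[b]) * (targets[a] - targets[b]) > 0:
            hits += 1
    return hits / n_comparisons


def rank_one_graphon(x, y):
    # Assumed example graphon; for it, W2(x, y) = x * y / 3.
    return x * y


if __name__ == "__main__":
    n, rho = 1500, 0.5
    omega, nbrs = sample_graphon_graph(n, rho, rank_one_graphon)

    rng = random.Random(2)
    pairs = [tuple(rng.sample(range(n), 2)) for _ in range(300)]
    est = [two_path_moment(nbrs, i, j, rho) for i, j in pairs]
    true_w2 = [omega[i] * omega[j] / 3 for i, j in pairs]
    edge_prob = [rho * rank_one_graphon(omega[i], omega[j]) for i, j in pairs]

    mae = sum(abs(e - t) for e, t in zip(est, true_w2)) / len(pairs)
    print(f"mean |estimate - W2| = {mae:.4f}")
    # W2 is monotone in W for this graphon, so the two rankings should mostly agree.
    print(f"rank concordance with true edge probabilities = "
          f"{ranking_concordance(est, edge_prob, 2000):.3f}")
```

Raw common-neighbour counts are used here only because their expectation has a closed form; the paper instead obtains such moments from LG-GNN embeddings and combines several of them by constrained regression.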
Abstract
This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture and verify our results on real and synthetic datasets.
