Statistical Guarantees for Link Prediction using Graph Neural Networks

Alan Chung; Amin Saberi; Morgane Austern

TL;DR

This work provides statistical guarantees for link prediction on graphs generated by graphons using a linear Graphon GNN (LG-GNN). The authors introduce a two-stage pipeline: LG-GNN embeddings yield estimators of graphon path-moment features, which are then combined via constrained regression to recover edge probabilities $W_{n,i,j}=\rho_n W(\omega_i,\omega_j)$. They prove consistency of moment estimators, derive finite-sample rates, and show that edge ranking can be preserved with faster convergence than full graphon estimation, including guarantees in stochastic block models. They also demonstrate limitations of naive GCN architectures with random initialization, emphasize the identifiability challenges of single-layer dot-product embeddings, and provide empirical results on real (Cora) and synthetic graph models, showing LG-GNN’s competitiveness and speed without heavy tuning. Overall, the paper advances understanding of when and how GNNs can provably recover underlying edge probabilities and maintain correct edge rankings under graphon-based graph generation, with practical implications for link prediction in sparse and dense regimes.

Abstract

This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture, as well as verify our results on real and synthetic datasets.
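As an illustration of the setting, the generative model $W_{n,i,j}=\rho_n W(\omega_i,\omega_j)$ can be simulated in a few lines. The sketch below is not taken from the paper: it assumes latent features $\omega_i$ drawn uniformly on $[0,1]$ and independent edges, and the function names and the example graphon `W_example` are hypothetical.

```python
import random


def sample_graphon_graph(n, rho_n, W, seed=0):
    """Sample an undirected simple graph with edge probabilities rho_n * W(w_i, w_j).

    Assumption (not from the source): latent features are i.i.d. Uniform[0, 1]
    and edges are drawn independently given the features.
    """
    rng = random.Random(seed)
    omega = [rng.random() for _ in range(n)]
    P = [[0.0] * n for _ in range(n)]  # edge-probability matrix W_{n,i,j}
    A = [[0] * n for _ in range(n)]    # sampled adjacency matrix
    for i in range(n):
        for j in range(i + 1, n):
            p = rho_n * W(omega[i], omega[j])
            P[i][j] = P[j][i] = p
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return omega, P, A


def W_example(x, y):
    """Hypothetical symmetric graphon with values in [0.25, 0.75]."""
    return 0.25 + 0.5 * x * y


omega, P, A = sample_graphon_graph(n=200, rho_n=0.5, W=W_example)
num_edges = sum(sum(row) for row in A) // 2
print("sampled edges:", num_edges)
```

Lowering `rho_n` as `n` grows gives sparser graphs, while `rho_n = 1` corresponds to the dense case; a link predictor would only observe `A` and try to recover `P`.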

Paper Structure (62 sections, 29 theorems, 237 equations, 1 figure, 16 tables, 4 algorithms)

Introduction
Organization of the Paper
Related Works
Notation and Preliminaries
Assumptions
Graph Neural Networks
Link Prediction
Main Results
Statistical Guarantees for Moment Estimation
Edge Prediction Using the Moments of the Graphon
Preserving Ranking in Link Prediction
Performance of the Classical GCN Architecture
Identifiability and Relevance to Common Random Graph Models
Experimental Results
Real Data: Cora Dataset
...and 47 more sections

Key Result

Proposition 4.1

Suppose that the graph $G_n=([n],E_n)$ is generated according to a graphon $W_n=\rho_nW$ and that assumptions (asp2) and (asp3) hold. Then, with probability at least $1 - 5/n - n \exp(-\delta_W \rho_n(n-1)/3)$, for all $2 \leq k \leq L+2$, where $a_k = C (8(k+2))^k k^{k+1}\sqrt{k!}$ and $C$ is some absolute constant.
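To get a feel for the quantities in this statement, the probability level and the constants $a_k$ can be evaluated numerically. The sketch below is only illustrative: the value of $C$ is not given here, so `C=1.0` is a placeholder assumption, and $\delta_W$ is treated as a positive input parameter coming from the paper's assumptions. The function names are hypothetical.

```python
import math


def probability_lower_bound(n, rho_n, delta_W):
    """Evaluate 1 - 5/n - n * exp(-delta_W * rho_n * (n - 1) / 3).

    A non-positive return value means the bound is vacuous for these inputs.
    """
    return 1.0 - 5.0 / n - n * math.exp(-delta_W * rho_n * (n - 1) / 3.0)


def a_k(k, C=1.0):
    """Evaluate a_k = C * (8(k+2))^k * k^(k+1) * sqrt(k!).

    C = 1.0 is a placeholder; the statement only says C is an absolute constant.
    """
    return C * (8 * (k + 2)) ** k * k ** (k + 1) * math.sqrt(math.factorial(k))


# Dense regime (rho_n = 1): the exponential term is negligible.
dense = probability_lower_bound(n=10_000, rho_n=1.0, delta_W=0.5)

# Very sparse regime (rho_n = log(n) / n): the bound becomes vacuous here.
n = 10_000
sparse = probability_lower_bound(n=n, rho_n=math.log(n) / n, delta_W=0.5)

print("dense bound:", dense)
print("sparse bound:", sparse)
print("a_k for k = 2, 3, 4:", [a_k(k) for k in (2, 3, 4)])
```

With these illustrative inputs, the exponential term is informative only when $\rho_n (n-1)$ is large relative to $\log n$, and $a_k$ grows very quickly in $k$.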

Figures (1)

Figure 1: Plot of the predicted probabilities by the PLS Regression (row 1), GCN without node features (row 2), and GCN with node features (row 3). The left column shows for 2 layers, the right column shows for 4 layers.

Theorems & Definitions (50)

Proposition 4.1
Proposition 4.2
Definition 4.3: MSE error
Theorem 4.4: Main Theorem
Proposition 4.5: Informal
Proposition 5.1
Proposition 5.2
Proposition 6.1
Example 6.2
Lemma 1.1
...and 40 more
