Statistical Guarantees for Link Prediction using Graph Neural Networks

Alan Chung, Amin Saberi, Morgane Austern

TL;DR

This work provides statistical guarantees for link prediction on graphs generated by graphons using a linear Graphon GNN (LG-GNN). The authors introduce a two-stage pipeline: LG-GNN embeddings yield estimators of graphon path-moment features, which are then combined via constrained regression to recover edge probabilities $W_{n,i,j}=\rho_n W(\omega_i,\omega_j)$. They prove consistency of moment estimators, derive finite-sample rates, and show that edge ranking can be preserved with faster convergence than full graphon estimation, including guarantees in stochastic block models. They also demonstrate limitations of naive GCN architectures with random initialization, emphasize the identifiability challenges of single-layer dot-product embeddings, and provide empirical results on real (Cora) and synthetic graph models, showing LG-GNN’s competitiveness and speed without heavy tuning. Overall, the paper advances understanding of when and how GNNs can provably recover underlying edge probabilities and maintain correct edge rankings under graphon-based graph generation, with practical implications for link prediction in sparse and dense regimes.

Abstract

This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture, as well as verify our results on real and synthetic datasets.
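The generative model behind these guarantees can be illustrated with a short simulation. The sketch below draws a graph whose edges appear independently with probability $W_{n,i,j}=\rho_n W(\omega_i,\omega_j)$. It assumes the latent features $\omega_i$ are uniform on $[0,1]$ (the usual graphon convention, not stated explicitly above). The function names and the particular graphon `W_example` are hypothetical choices made only for illustration.

```python
import numpy as np

def sample_graphon_graph(n, rho_n, W, seed=0):
    """Sample an undirected simple graph on n vertices from the graphon rho_n * W."""
    rng = np.random.default_rng(seed)
    # Assumption: latent features omega_i are i.i.d. Uniform[0, 1].
    omega = rng.uniform(0.0, 1.0, size=n)
    # Edge probabilities P[i, j] = rho_n * W(omega_i, omega_j).
    P = rho_n * W(omega[:, None], omega[None, :])
    np.fill_diagonal(P, 0.0)  # no self-loops
    # One independent Bernoulli draw per unordered pair i < j, then symmetrize.
    U = rng.uniform(size=(n, n))
    upper = np.triu(U < P, k=1)
    A = (upper | upper.T).astype(np.int8)
    return A, P, omega

def W_example(x, y):
    # Hypothetical smooth graphon with values in [0.25, 0.75].
    return 0.25 + 0.5 * x * y

# rho_n = 0.5 is an arbitrary sparsity factor for this demo.
A, P, omega = sample_graphon_graph(n=200, rho_n=0.5, W=W_example)
```

A link-prediction method observes only `A`; the matrix `P` is the ground truth that an estimator of the edge probabilities is meant to recover, and that an edge-ranking guarantee compares against.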

Paper Structure

This paper contains 62 sections, 29 theorems, 237 equations, 1 figure, 16 tables, 4 algorithms.

Key Result

Proposition 4.1

Suppose that the graph $G_n=([n],E_n)$ is generated according to a graphon $W_n=\rho_nW$. Suppose that assumptions (asp2) and (asp3) hold. Then, with probability at least $1 - 5/n - n \cdot \exp(-\delta_W \rho_n(n-1)/3)$, for all $2 \leq k \leq L+2$, where $a_k =C (8(k+2))^k k^{k+1}\sqrt{k!}$ and $C$ is some absolute constant.
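To get a feel for the two explicit quantities in this statement, the following sketch evaluates the probability lower bound $1 - 5/n - n\exp(-\delta_W \rho_n(n-1)/3)$ and the constant $a_k$ up to the unspecified factor $C$. The numerical values of $n$, $\rho_n$ and $\delta_W$ are hypothetical and chosen only for illustration. Since $\delta_W$ comes from the assumptions, which are not reproduced here, it is treated simply as a positive parameter.

```python
import math

def prop41_probability_lower_bound(n, rho_n, delta_W):
    """Evaluate 1 - 5/n - n * exp(-delta_W * rho_n * (n - 1) / 3)."""
    return 1.0 - 5.0 / n - n * math.exp(-delta_W * rho_n * (n - 1) / 3.0)

def a_k_over_C(k):
    """Evaluate a_k / C = (8(k+2))^k * k^(k+1) * sqrt(k!)."""
    return (8 * (k + 2)) ** k * k ** (k + 1) * math.sqrt(math.factorial(k))

n = 1000
delta_W = 0.5  # hypothetical value, for illustration only

# Dense regime (rho_n = 1): the exponential term is negligible.
dense_bound = prop41_probability_lower_bound(n, rho_n=1.0, delta_W=delta_W)

# Very sparse regime (rho_n = log(n) / n): the exponential term dominates
# and the lower bound drops below zero, so it carries no information.
sparse_bound = prop41_probability_lower_bound(n, rho_n=math.log(n) / n, delta_W=delta_W)

# a_k / C grows super-exponentially in k.
growth = [a_k_over_C(k) for k in range(2, 6)]
```

With these illustrative values the bound is informative only when $\delta_W \rho_n (n-1)/3$ comfortably exceeds $\log n$, and the rapid growth of $a_k$ in $k$ suggests the constants worsen quickly as $k$ increases.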

Figures (1)

  • Figure 1: Predicted probabilities from PLS regression (row 1), GCN without node features (row 2), and GCN with node features (row 3). The left column shows results for 2 layers, the right column for 4 layers.

Theorems & Definitions (50)

  • Proposition 4.1
  • Proposition 4.2
  • Definition 4.3: MSE error
  • Theorem 4.4: Main Theorem
  • Proposition 4.5: Informal
  • Proposition 5.1
  • Proposition 5.2
  • Proposition 6.1
  • Example 6.2
  • Lemma 1.1
  • ...and 40 more