Conformalized Link Prediction on Graph Neural Networks

Tianyi Zhao; Jian Kang; Lu Cheng

TL;DR

This work tackles unreliable uncertainty estimates in GNN-based link prediction by introducing conformalized link prediction (CLP), a distribution-free, model-agnostic framework that yields prediction intervals with guaranteed coverage in an inductive, edge-level setting. It applies conformal prediction (CP) through Conformalized Quantile Regression (CQR) and proves an exchangeability condition for calibration and test edges, which ensures that the intervals cover the true link status with probability at least $1-\alpha$ (where $\alpha\in(0,1)$). A key insight is that CP is more efficient, meaning its intervals are shorter, when the graph’s degree distribution follows a power law. This motivates a sampling-based method that aligns the graph with a power-law prior before CP is applied. Empirical results on five real-world datasets show that CLP achieves the target coverage while significantly improving efficiency over baseline CP and Bayesian UQ methods. The approach is agnostic to the backbone model and scales to large graphs.
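To make the CQR step concrete, here is a minimal sketch of split-conformal CQR in Python. It is not the authors' implementation. The function names (`cqr_calibrate`, `cqr_interval`), the array arguments, and the real-valued target `y_cal` are illustrative assumptions; the lower and upper quantile predictions are assumed to come from any already-trained backbone.

```python
import math
import numpy as np

def cqr_calibrate(lo_cal, hi_cal, y_cal, alpha):
    """Conformal correction q_hat computed on calibration edges (hypothetical helper)."""
    # CQR nonconformity score: signed distance by which the target
    # falls outside the predicted interval [lo, hi] (negative if inside).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration edges for this alpha: the interval is unbounded.
        return math.inf
    return float(np.sort(scores)[k - 1])

def cqr_interval(lo_test, hi_test, q_hat):
    """Widen (or shrink, if q_hat < 0) the predicted interval by q_hat."""
    return lo_test - q_hat, hi_test + q_hat
```

If calibration and test scores are exchangeable, an interval built this way covers the target with probability at least $1-\alpha$; efficiency is then measured by the interval length `hi - lo`.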

Abstract

Graph Neural Networks (GNNs) excel in diverse tasks, yet their applications in high-stakes domains are often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack \textit{rigorous} uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty quantification approach to construct a predictive interval with a statistical guarantee for GNN-based link prediction. We term it \textit{conformalized link prediction}. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. We first theoretically and empirically establish a permutation invariance condition for the application of CP in link prediction tasks, along with an exact test-time coverage. Leveraging the important structural information in graphs, we then identify a novel and crucial connection between a graph's adherence to the power law distribution and the efficiency of CP. This insight leads to the development of a simple yet effective sampling-based method to align the graph structure with a power law distribution prior to the standard CP procedure. Extensive experiments demonstrate that for conformalized link prediction, our approach achieves the desired marginal coverage while significantly improving the efficiency of CP compared to baseline methods.
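As an illustration of what fitting a power law to a degree sequence can look like, the sketch below estimates the exponent $\widehat{\beta}$ of $p(d)\propto d^{-\beta}$ with the standard continuous maximum-likelihood estimator. This summary does not state the paper's exact fitting procedure, so the estimator, the function name `fit_power_law_exponent`, and the cutoff `d_min` are assumptions made for the example.

```python
import numpy as np

def fit_power_law_exponent(degrees, d_min=1.0):
    """Continuous MLE of the power-law exponent (hypothetical helper).

    beta_hat = 1 + n / sum(log(d_i / d_min)), over degrees d_i >= d_min.
    """
    d = np.asarray(degrees, dtype=float)
    d = d[d >= d_min]  # keep only the tail at or above the cutoff
    log_ratio_sum = np.log(d / d_min).sum()
    if d.size == 0 or log_ratio_sum == 0:
        # No usable tail, or every degree equals d_min: exponent undefined.
        raise ValueError("need degrees strictly above d_min to fit an exponent")
    return 1.0 + d.size / log_ratio_sum
```

A fitted exponent of this kind could then parameterize a target degree sequence that edge sampling tries to match before the standard CP procedure is run.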

Paper Structure (26 sections, 2 theorems, 14 equations, 2 figures, 7 tables, 1 algorithm)

Introduction
Preliminary
Conformalized Link Prediction
Exchangeability and Validity of Conformalized Link Prediction
CQR for Conformalized Link Prediction
Efficiency and Structural Property
Sampling-based CQR for Improved Efficiency
Fitting the power-law distribution
Generating ideal degree sequence with $\widehat{\beta}$-parameterized power-law distribution
Sampling edges from the original graph for a degree distribution that follows the power law
Experiments
Experimental Setup
Datasets
Backbone Models
Evaluation Setup
...and 11 more sections

Key Result

Proposition 1

In the described inductive setting for link prediction, where the model has access to all node information and features but only a subset of positive links from training and validation sets during training, the unordered set of the scores $[V_i]_{i=1}^{K+L}$ is fixed, where $|\mathcal{D}_{calib}|=K$

Figures (2)

Figure 1: Simulation study on a semi-synthetic dataset generated from the Amazon Computers dataset (Shchur et al., 2018).
Figure 2: Performance of S-CQR for conformalized link prediction under different $\lambda$ on Rochester38 dataset.

Theorems & Definitions (2)

Proposition 1
Theorem 1
