Conformalized Link Prediction on Graph Neural Networks
Tianyi Zhao, Jian Kang, Lu Cheng
TL;DR
This work tackles unreliable uncertainty estimates in GNN-based link prediction by introducing conformalized link prediction (CLP), a distribution-free, model-agnostic framework that yields prediction intervals with guaranteed coverage in an inductive, edge-level setting. It applies conformal prediction (CP) through Conformalized Quantile Regression (CQR) and proves an exchangeability condition for calibration and test edges, which ensures that the intervals cover the true link status with probability at least $1-\alpha$ for a chosen miscoverage level $\alpha\in(0,1)$. A key insight is that CP is more efficient, meaning its intervals are shorter, when the graph's degree distribution follows a power law. This motivates a sampling-based method that aligns the graph structure with a power-law distribution before CP is applied. Empirical results on five diverse real-world datasets show that CLP achieves the target coverage while significantly improving efficiency over baseline CP and Bayesian UQ methods; the approach is agnostic to the backbone model and scales to large graphs.
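The calibration step behind CQR-style intervals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes some backbone model has already produced lower and upper quantile predictions for each calibration and test edge, and the function name `cqr_intervals` and its arguments are hypothetical. It uses the standard CQR conformity score $\max(\hat{q}_{lo}(x) - y,\ y - \hat{q}_{hi}(x))$ and the usual finite-sample quantile rank $\lceil (n+1)(1-\alpha) \rceil$.

```python
import math
import numpy as np

def cqr_intervals(cal_lo, cal_hi, cal_y, test_lo, test_hi, alpha=0.1):
    """Widen (or shrink) predicted quantile bands using calibration edges.

    cal_lo, cal_hi : predicted lower/upper quantiles on calibration edges
    cal_y          : observed link status or score on calibration edges
    test_lo/hi     : predicted lower/upper quantiles on test edges
    alpha          : target miscoverage level in (0, 1)
    All names are illustrative assumptions, not the paper's API.
    """
    cal_lo, cal_hi, cal_y = map(np.asarray, (cal_lo, cal_hi, cal_y))
    # Conformity score: positive when y falls outside the predicted band.
    scores = np.maximum(cal_lo - cal_y, cal_y - cal_hi)
    n = scores.size
    # Rank of the calibration score used as the correction term.
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the only valid interval is unbounded.
    q_hat = np.inf if k > n else np.sort(scores)[k - 1]
    return np.asarray(test_lo) - q_hat, np.asarray(test_hi) + q_hat, q_hat
```

The coverage guarantee of at least $1-\alpha$ for the returned intervals relies on the calibration and test edges being exchangeable, which is the condition the paper establishes for the link prediction setting.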
Abstract
Graph Neural Networks (GNNs) excel in diverse tasks, yet their application in high-stakes domains is often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack \textit{rigorous} uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty quantification approach that constructs a predictive interval with a statistical guarantee for GNN-based link prediction. We term it \textit{conformalized link prediction}. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. We first theoretically and empirically establish a permutation invariance condition for the application of CP in link prediction tasks, along with an exact test-time coverage. Leveraging the important structural information in graphs, we then identify a novel and crucial connection between a graph's adherence to the power-law distribution and the efficiency of CP. This insight leads to the development of a simple yet effective sampling-based method that aligns the graph structure with a power-law distribution prior to the standard CP procedure. Extensive experiments demonstrate that for conformalized link prediction, our approach achieves the desired marginal coverage while significantly improving the efficiency of CP compared to baseline methods.
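The two quantities the experiments report, marginal coverage and efficiency, can be computed from any set of intervals. The following is a minimal sketch under the assumption that efficiency is measured as average interval length (shorter is better); the helper name `coverage_and_length` is hypothetical and not taken from the paper.

```python
import numpy as np

def coverage_and_length(lower, upper, y):
    """Return (empirical coverage, mean interval length) over test edges."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    # An edge is covered when its true value lies inside its interval.
    covered = (lower <= y) & (y <= upper)
    return float(covered.mean()), float((upper - lower).mean())
```

A method meets the target when the empirical coverage is close to or above $1-\alpha$; among methods that do, the one with the smaller mean length is the more efficient.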
