Distribution Free Prediction Sets for Node Classification

Jase Clarkson

TL;DR

This work addresses the lack of calibrated uncertainty for node classification in graphs by extending conformal prediction to non-exchangeable data using graph-aware weighting. It introduces Neighbourhood Adaptive Prediction Sets (NAPS), which calibrate prediction sets within a node's local neighbourhood, along with two weighted variants (NAPS-H, NAPS-G) that trade off calibration sample size against similarity to the test node. The approach comes with theoretical coverage guarantees and is shown to produce tighter, better-calibrated sets than naive APS across multiple GNN architectures and standard datasets, with supporting analysis in a Stochastic Block Model setting. The method enables reliable inductive uncertainty quantification for graph-structured data and can be extended to heterophilic graphs, weighted/directed networks, and node regression tasks.

Abstract

Graph Neural Networks (GNNs) achieve high classification accuracy on many important real-world datasets, but provide no rigorous notion of predictive uncertainty. Quantifying the confidence of GNN models is difficult due to the dependence between datapoints induced by the graph structure. We leverage recent advances in conformal prediction to construct prediction sets for node classification in inductive learning scenarios. We do this by taking an existing approach for conformal classification that relies on exchangeable data and modifying it by appropriately weighting the conformal scores to reflect the network structure. We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better-calibrated prediction sets than a naive application of conformal prediction.
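
The idea of weighting conformal scores by network structure can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the deterministic (non-randomised) form of the APS score, and the hard K-hop cut-off in `hop_weights` are all assumptions made for illustration, and the weighting schemes used by NAPS-H and NAPS-G are not reproduced here. The sketch computes APS-style scores on calibration nodes, takes a weighted quantile of those scores, and builds a prediction set for one test node.

```python
import numpy as np

def aps_scores(probs, labels):
    """APS-style score (deterministic form): probability mass of all classes
    ranked at or above the true label. probs is (n, C), labels is (n,)."""
    order = np.argsort(-probs, axis=1)                    # classes by descending probability
    cumsum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    ranks = np.argmax(order == labels[:, None], axis=1)   # position of the true label
    return cumsum[np.arange(len(labels)), ranks]

def hop_weights(hops, K):
    """Hypothetical weighting: 1 for calibration nodes within K hops of the
    test node, 0 otherwise. Other decay schemes could be substituted."""
    return (np.asarray(hops) <= K).astype(float)

def weighted_quantile(scores, weights, alpha):
    """Smallest calibration score whose normalised cumulative weight reaches
    1 - alpha. A unit weight at +inf stands in for the test point, so the
    result is +inf when too little calibration weight is available."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / (weights.sum() + 1.0)
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return scores[order][idx] if idx < len(scores) else np.inf

def prediction_set(probs_row, qhat):
    """Add classes in descending probability until the cumulative mass
    reaches qhat; returns an array of class indices."""
    order = np.argsort(-probs_row)
    cumsum = np.cumsum(probs_row[order])
    k = np.searchsorted(cumsum, qhat, side="left") + 1
    return order[:min(k, len(order))]

# Toy calibration data: three nodes, three classes (made-up numbers).
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.2, 0.3, 0.5]])
cal_labels = np.array([0, 2, 1])
cal_hops = np.array([1, 2, 5])      # hop distance from the test node

scores = aps_scores(cal_probs, cal_labels)
qhat = weighted_quantile(scores, hop_weights(cal_hops, K=2), alpha=0.5)
test_set = prediction_set(np.array([0.7, 0.2, 0.1]), qhat)
```

Setting every weight to 1 recovers calibration on all available nodes, which corresponds to the naive APS baseline the abstract compares against. With only a few calibration nodes inside the neighbourhood, the quantile can become infinite and the prediction set then contains every class, which is the sample-size side of the trade-off mentioned in the TL;DR.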

Paper Structure (19 sections, 1 theorem, 28 equations, 2 figures, 3 tables)

Key Result

Lemma 5.1

Assume the test data has the block model structure described above. Let $CG_{\text{APS}}$ be the coverage gap for prediction sets constructed using APS (i.e. calibrating using all available nodes), and $CG_{\text{NAPS}}$ be the coverage gap attained by the unweighted variant of NAPS calibrated among

Figures (2)

  • Figure 1: An illustration of the nodes used for calibrating conformal prediction via an "out the box" application of APS (top panel), which randomly splits the data, and the nodes used in NAPS (bottom panel). NAPS localises the calibration nodes to a neighbourhood of the test node.
  • Figure 2: The coverage and size for the different conformal prediction procedures whilst varying $K$ on the Reddit2 dataset. All the experiments use the GraphSAGE-Mean GNN architecture. Each column shows the median of means over 100 repetitions of the experiment for the given method using the methodology introduced in Section \ref{sec:e_setup}. The dashed black line shows the desired coverage level $\alpha = 0.9$.

Theorems & Definitions (1)

  • Lemma 5.1