Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs

Pranav Maneriker, Aditya T. Vadlamani, Anutam Srinivasan, Yuntian He, Ali Payani, Srinivasan Parthasarathy

TL;DR

Conformal prediction (CP) provides statistically valid prediction sets with coverage guarantees for graph-based tasks under exchangeability, enabling uncertainty quantification in node classification. The paper analyzes design choices in graph CP, proves a finite-sample efficiency bound comparing randomized vs deterministic adaptive scores, and develops scalable CFGNN-based methods with batching and caching to handle large graphs. It benchmarks a wide spectrum of CP methods (TPS, TPS-Classwise, APS, RAPS, DAPS, DTPS, NAPS, CFGNN) across diverse datasets, revealing clear trade-offs between efficiency and adaptability and showing randomized APS often improves efficiency, especially with many classes. The work offers practical guidelines for method selection, a Python library for graph CP, and avenues for future work such as fairness auditing and non-IID uncertainty considerations.
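TPS and TPS-Classwise appear among the benchmarked methods, and Figure 1 compares them by Label Stratified Coverage, that is, coverage measured separately for each true label. The sketch below is not the paper's library. It assumes TPS uses the score $1 - \hat{p}_y$ with a single calibrated threshold, and that TPS-Classwise calibrates one threshold per label using only the calibration points that carry that label. The data are synthetic i.i.d. draws, with hypothetical logit offsets (`BIAS`) that make the labels imbalanced, standing in for a graph model's softmax outputs. `draw`, `compare`, and the other helpers are illustrative names.

```python
import math
import random
from collections import defaultdict

NUM_CLASSES = 4
BIAS = [1.0, 0.5, 0.0, -0.5]  # hypothetical logit offsets that make the labels imbalanced


def draw(rng):
    """Synthetic (softmax probabilities, label) pair; the label is sampled from the probabilities."""
    logits = [rng.gauss(0.0, 2.0) + b for b in BIAS]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return probs, rng.choices(range(NUM_CLASSES), weights=probs)[0]


def tps_score(probs, label):
    """Assumed TPS non-conformity score: 1 minus the predicted probability of the label."""
    return 1.0 - probs[label]


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score; +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def label_stratified_coverage(test, threshold_for):
    """Coverage per true label y: y is covered when tps_score(probs, y) <= threshold_for(y)."""
    hits, counts = defaultdict(int), defaultdict(int)
    for probs, y in test:
        counts[y] += 1
        hits[y] += tps_score(probs, y) <= threshold_for(y)
    return {y: hits[y] / counts[y] for y in sorted(counts)}


def compare(alpha=0.1, n_cal=2000, n_test=4000, seed=0):
    rng = random.Random(seed)
    cal = [draw(rng) for _ in range(n_cal)]
    test = [draw(rng) for _ in range(n_test)]
    # TPS: one threshold calibrated on all calibration points.
    q_all = conformal_quantile([tps_score(p, y) for p, y in cal], alpha)
    # TPS-Classwise (as assumed here): one threshold per label, from that label's points only.
    per_label = defaultdict(list)
    for p, y in cal:
        per_label[y].append(tps_score(p, y))
    q_cls = {y: conformal_quantile(s, alpha) for y, s in per_label.items()}
    marginal = label_stratified_coverage(test, lambda c: q_all)
    classwise = label_stratified_coverage(test, lambda c: q_cls.get(c, math.inf))
    return marginal, classwise


marginal, classwise = compare()
print("TPS           per-label coverage:", marginal)
print("TPS-Classwise per-label coverage:", classwise)
```

With imbalanced labels, a single marginal threshold can over-cover frequent labels and under-cover rare ones; per-label calibration targets $1 - \alpha$ within each label, at the cost of fewer calibration points behind each threshold.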

Abstract

Conformal prediction has become increasingly popular for quantifying the uncertainty associated with machine learning models. Recent work in graph uncertainty quantification has built upon this approach for conformal graph prediction. The nascent nature of these explorations has led to conflicting choices for implementations, baselines, and method evaluation. In this work, we analyze the design choices made in the literature and discuss the tradeoffs associated with existing methods. Building on the existing implementations, we introduce techniques to scale existing methods to large-scale graph datasets without sacrificing performance. Our theoretical and empirical results justify our recommendations for future scholarship in graph conformal prediction.

Paper Structure

This paper contains 39 sections, 8 theorems, 34 equations, 20 figures, 7 tables, and 2 algorithms.

Key Result

Theorem 2.1

Suppose $\{({\bm{x}}_i, y_i)\}_{i=1}^{n+1}$ are exchangeable, $s: {\mathcal{X}} \times {\mathcal{Y}} \rightarrow \mathbb{R}$ is a score function measuring the non-conformity of a point $({\bm{x}}, y)$ (higher scores indicate lower conformity), and $\alpha \in [0, 1]$ is a target miscoverage level. The upper bound assumes that the scores are unique or that a suitably random tie-breaking method exists …
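The statement above is cut off before its conclusion; in its usual form (as in the cited vovk2005algorithmic and angelopoulos2021gentle), the split-conformal guarantee says that the set $C(\bm{x}_{n+1}) = \{y : s(\bm{x}_{n+1}, y) \le \hat{q}\}$, with $\hat{q}$ the $\lceil (n+1)(1-\alpha) \rceil$-th smallest of the $n$ calibration scores, contains $y_{n+1}$ with probability at least $1 - \alpha$ (and at most $1 - \alpha + \frac{1}{n+1}$ when scores are unique). The minimal sketch below checks this empirically. It assumes the TPS score $s(\bm{x}, y) = 1 - \hat{p}_y$ and uses synthetic i.i.d. (hence exchangeable) data in place of a graph model's outputs; the helper names are hypothetical.

```python
import math
import random


def draw(rng, num_classes=5):
    """Synthetic i.i.d. (hence exchangeable) pair: softmax probabilities and a label sampled from them."""
    logits = [rng.gauss(0.0, 2.0) for _ in range(num_classes)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return probs, rng.choices(range(num_classes), weights=probs)[0]


def tps_score(probs, label):
    """Non-conformity score s(x, y) = 1 - p_y: higher means less conforming."""
    return 1.0 - probs[label]


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest calibration score; +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def split_conformal(alpha=0.1, n_cal=500, n_test=2000, seed=0):
    rng = random.Random(seed)
    cal = [draw(rng) for _ in range(n_cal)]
    test = [draw(rng) for _ in range(n_test)]
    qhat = conformal_quantile([tps_score(p, y) for p, y in cal], alpha)
    # C(x) = {y : s(x, y) <= qhat}; the true label is covered iff its own score is <= qhat.
    coverage = sum(tps_score(p, y) <= qhat for p, y in test) / n_test
    avg_size = sum(sum(tps_score(p, c) <= qhat for c in range(len(p))) for p, _ in test) / n_test
    return coverage, avg_size


coverage, avg_size = split_conformal()
print(f"coverage = {coverage:.3f} (target >= 0.9), average set size = {avg_size:.2f}")
```

The guarantee is marginal: it holds on average over calibration and test draws, not for every individual node or label.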

Figures (20)

  • Figure 1: We set the target miscoverage rate $\alpha = 0.1$. The boxplots present the Label Stratified Coverage for (a) Amazon_Computers and (b) ogbn-arxiv for both the FS split (left) and LC split (right). The means (white triangles) should be close to $1 - \alpha = 0.9$. For Label Stratified Coverage, TPS-Classwise is comparable to or better than TPS.
  • Figure 2: Label Distribution for ogbn-arxiv
  • Figure 3: Scores for the Cora dataset using the randomized and non-randomized versions of APS. In the left plot, the vertical lines show the shift in the standard conformal quantiles for $A$ (randomized APS) and $\tilde{A}$ (non-randomized APS) at $0.9$ coverage. In the right plot, the vertical lines show the shift in the $1-\alpha_c$ value for $A$ and $\tilde{A}$ using scores for the incorrect classes. Since $(1 - \alpha_c^{\tilde{A}}) - (1 - \alpha_c^A) \gg \frac{2}{n + 1} \iff \alpha_c^A - \alpha_c^{\tilde{A}} \gg \frac{2}{n + 1}$, the condition of the APS efficiency theorem is satisfied. Thus, $A$ is more efficient, as the left plot shows: $q_A < q_{\tilde{A}}$.
  • Figure 4: We set the target miscoverage rate $\alpha = 0.1$. Box plots depicting the efficiencies of APS and Randomized APS across different datasets and multiple runs in both the FS split (left) and LC split (right). Randomization (the lower box plot for each dataset) consistently improves over the non-randomized version, as its efficiency values are concentrated around smaller values.
  • Figure 5: The plot on the right replicates an experiment from huang2024uncertainty that plots efficiency against coverage rate on the Cora_ML dataset (a subset of the Cora dataset) for both CFGNN and a baseline model. The plot on the left uses APS with randomization when constructing the final prediction sets. Together, these plots illustrate how randomization improves baseline performance.
  • ...and 15 more figures
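Figures 3 and 4 contrast randomized APS ($A$) with non-randomized APS ($\tilde{A}$). A minimal sketch of the two scores, assuming the common APS definition: rank labels by predicted probability; the non-randomized score of label $y$ is the mass of all labels ranked above $y$ plus $\hat{p}_y$, while the randomized score replaces $\hat{p}_y$ with $u\,\hat{p}_y$ for $u \sim \mathrm{Uniform}(0, 1)$. Both variants use the same split-conformal quantile and are compared by empirical coverage and average set size (the usual efficiency measure in this literature) on synthetic i.i.d. data. `draw`, `aps_score`, and `run` are hypothetical helpers, not the paper's library API.

```python
import math
import random

NUM_CLASSES = 10


def draw(rng):
    """Synthetic i.i.d. (softmax probabilities, label) pair; the label is sampled from the probabilities."""
    logits = [rng.gauss(0.0, 2.0) for _ in range(NUM_CLASSES)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return probs, rng.choices(range(NUM_CLASSES), weights=probs)[0]


def aps_score(probs, label, u=None):
    """APS score: mass of labels ranked above `label`, plus (u times) the label's own mass.

    u=None gives the non-randomized score (A-tilde); u in [0, 1] gives the randomized score (A).
    """
    p = probs[label]
    # Labels strictly more probable than `label`; equal probabilities are ordered by index.
    above = sum(q for c, q in enumerate(probs) if q > p or (q == p and c < label))
    return above + (p if u is None else u * p)


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest calibration score; +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def run(randomized, alpha=0.1, n_cal=1000, n_test=2000, seed=0):
    rng = random.Random(seed)
    # The data are drawn first, so both variants see identical calibration and test points.
    cal = [draw(rng) for _ in range(n_cal)]
    test = [draw(rng) for _ in range(n_test)]
    qhat = conformal_quantile(
        [aps_score(p, y, rng.random() if randomized else None) for p, y in cal], alpha
    )
    covered = total_size = 0
    for p, y in test:
        u = rng.random() if randomized else None  # one uniform draw shared by all labels of a test point
        pred_set = [c for c in range(NUM_CLASSES) if aps_score(p, c, u) <= qhat]
        covered += y in pred_set
        total_size += len(pred_set)
    return covered / n_test, total_size / n_test


results = {flag: run(flag) for flag in (False, True)}
for flag, (cov, size) in results.items():
    name = "randomized" if flag else "non-randomized"
    print(f"APS ({name}): coverage = {cov:.3f}, average set size = {size:.2f}")
```

Figure 3's argument is that randomization lowers the calibration quantile ($q_A < q_{\tilde{A}}$), which is what shrinks the sets; this sketch prints both variants rather than asserting which is smaller, since the size of the gap depends on the data.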

Theorems & Definitions (11)

  • Theorem 2.1: vovk2005algorithmic, angelopoulos2021gentle
  • Theorem 2.2: zargarbashi23conformal, huang2024uncertainty
  • Theorem 2.3
  • Corollary 2.3.1
  • Lemma B.0: cf2025a
  • Lemma B.1
  • proof
  • Theorem B.1
  • proof
  • Corollary B.1.1
  • ...and 1 more