Towards Precise Prediction Uncertainty in GNNs: Refining GNNs with Topology-grouping Strategy

Hyunjin Seo, Kyusung Seo, Joonhyung Park, Eunho Yang

TL;DR

This work addresses miscalibration in graph neural networks (GNNs) by showing that calibration errors cannot be captured by a single trend based solely on neighborhood similarity. It introduces Simi-Mailbox, a post-hoc calibration method that groups nodes by both neighborhood similarity and individual confidence, then applies a separate temperature to each group, without relying on proximity or connectivity. In experiments on small, medium, and large graphs, Simi-Mailbox reduces calibration error (ECE) by up to 13.79% compared to uncalibrated GNN predictions and consistently outperforms strong baselines across backbones and settings, including heterophilous graphs and self-training. The approach preserves accuracy while improving reliability, offering a scalable and robust option for uncertainty quantification in GNNs and for the downstream decisions that depend on it.

Abstract

Recent advancements in graph neural networks (GNNs) have highlighted the critical need for calibrating model predictions, with neighborhood prediction similarity recognized as a pivotal component. Existing studies suggest that nodes with analogous neighborhood prediction similarity often exhibit similar calibration characteristics. Building on this insight, recent approaches incorporate neighborhood similarity into node-wise temperature scaling techniques. However, our analysis reveals that this assumption does not hold universally. Calibration errors can differ significantly even among nodes with comparable neighborhood similarity, depending on their confidence levels. This necessitates a re-evaluation of existing GNN calibration methods, as a single, unified approach may lead to sub-optimal calibration. In response, we introduce **Simi-Mailbox**, a novel approach that categorizes nodes by both neighborhood similarity and their own confidence, irrespective of proximity or connectivity. Our method allows fine-grained calibration by employing *group-specific* temperature scaling, with each temperature tailored to the specific miscalibration level of its affiliated nodes, rather than adhering to a uniform trend based on neighborhood similarity. Extensive experiments demonstrate the effectiveness of **Simi-Mailbox** across diverse datasets and GNN architectures, achieving up to 13.79% error reduction compared to uncalibrated GNN predictions.
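To make the idea of *group-specific* temperature scaling concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each node already has a neighborhood similarity score in $[0, 1]$, that groups are formed by equal-width binning on similarity and confidence (the paper's actual grouping rule is not specified in this summary), and that the per-group temperatures are given rather than learned. The names `assign_groups` and `group_temperature_scale` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def assign_groups(similarity, confidence, n_bins=4):
    # Assumption: equal-width bins on both axes; graph connectivity is not used.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    sim_bin = np.digitize(similarity, inner_edges)
    conf_bin = np.digitize(confidence, inner_edges)
    # One integer group id per (similarity bin, confidence bin) pair.
    return sim_bin * n_bins + conf_bin

def group_temperature_scale(logits, groups, temperatures):
    # Divide each node's logits by the temperature of its group, then renormalize.
    t = temperatures[groups][:, None]
    return softmax(logits / t)

# Toy example: 4 nodes, 2 classes; similarity scores are made up for illustration.
logits = np.array([[4.0, 0.0], [0.5, 0.0], [3.0, 0.0], [0.2, 0.0]])
similarity = np.array([0.9, 0.9, 0.2, 0.2])
probs = softmax(logits)
groups = assign_groups(similarity, probs.max(axis=1), n_bins=4)

temperatures = np.ones(16)      # hypothetical values; in practice these would be fit on validation nodes
temperatures[groups[0]] = 2.0   # soften only the high-similarity, high-confidence group
calibrated = group_temperature_scale(logits, groups, temperatures)
```

Because every logit of a node is divided by the same positive temperature, the predicted class is unchanged and only the confidence moves, which is why this family of post-hoc methods preserves accuracy.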

Paper Structure

This paper contains 42 sections, 12 equations, 12 figures, 23 tables.

Figures (12)

  • Figure 1: Analysis of logits before calibration and after calibration by prior works, CaGCN and GATS. The $x$-axis divides nodes into sub-intervals based on neighborhood similarity, while the $y$-axis represents the corresponding confidence intervals. Each cell in the heatmap shows the accuracy minus the average confidence, with color intensity indicating the magnitude of this discrepancy. Contrary to the uniform assumptions about neighborhood similarity in prior works, the results demonstrate that calibration errors can differ significantly among nodes with comparable neighborhood similarity but different confidence levels. Moreover, prior approaches exhibit sub-optimal calibration across varying neighborhood similarity levels when predictions are broken down by confidence interval.
  • Figure 2: Overall framework of our Simi-Mailbox.
  • Figure 3: Qualitative analysis of our calibration results on the CoraFull dataset, compared with CaGCN and GATS. Each cell in the heatmap shows the accuracy minus the average confidence of calibrated nodes, with color and intensity indicating the magnitude of this discrepancy. Across diverse neighborhood similarity levels, our method narrows the gap between accuracy and confidence more than the baselines.
  • Figure 4: Hyperparameter sensitivity of scaling factor $\lambda$ and the number of bins $N$ across all benchmark datasets and GNN architectures.
  • Figure 5: Qualitative analysis of our calibration results (right) on the Citeseer dataset, compared with CaGCN (left) and GATS (center). Each cell in the heatmap shows the accuracy minus the average confidence of calibrated nodes, with color and intensity indicating the magnitude of this discrepancy.
  • ...and 7 more figures
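
The heatmaps described in the captions above report, for each (confidence interval, neighborhood similarity interval) cell, the accuracy minus the average confidence. A minimal sketch of that computation follows, assuming equal-width bins and per-node arrays as inputs; the function name `gap_grid` and the bin count are hypothetical, and the figures' actual binning may differ.

```python
import numpy as np

def gap_grid(similarity, confidence, correct, n_bins=5):
    # Rows index confidence bins (y-axis), columns index similarity bins (x-axis).
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    sim_bin = np.digitize(similarity, inner_edges)
    conf_bin = np.digitize(confidence, inner_edges)
    grid = np.full((n_bins, n_bins), np.nan)  # NaN marks cells with no nodes
    for c in range(n_bins):
        for s in range(n_bins):
            mask = (conf_bin == c) & (sim_bin == s)
            if mask.any():
                # Accuracy minus mean confidence; negative means over-confident.
                grid[c, s] = correct[mask].mean() - confidence[mask].mean()
    return grid

# Toy example with made-up values for four nodes.
similarity = np.array([0.1, 0.1, 0.9, 0.9])
confidence = np.array([0.9, 0.9, 0.9, 0.3])
correct = np.array([1.0, 0.0, 1.0, 1.0])
grid = gap_grid(similarity, confidence, correct, n_bins=2)
```

In this toy grid, the low-similarity, high-confidence cell is over-confident while the high-similarity cells are under-confident, the kind of confidence-dependent pattern that a single temperature trend over neighborhood similarity cannot correct.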