Enhance GNNs with Reliable Confidence Estimation via Adversarial Calibration Learning
Yilong Wang, Jiahao Zhang, Tianxiang Zhao, Suhang Wang
TL;DR
This work tackles poorly calibrated GNN predictions on graph-structured data, where global calibration methods fail to ensure reliable confidence across node subgroups. It introduces AdvCali, an adversarial calibration framework that jointly learns node-wise temperature scaling and an adversarial group detector that identifies miscalibrated subgroups, guided by a differentiable Group-ECE (group-wise Expected Calibration Error) loss. AdvCali substantially improves both global and subgroup calibration on eight real-world benchmarks and remains effective across different GNN backbones. Ablations confirm that both the cross-entropy and the Group-ECE components are necessary. By automatically discovering dataset-specific miscalibration patterns, AdvCali offers robust and scalable confidence estimation for GNNs in practical, high-stakes applications.
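To make the subgroup-calibration problem concrete, the sketch below computes the standard binned ECE over all nodes and then separately inside node groups, here defined by degree. The function names (`ece`, `group_ece`), the 10 equal-width bins, the degree threshold of 2, and the toy data are illustrative assumptions, not taken from the paper. In the toy example every node has confidence 0.75, so the global ECE is zero even though low-degree nodes are overconfident and high-degree nodes are underconfident.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def ece(confidences: Sequence[float], correct: Sequence[bool], num_bins: int = 10) -> float:
    """Binned ECE: sum over bins B of (|B| / n) * |acc(B) - conf(B)|."""
    n = len(confidences)
    bins: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(confidences):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        bins[min(int(c * num_bins), num_bins - 1)].append(i)
    total = 0.0
    for idx in bins.values():
        conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(1.0 for i in idx if correct[i]) / len(idx)
        total += len(idx) / n * abs(acc - conf)
    return total


def group_ece(
    confidences: Sequence[float],
    correct: Sequence[bool],
    groups: Sequence[Hashable],
    num_bins: int = 10,
) -> Dict[Hashable, float]:
    """ECE computed separately within each node group (e.g. degree buckets)."""
    members: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    return {
        g: ece([confidences[i] for i in idx], [correct[i] for i in idx], num_bins)
        for g, idx in members.items()
    }


# Toy data (illustrative only): groups split by a hypothetical degree threshold of 2.
degrees = [1, 1, 2, 2, 5, 6, 7, 9]
groups = ["low" if d <= 2 else "high" for d in degrees]
confidences = [0.75] * 8
correct = [True, False, True, False, True, True, True, True]
print(ece(confidences, correct))                # 0.0
print(group_ece(confidences, correct, groups))  # {'low': 0.25, 'high': 0.25}
```

This version uses fixed, hand-chosen groups and hard bins, so it serves as an evaluation metric. AdvCali instead learns which groups to focus on and trains with a differentiable Group-ECE loss, whose exact form is not reproduced here.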
Abstract
Despite their impressive predictive performance, Graph Neural Networks (GNNs) often exhibit poor confidence calibration, i.e., their predicted confidence scores do not accurately reflect the true likelihood of correctness. This raises concerns about their reliability in high-stakes domains such as fraud detection and risk assessment, where well-calibrated predictions are essential for decision-making. To make predictions trustworthy, several GNN calibration methods have been proposed. Although they can improve global calibration, our experiments reveal that they often fail to generalize across node groups, yielding inaccurate confidence for groups that differ in degree, class, or local structure. In some cases, they even degrade calibration relative to the original, uncalibrated GNN. To address this challenge, we propose AdvCali, a novel framework that adaptively improves calibration across node groups. Our method uses adversarial training to automatically identify miscalibrated node groups and applies a differentiable Group Expected Calibration Error (Group ECE) loss to refine confidence estimates within these groups. This allows the model to adjust its calibration strategy dynamically, without relying on dataset-specific prior knowledge of which subgroups are miscalibrated. Extensive experiments on real-world datasets show that our approach not only improves global calibration but also substantially improves calibration within groups defined by feature similarity, topology, and connectivity, outperforming previous methods and demonstrating its effectiveness in practical scenarios.
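The adversarial scheme can be illustrated with a toy min-max loop. A detector softly assigns nodes to groups and takes gradient-ascent steps on a Group-ECE term, while a calibrator takes gradient-descent steps on cross-entropy plus that term. This is a minimal sketch under several assumptions that do not come from the paper: node temperatures and detector logits are free per-node parameters rather than GNN outputs, the soft Group-ECE surrogate (membership-weighted |confidence - accuracy| per group) is a hypothetical formulation, gradients come from finite differences instead of autograd, and `lam`, the learning rates, and the step count are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def losses(log_t: List[float], det: List[List[float]],
           logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Cross-entropy and a hypothetical soft Group-ECE for node temperatures T_i = exp(log_t[i])."""
    n = len(logits)
    probs = [softmax([z / math.exp(lt) for z in zs]) for zs, lt in zip(logits, log_t)]
    ce = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / n
    member = [softmax(d) for d in det]  # soft group membership per node
    gece = 0.0
    for g in range(len(det[0])):
        w = [m[g] for m in member]
        mass = sum(w)
        conf = sum(wi * max(p) for wi, p in zip(w, probs)) / mass
        # Argmax is unaffected by temperature, so accuracy depends only on the grouping.
        acc = sum(wi * float(p.index(max(p)) == y) for wi, p, y in zip(w, probs, labels)) / mass
        gece += (mass / n) * abs(conf - acc)
    return ce, gece


def num_grad(f, params: List[float], eps: float = 1e-5) -> List[float]:
    """Central finite differences; stands in for autograd in this sketch."""
    grads = []
    for i in range(len(params)):
        up, dn = params[:], params[:]
        up[i] += eps
        dn[i] -= eps
        grads.append((f(up) - f(dn)) / (2 * eps))
    return grads


def adversarial_calibration(logits, labels, num_groups=2, lam=1.0,
                            steps=200, lr_cal=0.1, lr_det=0.5):
    n = len(logits)
    log_t = [0.0] * n  # every node starts at T_i = 1 (no rescaling)
    # Detector logits, flattened node-major; small deterministic offsets break symmetry.
    det = [0.1 * ((i + g) % num_groups) for i in range(n) for g in range(num_groups)]

    def rows(flat):
        return [flat[i * num_groups:(i + 1) * num_groups] for i in range(n)]

    for _ in range(steps):
        # Detector: gradient ASCENT on Group-ECE, i.e. search for the worst-calibrated grouping.
        def det_obj(d):
            return losses(log_t, rows(d), logits, labels)[1]
        det = [d + lr_det * g for d, g in zip(det, num_grad(det_obj, det))]

        # Calibrator: gradient DESCENT on CE + lam * Group-ECE under the current grouping.
        def cal_obj(lt):
            ce, gece = losses(lt, rows(det), logits, labels)
            return ce + lam * gece
        log_t = [lt - lr_cal * g for lt, g in zip(log_t, num_grad(cal_obj, log_t))]

    return [math.exp(lt) for lt in log_t], [softmax(r) for r in rows(det)]


# Toy usage: node 1 is misclassified (predicts class 1, label 0).
logits = [[2.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 0, 1]
temps, groups = adversarial_calibration(logits, labels)
```

In this toy run the calibrator raises the temperature of the misclassified node 1 above 1, softening its overconfident prediction, while the detector's soft memberships stay valid distributions. In a realistic post-hoc setting, such a calibrator would typically be fitted on held-out validation nodes rather than on the nodes it is evaluated on.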
