Conf-GNNRec: Quantifying and Calibrating the Prediction Confidence for GNN-based Recommendation Methods
Meng Yan, Cai Xu, Xujing Wang, Ziyu Guan, Wei Zhao, Yuhang Zhou
TL;DR
The paper tackles overconfident predictions in GNN-based recommenders by formalizing a calibration objective and proposing Conf-GNNRec, a post-calibration framework. It combines a nonlinear, segmented rating calibration with a confidence-penalizing loss to align predicted confidence with observed accuracy. The formal calibration criterion is $\mathbb{P}(\hat{y}_{u,i}=y_{u,i} \mid \hat{p}_{u,i}=p) = p$: among all user-item predictions made with confidence $p$, a fraction $p$ should be correct. Experiments on Gowalla, Yelp2018, and Amazon-Book show consistent improvements in top-N metrics across baselines (LightGCN, KGAT, MVIN, KGCL) and a measurable reduction in the confidence-accuracy gap. The work contributes a practical pathway toward trustworthy GNN-based recommendations and suggests future work on Bayesian confidence modeling and on extending the framework to broader architectures.
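The calibration criterion can be checked empirically by grouping predictions by confidence and comparing each group's mean confidence with its accuracy. The sketch below computes the expected calibration error (ECE), a standard summary of that gap. It is not taken from the paper: the function name `expected_calibration_error`, the equal-width binning, and the toy inputs are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted confidences in [0, 1], one per user-item prediction.
    correct:     booleans, True where the prediction matched the ground truth.
    n_bins:      number of equal-width confidence bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")
    if any(p < 0.0 or p > 1.0 for p in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Assign each prediction to a bin; p == 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(confidences, correct):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, bool(hit)))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(members) / total) * abs(accuracy - mean_confidence)
    return ece


# Toy data (hypothetical): ten predictions, all made with confidence 0.9.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(f"calibrated ECE:    {calibrated:.3f}")
print(f"overconfident ECE: {overconfident:.3f}")
```

A perfectly calibrated model yields an ECE near zero. In the second toy case the model claims 0.9 confidence but is right only half the time, so the gap is about 0.4, which is the kind of overconfidence the paper aims to reduce.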
Abstract
Recommender systems based on graph neural networks (GNNs) perform well in tasks such as rating and ranking. However, in real-world recommendation scenarios, noise such as user misuse and malicious advertisement gradually accumulates through the message propagation mechanism. Although existing studies mitigate this effect by reducing the propagation weights of noisy neighbors, the severe sparsity of recommendation data still causes low-weighted noisy neighbors to be mistaken for meaningful information, so predictions based on polluted nodes are not entirely trustworthy. It is therefore crucial to measure the confidence of prediction results in this highly noisy setting. Furthermore, our evaluation of representative GNN-based recommendation methods shows that they suffer from overconfidence. Based on these considerations, we propose Conf-GNNRec, a new method to quantify and calibrate the prediction confidence of GNN-based recommendation. Specifically, we propose a rating calibration method that dynamically adjusts excessive ratings on a per-user basis to mitigate overconfidence. We also design a confidence loss function that reduces overconfidence on negative samples and improves recommendation performance. Experiments on public datasets demonstrate the effectiveness of Conf-GNNRec in both prediction confidence and recommendation performance.
