Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation
Wenbo Hu, Xin Sun, Qiang Liu, Le Wu, Liang Wang
TL;DR
The study tackles missing-not-at-random (MNAR) selection bias in CVR prediction. It shows that propensity scores are miscalibrated and introduces a model-agnostic uncertainty-calibration framework guided by Expected Calibration Error (ECE). The framework combines uncertainty quantification (e.g., MC-Dropout, Deep Ensembles, Dual Focal Loss) with post-processing calibration (Platt scaling) to produce calibrated propensity scores, and it is supported by theoretical analyses of bias and generalization bounds. Empirical results on Coat Shopping, Yahoo! R3, and KuaiRand show that calibrated propensities reduce ECE and improve IPS- and doubly robust (DR)-based CVR predictions, often surpassing state-of-the-art (SOTA) debiasing methods. The work demonstrates that uncertainty calibration is crucial for reliable, unbiased recommendation in MNAR settings, and it offers practical guidance on choosing a calibration method and on efficiency considerations.
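Platt scaling, the post-processing step named above, fits a two-parameter logistic map sigmoid(a*z + b) to a propensity model's logits z on held-out data. The sketch below is illustrative only, not the authors' implementation. It assumes the calibration labels are observation indicators (1 if the user-item pair was observed, e.g. clicked, else 0). The names `sigmoid`, `fit_platt` and `calibrate` are hypothetical, and the plain full-batch gradient descent is a simplification: practical implementations usually use Newton-type or L-BFGS solvers.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_platt(logits, observed, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * z + b) by minimizing mean binary cross-entropy.

    logits:   uncalibrated propensity logits z from any base model (hypothetical input)
    observed: 0/1 observation indicators on a held-out calibration set
    """
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(logits)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for z, o in zip(logits, observed):
            err = sigmoid(a * z + b) - o  # derivative of BCE w.r.t. the pre-activation
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(logits, a, b):
    """Map raw logits to calibrated propensity scores."""
    return [sigmoid(a * z + b) for z in logits]


# Toy held-out set: an overconfident model outputs logits of +/-3,
# but only 1 in 4 pairs at -3 and 3 in 4 pairs at +3 are actually observed.
toy_logits = [-3.0] * 4 + [3.0] * 4
toy_observed = [0, 0, 0, 1, 1, 1, 1, 0]
a_hat, b_hat = fit_platt(toy_logits, toy_observed)
print(calibrate([-3.0, 3.0], a_hat, b_hat))  # roughly [0.25, 0.75]
```

In this toy setting the raw logits of -3 and +3 imply propensities of about 0.05 and 0.95, while the observed frequencies are 0.25 and 0.75. The fitted map shrinks the logits (a converges to ln(3)/3, about 0.37, with b near 0) so that the calibrated propensities match those frequencies. Only two parameters are learned, which is why this kind of post-processing is cheap and model-agnostic.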
Abstract
Post-click conversion rate (CVR) is a reliable indicator of online customers' preferences, making it crucial for developing recommender systems. A major challenge in predicting CVR is severe selection bias, which arises from users' inherent self-selection behavior and from the system's item selection process. To mitigate this issue, inverse propensity scoring (IPS) weights the prediction error of each observed instance by the inverse of its estimated propensity. However, current propensity score estimates are unreliable because there is no measure of their quality. To address this, we evaluate the quality of propensity scores from the perspective of uncertainty calibration and propose Expected Calibration Error (ECE) as the quality measure. ECE assesses the difference between predicted probabilities and actual observed frequencies, and thus quantifies the extent to which the predictions are miscalibrated, for example overconfident. Miscalibrated propensity scores can distort the IPS weights and thereby compromise the debiasing of CVR prediction. In this paper, we introduce a model-agnostic calibration framework for propensity-based debiasing of CVR predictions. Theoretical analysis of bias and generalization bounds demonstrates the superiority of calibrated propensity estimates over uncalibrated ones. Experiments on the Coat, Yahoo and KuaiRand datasets show improved uncertainty calibration, as evidenced by lower ECE values, which leads to better CVR prediction.
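The two quantities the abstract relies on, ECE and the IPS-weighted error, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. ECE is computed here with equal-width bins over the predicted propensity, comparing each bin's mean propensity with the fraction of its pairs that were actually observed; the bin count of 10 and the equal-width binning are assumptions, since the abstract does not specify them. The IPS estimator uses the common form (1/|D|) * sum over (u,i) in D of o_ui * e_ui / p_ui, where D is the set of all user-item pairs, o_ui the observation indicator, e_ui the CVR prediction error and p_ui the estimated propensity. The function names are hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE for binary predictions.

    probs:  predicted propensities in [0, 1]
    labels: 0/1 observation indicators
    """
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        mean_prop = sum(p for p, _ in b) / len(b)
        obs_freq = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_prop - obs_freq)  # weight gap by bin size
    return ece


def ips_loss(errors, observed, propensities):
    """IPS estimate of the average CVR prediction error over all pairs D.

    Only observed pairs contribute; each is reweighted by 1 / propensity.
    Propensities are assumed to be strictly positive.
    """
    total = 0.0
    for e, o, p in zip(errors, observed, propensities):
        if o:
            total += e / p
    return total / len(errors)


# Overconfident toy propensities: 0.9 and 0.1 where the observed rates are 0.75 and 0.25.
probs = [0.9] * 4 + [0.1] * 4
labels = [1, 1, 1, 0, 0, 0, 0, 1]
print(expected_calibration_error(probs, labels))  # about 0.15
print(ips_loss([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], [0.5, 0.2, 0.25, 0.9]))
```

The weighting makes the link between calibration and debiasing concrete. If a pair's true propensity is 0.2 but the model predicts 0.05, its weight becomes 1/0.05 = 20 instead of 1/0.2 = 5. That pair's error then counts four times more than it should, which is the kind of distortion that lowering ECE is meant to reduce.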
