Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation

Wenbo Hu; Xin Sun; Qiang Liu; Le Wu; Liang Wang

TL;DR

The study tackles MNAR-induced bias in CVR prediction by exposing propensity-score miscalibration and introducing a model-agnostic uncertainty-calibration framework guided by Expected Calibration Error (ECE). It combines uncertainty quantification (e.g., MC-Dropout, Deep Ensembles, Dual Focal Loss) with post-processing calibration (Platt scaling) to produce calibrated propensity scores, supported by theoretical bias/generalization analyses. Empirical results on Coat Shopping, Yahoo! R3, and KuaiRand show that calibrated propensities reduce ECE and improve IPS- and DR-based CVR predictions, often surpassing SOTA debiasing methods. The work demonstrates that uncertainty calibration is crucial for reliable, unbiased recommendations in MNAR settings and offers practical guidance on calibration choices and efficiency considerations.
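The post-processing step mentioned above can be illustrated with a minimal sketch of Platt scaling. It is not the authors' implementation: the function names (`fit_platt`, `apply_platt`), the choice to fit on the logit of the raw propensity, and the plain gradient-descent settings (`lr`, `n_steps`) are all assumptions made for illustration. The idea is to fit two scalars `a` and `b` so that `sigmoid(a * logit(p_raw) + b)` matches the observation indicators on held-out data.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _logit(p, eps=1e-6):
    # Clip to (eps, 1 - eps) so the logarithm stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_platt(raw_propensities, observed, lr=0.1, n_steps=2000):
    """Fit (a, b) by gradient descent on the mean negative log-likelihood.

    raw_propensities: uncalibrated propensity estimates in (0, 1).
    observed: 1 if the user-item pair was observed, else 0.
    Hypothetical helper; hyperparameters are illustrative only.
    """
    scores = [_logit(p) for p in raw_propensities]
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(scores)
    for _ in range(n_steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, o in zip(scores, observed):
            residual = _sigmoid(a * s + b) - o  # gradient of the log loss w.r.t. the logit
            grad_a += residual * s
            grad_b += residual
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(raw_propensities, a, b):
    """Map raw propensities to calibrated ones with the fitted (a, b)."""
    return [_sigmoid(a * _logit(p) + b) for p in raw_propensities]
```

In this sketch, overconfident raw estimates (for example, 0.9 for pairs that are observed only 60% of the time) are pulled toward the empirical observation frequency. In practice `a` and `b` would be fitted on a held-out split rather than on the data used to train the propensity model.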

Abstract

Post-click conversion rate (CVR) is a reliable indicator of online customers' preferences, making it crucial for developing recommender systems. A major challenge in predicting CVR is severe selection bias, arising from users' inherent self-selection behavior and the system's item selection process. To mitigate this issue, the inverse propensity score (IPS) is employed to weight the prediction error of each observed instance. However, current propensity score estimations are unreliable due to the lack of a quality measure. To address this, we evaluate the quality of propensity scores from the perspective of uncertainty calibration, proposing the use of Expected Calibration Error (ECE) as a measure of propensity-score quality, which quantifies the extent to which predicted probabilities are overconfident by assessing the difference between predicted probabilities and actual observed frequencies. Miscalibrated propensity scores can lead to distorted IPS weights, thereby compromising the debiasing process in CVR prediction. In this paper, we introduce a model-agnostic calibration framework for propensity-based debiasing of CVR predictions. Theoretical analysis on bias and generalization bounds demonstrates the superiority of calibrated propensity estimates over uncalibrated ones. Experiments conducted on the Coat, Yahoo and KuaiRand datasets show improved uncertainty calibration, as evidenced by lower ECE values, leading to enhanced CVR prediction outcomes.
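The two quantities the abstract relies on can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the equal-width binning with `n_bins=10`, and the `min_propensity` clipping floor are all hypothetical choices. `expected_calibration_error` compares the mean predicted propensity with the observed frequency in each bin, and `ips_error` weights the prediction error of each observed pair by the inverse of its propensity.

```python
def expected_calibration_error(propensities, observed, n_bins=10):
    """ECE with equal-width bins over [0, 1].

    propensities: predicted observation probabilities.
    observed: 1 if the user-item pair was observed, else 0.
    """
    n = len(propensities)
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(propensities, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, o))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        obs_freq = sum(o for _, o in members) / len(members)
        # Weight each bin's gap by the share of samples it holds.
        ece += (len(members) / n) * abs(mean_pred - obs_freq)
    return ece


def ips_error(errors, observed, propensities, min_propensity=1e-6):
    """IPS estimate of the average prediction error over all pairs.

    errors: per-pair prediction errors (only used where observed == 1).
    min_propensity: assumed clipping floor to avoid division by zero.
    """
    total = 0.0
    for e, o, p in zip(errors, observed, propensities):
        if o:
            total += e / max(p, min_propensity)
    return total / len(errors)
```

A propensity of 0.75 assigned to pairs that are observed only 25% of the time yields an ECE contribution of 0.5 for that bin, and the same overestimate shrinks the IPS weight of each observed pair from 4 to about 1.33, which is the kind of distortion the abstract refers to.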

Paper Structure (29 sections, 5 theorems, 22 equations, 4 figures, 7 tables, 1 algorithm)

Introduction
Preliminaries
Propensity-based Debiasing Recommendation
Trustworthy Machine Learning and Probability Uncertainty for Reliability
Uncertainty Calibration for Deep Learning
Uncertainty Calibration for Propensity Estimation
Propensity Estimation Procedure
Uncertainty Calibration for Propensity Estimation
Uncertainty Probability Quantification for Propensity Scores
Post-processing Calibration for Propensity Scores
Computational Complexity Analysis
Theoretical Analysis of Uncertainty Calibration using Expected Calibration Errors
Experiments
Experimental Setting
Calibration Results of Propensity Scores (RQ1)
...and 14 more sections

Key Result

Lemma 1

Given inverse propensities of all user-item pairs $\hat{p}_{u,i}$, the bias of the IPS estimator and the propensity bias are:

Figures (4)

Figure 1: For recommendation under MNAR on the Coat Shopping dataset, we apply the raw propensity estimator with and without Platt scaling calibration and plot the expected propensity against the fraction of observed ratings. The diagonal line corresponds to perfect uncertainty calibration. The raw propensity estimates are severely miscalibrated.
Figure 2: Calibration Curve and Propensity Histograms of Calibrated propensity scores on the Coat Shopping Dataset
Figure 3: The relationship between ECE and recommendation metrics.
Figure 4: Calibration Curve and Propensity Histogram of Calibrated Propensity Scores on the Yahoo! R3 Dataset

Theorems & Definitions (9)

Lemma 1
Theorem 1
proof
Corollary 1
proof
Theorem 2
proof
Corollary 2
proof
