Confidence Calibration for Recommender Systems and Its Applications

Wonbin Kweon

TL;DR

This dissertation proposes a model calibration framework for recommender systems that estimates accurate confidence in recommendation results from the learned ranking scores, and introduces two real-world applications of that confidence.

Abstract

Despite the importance of a measure of confidence in recommendation results, it has been surprisingly overlooked in the literature compared to recommendation accuracy. In this dissertation, I propose a model calibration framework for recommender systems that estimates accurate confidence in recommendation results from the learned ranking scores. I then introduce two real-world applications of confidence in recommendations: (1) training a small student model by treating the confidence of a large teacher model as additional learning guidance, and (2) adjusting the number of presented items based on the expected user utility estimated with the calibrated probability.

Paper Structure (68 sections, 4 theorems, 59 equations, 28 figures, 10 tables, 2 algorithms)

Introduction
Obtaining Calibrated Probabilities with Personalized Ranking Models
Introduction
Preliminary & Related Work
Personalized Ranking
Calibrated Probability
Calibration Method
Proposed Calibration Method
Revisiting Platt Scaling
Gaussian Calibration
Gamma Calibration
Other Distributions
Monotonicity for Proposed Desiderata
Unbiased Parameter Fitting
Naive Log-loss
...and 53 more sections

Key Result

Proposition II.1

Gaussian calibration $g_{\phi}(s) = \sigma(as^2+bs+c)$ is monotonically increasing if and only if the parameters $a$ and $b$ satisfy the constraint $2as+b > 0$ at both $s = s_{\textup{min}}$ and $s = s_{\textup{max}}$.
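
To make the monotonicity condition concrete: since the sigmoid $\sigma$ is strictly increasing, $g_{\phi}$ is increasing exactly when the inner quadratic $as^2+bs+c$ is increasing, that is, when its derivative $2as+b$ is positive over the score range. That derivative is linear in $s$, so checking the two endpoints $s_{\textup{min}}$ and $s_{\textup{max}}$ is enough. The following is a minimal Python sketch of that check, not code from the dissertation; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gaussian_calibration(s: float, a: float, b: float, c: float) -> float:
    """Map a ranking score s to a probability via sigma(a*s^2 + b*s + c)."""
    return sigmoid(a * s * s + b * s + c)


def is_monotone_increasing(a: float, b: float, s_min: float, s_max: float) -> bool:
    """Check the endpoint constraint 2*a*s + b > 0 at s_min and s_max.

    The derivative of the inner quadratic is linear in s, so if it is
    positive at both endpoints it is positive on the whole interval.
    """
    return (2 * a * s_min + b > 0) and (2 * a * s_max + b > 0)


# Illustrative (assumed) parameters, not values fitted in the dissertation.
a, b, c = 0.5, 1.0, -1.0

# On [-0.5, 3.0] the constraint holds at both endpoints.
ok_range = is_monotone_increasing(a, b, s_min=-0.5, s_max=3.0)

# On [-2.0, 3.0] it fails at s_min, so the map is not monotone there.
bad_range = is_monotone_increasing(a, b, s_min=-2.0, s_max=3.0)

# Evaluate the calibration map on a grid over the valid range.
grid = [-0.5 + i * (3.0 - (-0.5)) / 50 for i in range(51)]
probs = [gaussian_calibration(s, a, b, c) for s in grid]
```

With these assumed parameters, the calibrated probabilities in `probs` preserve the order of the ranking scores on the first range, while on the second range a lower score such as $s=-2$ receives a higher probability than $s=-1$, which is the behavior the constraint rules out.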

Figures (28)

Figure 2.1: Reliability diagram of each calibration method. Gap denotes the discrepancy between the accuracy and the average calibrated probability for each bin. The grey dashed diagonal indicates the ideal reliability line, which the blue accuracy bars should meet.
Figure 2.2: Ranking score distributions of negative and positive pairs.
Figure 2.3: Fitted function of each calibration method.
Figure 2.4: ECE with various propensity estimation techniques. In this work, we utilize item popularities to estimate the propensity scores.
Figure 2.5: User degree versus mean top-10 confidence.
...and 23 more figures

Theorems & Definitions (10)

Proposition II.1
proof
Proposition II.2
proof
Proposition II.3
proof
Proposition II.4
proof
Definition 1: Top-$K$ Recommendation
Definition 2: Top-Personalized-$K$ Recommendation
