Calibrating the Predictions for Top-N Recommendations
Masahiro Sato
TL;DR
The paper addresses the problem that predictions for the top-N recommended items can be miscalibrated even when calibration measured over all items appears good. It introduces $ECE@N$ and the rank-discounted $RDECE@N$ to evaluate calibration quality specifically among the top-N items, and proposes a generic top-N focused (TNF) calibration optimization that groups the top-N items by rank and fits a distinct calibration model for each group with rank-dependent training weights. The study shows that TNF reduces calibration error on both explicit and implicit feedback datasets across a variety of recommender and calibration models, whereas baselines trained on all items and recent debiasing approaches can fail or underperform. The findings highlight the importance of rank-aware calibration for top-N recommendations and provide a practical framework for improving the reliability of top-N predictions in real-world systems.
Abstract
Well-calibrated predictions of user preferences are essential for many applications. Since recommender systems typically select the top-N items for users, calibration for those top-N items, rather than for all items, is important. We show that previous calibration methods result in miscalibrated predictions for the top-N items, despite their excellent calibration performance when evaluated on all items. In this work, we address the miscalibration in the top-N recommended items. We first define evaluation metrics for this objective and then propose a generic method to optimize calibration models focusing on the top-N items. It groups the top-N items by their ranks and optimizes distinct calibration models for each group with rank-dependent training weights. We verify the effectiveness of the proposed method for both explicit and implicit feedback datasets, using diverse classes of recommender models.
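To make the difference between calibration on all items and calibration on the top-N items concrete, the following is a minimal sketch of a top-N restricted calibration error. It assumes the standard equal-width binned expected calibration error and simply applies it to each user's N highest-scored items; the paper's exact definition of $ECE@N$ may differ, and the rank discount of $RDECE@N$ is not shown. The function names `ece` and `ece_at_n`, the array layout (users by items), and the bin count are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def ece(probs, labels, n_bins=10):
    """Standard binned expected calibration error (assumed form)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if probs.size == 0:
        return 0.0
    # Assign each prediction to an equal-width bin on [0, 1].
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Gap between mean predicted probability and observed rate,
            # weighted by the fraction of samples falling in this bin.
            gap = abs(probs[mask].mean() - labels[mask].mean())
            total += mask.mean() * gap
    return total


def ece_at_n(scores, probs, labels, n, n_bins=10):
    """ECE computed only over each user's top-n items (hypothetical helper).

    scores: ranking scores, shape (num_users, num_items)
    probs:  calibrated probabilities, same shape
    labels: binary feedback, same shape
    """
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Indices of the n highest-scored items per user.
    top = np.argsort(-scores, axis=1)[:, :n]
    rows = np.arange(scores.shape[0])[:, None]
    return ece(probs[rows, top].ravel(), labels[rows, top].ravel(), n_bins)


# Toy data: one user, three items.
scores = [[0.9, 0.1, 0.5]]
probs = [[0.8, 0.2, 0.6]]
labels = [[1, 0, 0]]
all_items_error = ece(np.ravel(probs), np.ravel(labels))
top1_error = ece_at_n(scores, probs, labels, n=1)
```

In this toy example the two quantities differ because the second one ignores everything outside the top-ranked item, which is the kind of gap the top-N metrics are meant to expose.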
