Beyond Static Calibration: The Impact of User Preference Dynamics on Calibrated Recommendation
Kun Lin, Masoud Mansoury, Farzad Eskandanian, Milad Sabouri, Bamshad Mobasher
TL;DR
This paper argues that calibration in recommender systems can be biased when it is built on static, long-term user histories, which may distort users' current preferences. It introduces a dynamic calibration framework that integrates calibration into training by identifying and using the most representative time-windowed segments of user profiles, evaluated via a simulation workflow that aggregates recent interactions. Using two datasets from different domains, KuaiRec and GoodReads, the study shows that the optimal training window for calibration is domain-dependent and that calibration performance varies across user segments, with more stable preferences yielding better calibration. The work highlights the practical importance of accounting for evolving user tastes in calibration and suggests directions for integrating dynamic calibration with broader metrics such as diversity and fairness.
Abstract
Calibration in recommender systems is an important performance criterion that ensures consistency between the distribution of user preference categories and that of the recommendations generated by the system. Standard methods for mitigating miscalibration typically assume that user preference profiles are static, and they measure calibration relative to the full history of a user's interactions, including possibly outdated and stale preference categories. We conjecture that this approach can lead to recommendations that, while appearing calibrated, in fact distort users' true preferences. In this paper, we conduct a preliminary investigation of recommendation calibration at a more granular level, taking into account evolving user preferences. By analyzing differently sized training time windows, from the most recent interactions to the oldest, we identify the most relevant segment of a user's preferences that optimizes the calibration metric. We perform an exploratory analysis with datasets from different domains with distinctive user-interaction characteristics. We demonstrate how the evolving nature of user preferences affects recommendation calibration, and how this effect manifests differently depending on the characteristics of the data in a given domain. Datasets, code, and more detailed experimental results are available at: https://github.com/nicolelin13/DynamicCalibrationUMAP.
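
To make the idea of window-dependent calibration concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes miscalibration is measured as a smoothed KL divergence between the category distribution of a user's profile and that of the recommendation list, which is a common choice in the calibration literature; the abstract does not name the metric used. The function names (`category_distribution`, `kl_miscalibration`, `recent_window`), the smoothing weight `alpha`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def category_distribution(categories):
    """Normalize a list of category labels into a probability distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def kl_miscalibration(profile, recs, alpha=0.01):
    """Smoothed KL(p || q~), where q~ = (1 - alpha) * q + alpha * p.

    p is the category distribution of the user profile and q that of the
    recommendations. The smoothing (an assumption of this sketch) keeps the
    divergence finite when a profile category is absent from the
    recommendations. Lower values mean better calibration.
    """
    p = category_distribution(profile)
    q = category_distribution(recs)
    score = 0.0
    for c, p_c in p.items():
        q_c = (1 - alpha) * q.get(c, 0.0) + alpha * p_c
        score += p_c * math.log(p_c / q_c)
    return score


def recent_window(history, size):
    """Return the `size` most recent interactions (history is oldest first)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return history[-size:]


# Toy user: older interactions are drama, recent ones are comedy.
history = ["drama"] * 6 + ["comedy"] * 4
recs = ["comedy"] * 4

# Calibration judged against the full history vs. only the recent window.
full_score = kl_miscalibration(history, recs)
recent_score = kl_miscalibration(recent_window(history, 4), recs)
```

In this toy case the same recommendation list looks poorly calibrated against the full history but well calibrated against the most recent window, which illustrates why the choice of window can change the calibration verdict.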
