Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems
Yongsu Ahn, Yu-Ru Lin
TL;DR
This work tackles miscalibration, bias, and stereotyping in recommender systems by proposing a unified framework that decomposes miscalibration into bias and variance and introduces system-induced effects such as stereotype and inflated diversity. Using MovieLens 1M and five algorithms, the authors show that complex models achieve better item-level accuracy but worse category-level calibration, while simpler models exaggerate stereotypes; miscalibration and biases disproportionately affect women, older users, and atypical users. They further employ structural equation modeling to map relationships among user characteristics, system-induced effects, and miscalibration, and demonstrate that oversampling underrepresented groups can mitigate stereotypes and improve calibration and quality, albeit with trade-offs. The work provides a principled toolkit for diagnosing and mitigating representation-related harms in recommender systems and highlights the importance of addressing data underrepresentation.
Abstract
Despite the benefits of tailoring items and information to users' needs, recommender systems have been found to introduce biases that favor popular items, certain item categories, and dominant user groups. In this study, we aim to characterize the systematic errors of a recommender system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that decomposes the sources of prediction error into a set of key measures quantifying the various types of system-induced effects, at both the individual and collective levels. Using this measurement framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate whether system-induced effects can be mitigated by oversampling underrepresented groups and individuals, and find that oversampling is effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.
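To make the notion of category-level miscalibration concrete, the following is a minimal sketch, not the authors' implementation. It assumes miscalibration is scored as the Jensen-Shannon divergence between the genre distribution of a user's viewing history and that of the recommended list; the abstract does not specify the divergence used, so this choice, the function names, and the toy genres are all illustrative assumptions.

```python
import math
from collections import Counter


def category_distribution(categories, all_categories):
    """Return the share of each category in a list of item categories.

    `categories` holds one category label per item (hypothetical input format).
    """
    if not categories:
        raise ValueError("categories must not be empty")
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in all_categories}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Both arguments are dicts over the same keys. The result lies in [0, 1]:
    0 means identical distributions, 1 means disjoint support.
    """
    def kl(a, b):
        # Terms with a[c] == 0 contribute nothing to the sum.
        return sum(a[c] * math.log2(a[c] / b[c]) for c in a if a[c] > 0)

    # Mixture distribution; m[c] > 0 wherever p[c] > 0 or q[c] > 0.
    m = {c: 0.5 * (p[c] + q[c]) for c in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def miscalibration(history, recommended, all_categories):
    """Assumed miscalibration score: divergence between the user's
    historical category preferences and the recommended categories."""
    p = category_distribution(history, all_categories)
    q = category_distribution(recommended, all_categories)
    return js_divergence(p, q)


if __name__ == "__main__":
    genres = ["Action", "Drama", "Romance"]          # toy category set
    history = ["Drama", "Drama", "Romance", "Action"]  # what the user watched
    recs = ["Action", "Action", "Action", "Drama"]     # what the system suggests
    print(miscalibration(history, recs, genres))
```

A score of 0 indicates that the recommended list mirrors the user's historical genre mix, and larger values indicate stronger drift toward categories the user did not favor. Averaging such per-user scores within a demographic group is one way the group-level disparities described above could be examined under these assumptions.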
