Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness
Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal
TL;DR
This work addresses gender bias in recommender systems by introducing category-aware fairness metrics, both ranking-agnostic and ranking-sensitive, that quantify how recommendations distribute across item genres for different genders. It defines non-ranking metrics (Category Coverage and Relative Category Representation) and ranking-based metrics (CMAP, CDCG, CMRR, CRP), supported by a Gender Balance Score to measure consumer-side fairness, and provides a formal notation framework. The authors propose a Genre Aware Regularization term, integrated into the loss of multiple backbone models as $L = \alpha L_{FairGenreGender} + (1-\alpha) L_{Recommendation}$, where $L_{FairGenreGender}$ is derived from category-level disparities and scaled via a sigmoid function around 0.5, enabling category-aware fairness during training. Experiments on three real-world datasets (ML-100K, ML-1M, Yelp) show that the proposed metrics reveal biases not captured by traditional measures, and that the regularizer significantly reduces category-level bias with minimal degradation in overall recommendation performance, especially for complex models like NeuMF. The findings suggest a practical, adaptable framework for evaluating and improving fairness in recommendations, one that can extend to multi-valued sensitive attributes and provider-side fairness contexts and that gives more nuanced, actionable insight for deploying fair recommender systems in diverse domains.
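The weighted loss above can be illustrated with a minimal Python sketch. Only the combination $L = \alpha L_{FairGenreGender} + (1-\alpha) L_{Recommendation}$ comes from the summary; the function names, the per-category mean-score inputs, and the reading of "scaled via a sigmoid function around 0.5" as `abs(sigmoid(gap) - 0.5)` are assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def fair_genre_gender_term(mean_score_a: dict[str, float],
                           mean_score_b: dict[str, float]) -> float:
    """Hypothetical fairness term (assumption, not the paper's definition).

    Each dict maps a category (e.g. a movie genre) to the mean predicted
    score for one gender group. The per-category gap is passed through a
    sigmoid and measured as a distance from 0.5, so a zero gap costs 0.
    """
    categories = mean_score_a.keys() & mean_score_b.keys()
    if not categories:
        raise ValueError("no shared categories between the two groups")
    penalties = [
        abs(sigmoid(mean_score_a[c] - mean_score_b[c]) - 0.5)
        for c in categories
    ]
    return sum(penalties) / len(penalties)


def combined_loss(l_fair: float, l_rec: float, alpha: float) -> float:
    """L = alpha * L_FairGenreGender + (1 - alpha) * L_Recommendation."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l_fair + (1.0 - alpha) * l_rec


# Illustrative values only; they are not taken from the paper.
group_a = {"Action": 0.9, "Romance": 0.2}
group_b = {"Action": 0.4, "Romance": 0.6}
l_fair = fair_genre_gender_term(group_a, group_b)
total = combined_loss(l_fair, l_rec=0.35, alpha=0.3)
```

In this sketch, `alpha = 0` recovers the plain recommendation loss and `alpha = 1` optimizes the fairness term alone, so `alpha` sets the trade-off between accuracy and category-level fairness.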
Abstract
Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes, so evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances because it mainly compares performance metrics across sensitive groups. In this paper, we introduce a comprehensive set of metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness at a more granular level: our metrics capture gender bias across categories of recommended items, such as genres for movies. Furthermore, we show that using a category-aware fairness metric as a regularization term alongside the main recommendation loss during training can effectively reduce bias in the models' output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics provide deeper insight into bias in recommended items than previous metrics. Additionally, our results demonstrate that incorporating our regularization term significantly improves the fairness of recommendations across categories without substantial degradation in overall recommendation performance.
