Unmasking Gender Bias in Recommendation Systems and Enhancing Category-Aware Fairness

Tahsin Alamgir Kheya, Mohamed Reda Bouadjenek, Sunil Aryal

TL;DR

This work addresses gender bias in recommender systems by introducing category-aware, ranking-sensitive fairness metrics that quantify how recommendations are distributed across item genres for different genders. It defines non-ranking metrics (Category Coverage and Relative Category Representation) and ranking-based metrics (CMAP, CDCG, CMRR, CRP), supported by a Gender Balance Score to measure consumer-side fairness, and provides a formal notation framework. The authors propose a Genre Aware Regularization term, integrated into the loss of multiple backbone models as $L = \alpha L_{FairGenreGender} + (1-\alpha) L_{Recommendation}$, with $L_{FairGenreGender}$ derived from category-level disparities and scaled via a sigmoid function around 0.5, enabling category-aware fairness during training. Experiments on three real-world datasets (ML-100K, ML-1M, Yelp) show that the proposed metrics reveal biases not captured by traditional measures, and that the regularizer significantly reduces category-level bias with minimal degradation in overall recommendation performance, especially for complex models like NeuMF. The findings suggest a practical, adaptable framework for evaluating and improving fairness in recommendations, one that can extend to multi-valued sensitive attributes and provider-side fairness contexts.
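As a rough illustration of how the weight $\alpha$ trades the two loss terms against each other, the sketch below computes the convex combination for a few values of $\alpha$. The function name `combined_loss` and the numeric loss values are hypothetical, chosen only for illustration; how $L_{FairGenreGender}$ itself is computed (including the sigmoid scaling) is not reproduced here.

```python
def combined_loss(fair_loss: float, rec_loss: float, alpha: float) -> float:
    """Return alpha * fair_loss + (1 - alpha) * rec_loss.

    fair_loss stands in for L_FairGenreGender and rec_loss for
    L_Recommendation; both are treated as already-computed scalars.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * fair_loss + (1.0 - alpha) * rec_loss


# Hypothetical loss values, purely for illustration.
for alpha in (0.0, 0.5, 1.0):
    total = combined_loss(fair_loss=0.2, rec_loss=0.8, alpha=alpha)
    print(f"alpha={alpha}: total loss={total:.2f}")
```

With $\alpha = 0$ the objective reduces to the plain recommendation loss, and with $\alpha = 1$ only the fairness term is optimized; intermediate values interpolate between the two.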

Abstract

Recommendation systems are now an integral part of our daily lives. We rely on them for tasks such as discovering new movies, finding friends on social media, and connecting job seekers with relevant opportunities. Given their vital role, we must ensure these recommendations are free from societal stereotypes. Therefore, evaluating and addressing such biases in recommendation systems is crucial. Previous work evaluating the fairness of recommended items fails to capture certain nuances because it mainly focuses on comparing performance metrics across different sensitive groups. In this paper, we introduce a set of comprehensive metrics for quantifying gender bias in recommendations. Specifically, we show the importance of evaluating fairness at a more granular level, which our metrics achieve by capturing gender bias across categories of recommended items, such as genres for movies. Furthermore, we show that employing a category-aware fairness metric as a regularization term alongside the main recommendation loss during training can effectively minimize bias in the models' output. We experiment on three real-world datasets, using five baseline models alongside two popular fairness-aware models, to show the effectiveness of our metrics in evaluating gender bias. Our metrics provide greater insight into bias in recommended items than previous metrics. Additionally, our results demonstrate that incorporating our regularization term significantly improves the fairness of recommendations across different categories without substantial degradation in overall recommendation performance.

Paper Structure

This paper contains 35 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison of action and romance movie recommendations among male and female users across four recommendation algorithms, along with corresponding Precision@10 values. The graphs highlight disparities in genre recommendations by gender, with romance movies being more frequently suggested to female users and action movies to male users, despite similar Precision@10 metrics.
  • Figure 2: Comparison of bias values for six of our metrics across all three datasets.
  • Figure 3: Reduction in bias scores after using our fairness-aware regularizer. Since all datasets provide similar outcomes, we present the results for only the ML-100K dataset.
  • Figure 4: Bias score for the fairness-aware models over four stereotypical genres for the ML-100K dataset.
  • Figure 5: Impact of $\alpha$ on recommendation performance with respect to NDCG@50 and on the bias measure, which is the difference between the Category Coverage values for male and female users. The experiments are performed on all three datasets.
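
The bias measure named in the Figure 5 caption, a male-versus-female difference in a per-category score, can be sketched as follows. This is a minimal illustration under a loudly stated assumption: the per-category score used here (the average share of each user's recommendation list that falls in a category) is a stand-in and not the paper's formal definition of Category Coverage, which this page does not give. All names (`category_share`, `gender_gap`) and the toy data are hypothetical.

```python
from statistics import mean


def category_share(rec_lists, item_categories, category):
    """Average fraction of each user's list that belongs to `category`.

    ASSUMPTION: a simplified stand-in for Category Coverage, not the
    paper's definition. Empty recommendation lists are skipped.
    """
    shares = [
        sum(1 for item in items if category in item_categories.get(item, set()))
        / len(items)
        for items in rec_lists
        if items
    ]
    return mean(shares) if shares else 0.0


def gender_gap(recs_by_gender, item_categories, category):
    """Signed male-minus-female difference in the per-category score."""
    male = category_share(recs_by_gender["M"], item_categories, category)
    female = category_share(recs_by_gender["F"], item_categories, category)
    return male - female


# Toy data: item id -> set of genres, and top-2 lists per user by gender.
item_categories = {1: {"Action"}, 2: {"Romance"}, 3: {"Action", "Romance"}, 4: {"Comedy"}}
recs_by_gender = {"M": [[1, 3], [1, 4]], "F": [[2, 4], [2, 3]]}

print(gender_gap(recs_by_gender, item_categories, "Action"))
```

A value near zero would indicate that the category appears about equally often in recommendations to both groups under this simplified score; a positive value means it is recommended more to male users in the toy data.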