
Exploring How Fair Model Representations Relate to Fair Recommendations

Bjørnar Vassøy, Benjamin Kille, Helge Langseth

Abstract

One of the many fairness definitions pursued in recent recommender system research targets mitigating demographic information encoded in model representations. Models optimized for this definition are typically evaluated on how well demographic attributes can be classified given model representations, with the (implicit) assumption that this measure accurately reflects \textit{recommendation parity}, i.e., how similar recommendations given to different users are. We challenge this assumption by comparing the amount of demographic information encoded in representations with various measures of how the recommendations differ. We propose two new approaches for measuring how well demographic information can be classified given ranked recommendations. Our results from extensive testing of multiple models on one real and multiple synthetically generated datasets indicate that optimizing for fair representations positively affects recommendation parity, but also that evaluation at the representation level is not a good proxy for measuring this effect when comparing models. We also provide extensive insight into how recommendation-level fairness metrics behave for various models by evaluating their performances on numerous generated datasets with different properties.
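The abstract's notion of representation-level evaluation can be made concrete with a minimal sketch: train a probe classifier to predict a binary demographic attribute from user embeddings and report its ROC AUC ("Representation AUC"). The variable names, the synthetic embedding setup, and the `leak` parameter below are illustrative assumptions, not the paper's actual experimental protocol; an AUC near 0.5 would indicate that the representations encode little demographic information.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, dim = 1000, 32

# Hypothetical protected attribute (e.g., binarized gender).
gender = rng.integers(0, 2, size=n_users)

# Synthetic user embeddings that partially encode the attribute;
# `leak` controls how strongly demographic information leaks in.
leak = 0.5
emb = rng.normal(size=(n_users, dim))
emb[:, 0] += leak * gender

# Probe: classify the attribute from embeddings, report held-out AUC.
X_tr, X_te, y_tr, y_te = train_test_split(emb, gender, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"Representation AUC: {auc:.3f}")
```

The paper's point is that a Recommendation AUC, computed analogously from ranked recommendation lists rather than embeddings, need not track this representation-level score across models.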

Paper Structure

This paper contains 48 sections, 3 equations, 10 figures.

Figures (10)

  • Figure 1: Recommendation AUC plotted against Representation AUC for synthetic datasets with different $\epsilon$ parameters. VAE and VAERel.
  • Figure 2: Gender and Age Recommendation AUC plotted against Representation AUC for the MovieLens 1M dataset with different model parameters. VAERel and VAEAfrl*.
  • Figure 3: Gender and Age Recommendation AUC plotted against Representation AUC for the MovieLens 1M dataset with different model parameters. VAE2adv and VAEAfrl*.
  • Figure 4: Recommendation AUC plotted against Representation AUC for a synthetic dataset with parameter $\epsilon=0.74$.
  • Figure 5: Recommendation metrics plotted for different models and dataset $\epsilon$-parameters.
  • ...and 5 more figures