FairJob: A Real-World Dataset for Fairness in Online Systems

Mariia Vladimirova, Federico Pavone, Eustache Diemert

TL;DR

This work introduces a fairness-aware dataset for job recommendations in advertising, designed to foster research on algorithmic fairness in real-world scenarios. It also introduces a method for computing a fair utility metric from a biased dataset in the setting of online job recommendation.

Abstract

We introduce a fairness-aware dataset for job recommendations in advertising, designed to foster research on algorithmic fairness in real-world scenarios. It was collected and prepared to comply with privacy standards and business confidentiality. An additional challenge is the lack of access to protected user attributes such as gender, for which we propose a way to obtain a proxy estimate. Despite being anonymized and including only a proxy for a sensitive attribute, the dataset preserves predictive power and remains a realistic and challenging benchmark. It addresses a significant gap in the availability of fairness-focused resources for high-impact domains like advertising, where what is at stake is access to valuable employment opportunities and where balancing fairness and utility is a common industrial challenge. We also explore the stages of the advertising process at which unfairness can occur and introduce a method for computing a fair utility metric from a biased dataset in the setting of online job recommendation. Experimental evaluations of bias mitigation techniques on the released dataset demonstrate potential improvements in fairness and the associated trade-offs with utility. The dataset is hosted at https://huggingface.co/datasets/criteo/FairJob. Source code for the experiments is hosted at https://github.com/criteo-research/FairJob-dataset/.

Paper Structure (59 sections, 6 equations, 12 figures, 10 tables)

Figures (12)

  • Figure 1: Simplified scheme of the ad selection process in online advertising: (i) a user opens a webpage with an available ad banner, (ii) the webpage sends a request to take part in a real-time bidding auction, which triggers campaign selection by an ad service for that user, (iii) once the campaign is chosen, the ad service sends a bid, (iv) if the bid wins the auction, the recommendation engine chooses the best ad from the selected campaign and shows it on the webpage.
  • Figure 2: Causal graph depicting the effects of variables appearing during model training under different constraints. An arrow between two nodes denotes a causal effect. The dashed arrow between $\hat{Y}$ and $X$ indicates that $\hat{Y}$ depends on $X$ but, conditionally on $X$, is independent of its ancestors.
  • Figure 3: Causal graph depicting effects of variables appearing during model training for an ad recommendation system under different constraints.
  • Figure 4: Examples of feature statistics in the FairJob dataset: the number of impressions per user and the banner size have long-tailed distributions (two plots on the left). The products exhibit popularity bias (right plot), i.e. some products have a much higher or lower number of clicks than average, with senior job ads receiving more clicks on average.
  • Figure 5: Probability density distributions of click for different values of the protected attribute of three models trained in different ways: (i) unfair -- with a protected attribute included as a feature during training, (ii) unaware -- corresponds to fairness through unawareness, (iii) trained with fairness penalty as a bias mitigation technique.
  • ...and 7 more figures
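
The comparison in Figure 5 (click predictions split by the value of the protected attribute, for differently trained models) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name demographic_parity_gap is hypothetical, the scores are made-up toy numbers rather than FairJob data, and the gap in mean predicted click probability between two groups is a commonly used fairness measure, not necessarily the metric used in the paper.

```python
from statistics import mean


def demographic_parity_gap(scores, groups):
    """Absolute difference in mean predicted click probability between two groups.

    `scores` are model outputs in [0, 1]; `groups` holds the (proxy) protected
    attribute for each example. Hypothetical helper, not taken from the paper.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    if len(by_group) != 2:
        raise ValueError("expected exactly two groups")
    first, second = (mean(values) for values in by_group.values())
    return abs(first - second)


# Toy data (invented for illustration): a binary proxy attribute and the
# predictions of two hypothetical models on the same six examples.
groups = [0, 0, 0, 1, 1, 1]
unfair_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
penalized_scores = [0.30, 0.35, 0.40, 0.35, 0.40, 0.45]

gap_unfair = demographic_parity_gap(unfair_scores, groups)
gap_penalized = demographic_parity_gap(penalized_scores, groups)
print(f"unfair model gap:    {gap_unfair:.3f}")
print(f"penalized model gap: {gap_penalized:.3f}")
```

A smaller gap means the two groups receive more similar predictions on average. It says nothing about utility, so a gap of this kind would normally be reported alongside a predictive performance measure.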