Fair Pairs: Fairness-Aware Ranking Recovery from Pairwise Comparisons

Georg Ahnert, Antonio Ferrara, Claudia Wagner

TL;DR

The paper studies fairness-aware ranking recovery from pairwise comparisons, addressing biases in both candidate sampling and human judgments. It introduces a group-conditioned evaluation framework, including the Group-Conditioned Weighted Kemeny Distance $D^G_t(r)$ and exposure measures, and evaluates multiple recovery methods. Findings show that Fairness-Aware PageRank generally reduces exposure bias and improves fairness under various sampling schemes, while GNNRank with FA*IR achieves the lowest overall error in some scenarios but can increase group disparities. The work highlights the trade-offs between accuracy and fairness in ranking recovery and provides an open-source Python package to facilitate replication and future research.

Abstract

Pairwise comparisons based on human judgements are an effective method for determining rankings of items or individuals. However, as human biases propagate from pairwise comparisons to recovered rankings, they affect algorithmic decision making. In this paper, we introduce the problem of fairness-aware ranking recovery from pairwise comparisons. We propose a group-conditioned accuracy measure that quantifies the fairness of rankings recovered from pairwise comparisons. We evaluate the impact of state-of-the-art ranking recovery algorithms and sampling approaches on the accuracy and fairness of the recovered rankings, using synthetic and empirical data. Our results show that Fairness-Aware PageRank and GNNRank with FA*IR post-processing effectively mitigate existing biases in pairwise comparisons and improve the overall accuracy of recovered rankings. We highlight limitations and strengths of different approaches, and provide a Python package to facilitate replication and future work on fair ranking recovery from pairwise comparisons.

Paper Structure (22 sections, 5 equations, 6 figures)

Figures (6)

  • Figure 1: Research Setup. In this paper, we investigate the effect of sampling individuals for pairwise comparison, and of ranking recovery methods, on the fairness and accuracy of a recovered ranking.
  • Figure 2: Normative Assumptions. Under the "we are all equal" worldview (friedler2021possibility), ground-truth skill scores and bias are considered separately. With two groups, the unprivileged group is subject to systemic discrimination, historical bias, or other types of bias. Ground-truth skill scores are independent of group membership, but perceived scores are affected by group membership.
  • Figure 3: Correlations of Skill Score (higher is better) and Rank (lower is better) by Recovery Method. 400 individuals in 2 equal-size groups, compared in 1000 iterations using Oversampling and the BTL model. Left: Ranks recovered with RankCentrality (negahban2012iterative). Unprivileged individuals have a higher (worse) mean rank because of bias, but are sorted to the extremes of the ranking when Oversampling is applied with RankCentrality. Right: Ranks recovered with Fairness-Aware PageRank (tsioutsiouliklis2021fairness). The correlations of both groups overlap, but within-group error is higher.
  • Figure 4: Results after 500 Iterations of Simulated Pairwise Comparisons, by Sampling Approach and Ranking Recovery Method. Medians and ranges of 10 trials. Exposure difference (left) and error difference (center) are group-conditioned measures of fairness; error (right) reflects the whole ranking. Dashed lines indicate optimal values. David's Score and RankCentrality intermittently remove exposure difference when Oversampling is applied. Error difference is minimized by GNNRank and Fairness-Aware PageRank, even under Rank-Based Sampling. Fairness-Aware PageRank overshoots on the unprivileged group's exposure but achieves the best overall accuracy by partially mitigating bias.
  • Figure 5: Post-Processed Results after 500 Iterations of Simulated Pairwise Comparisons, by Sampling Approach and Ranking Recovery Method. Post-processing was performed using the FA*IR algorithm (zehlike2017fa) with $p=0.6$ and $\alpha = 0.1$. FA*IR effectively limits exposure difference without overshooting the way Fairness-Aware PageRank does. It also improves overall accuracy and can outperform Fairness-Aware PageRank, in particular when paired with GNNRank for ranking recovery. In contrast to Fairness-Aware PageRank, however, FA*IR negatively impacts error difference.
  • ...and 1 more figure
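To make the setup of Figures 2 and 3 more concrete, here is a minimal pure-Python sketch, under stated assumptions, of how biased human judgements could be simulated with the BTL model and then passed to a RankCentrality-style recovery (a random walk whose stationary distribution scores the items). Everything here is hypothetical and illustrative, not the authors' package: the function names, the additive `bias` penalty on unprivileged items' perceived scores, and uniform random pair sampling, which stands in for the paper's Oversampling and Rank-Based Sampling schemes.

```python
import math
import random

def btl_win_prob(theta_i: float, theta_j: float) -> float:
    """BTL probability that item i beats item j (scores on a log scale)."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def simulate_comparisons(skill, group, bias, n_comparisons, rng):
    """Return (winner, loser) pairs drawn from a biased BTL model.

    skill: ground-truth scores, independent of group membership.
    group: 0 = privileged, 1 = unprivileged.
    bias:  hypothetical additive penalty on the *perceived* score of
           unprivileged items; judges compare perceived, not true, scores.
    Pairs are sampled uniformly at random (assumption: this is neither
    Oversampling nor Rank-Based Sampling from the paper).
    """
    perceived = [s - bias * g for s, g in zip(skill, group)]
    outcomes = []
    for _ in range(n_comparisons):
        i, j = rng.sample(range(len(skill)), 2)
        if rng.random() < btl_win_prob(perceived[i], perceived[j]):
            outcomes.append((i, j))
        else:
            outcomes.append((j, i))
    return outcomes

def rank_centrality(n, outcomes, iters=1000):
    """Stationary distribution of a random walk that moves from each item
    toward the items that beat it (RankCentrality-style). Assumes the
    comparison graph is connected; a higher score means a better item."""
    wins = [[0] * n for _ in range(n)]  # wins[i][j] = times i beat j
    for w, l in outcomes:
        wins[w][l] += 1
    # d_max: largest number of distinct opponents of any item (at least 1)
    d_max = max(1, max(sum(1 for k in range(n) if wins[j][k] + wins[k][j] > 0)
                       for j in range(n)))
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:
                # move j -> i with prob. (share of j-vs-i games won by i) / d_max
                P[j][i] = wins[i][j] / total / d_max
        P[j][j] = 1.0 - sum(P[j])  # self-loop keeps each row stochastic
    pi = [1.0 / n] * n
    for _ in range(iters):  # power iteration: pi <- pi P
        pi = [sum(pi[j] * P[j][i] for j in range(n)) for i in range(n)]
    return pi

def to_ranks(scores):
    """Convert scores to ranks (1 = best, ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

# Usage: two equal-size groups with identical skill distributions.
rng = random.Random(0)
skill = [k / 10 for k in range(10)] * 2
group = [0] * 10 + [1] * 10
outcomes = simulate_comparisons(skill, group, bias=1.5, n_comparisons=3000, rng=rng)
ranks = to_ranks(rank_centrality(len(skill), outcomes))
print("mean rank, privileged:  ", sum(ranks[:10]) / 10)
print("mean rank, unprivileged:", sum(ranks[10:]) / 10)
```

Because both groups share the same skill distribution, any gap between the two printed mean ranks comes from the simulated bias in the judgements, which is the kind of propagation from comparisons to recovered rankings that the paper studies.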

Theorems & Definitions (3)

  • Definition 1: Group-Conditioned Weighted Kemeny Distance
  • Definition 2: Exposure
  • Definition 3: The Bradley-Terry-Luce (BTL) Model
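The definitions themselves are not reproduced on this page. As a rough sketch only, assuming the logarithmic position discount familiar from DCG for exposure (1 / log2(1 + rank)) and using an unweighted count of discordant pairs as a simplified stand-in for the group-conditioned Kemeny distance, group-level exposure and error differences could be computed as below. All function names are hypothetical, the sign conventions are assumptions, and the paper's weighted $D^G_t(r)$ may differ.

```python
import math
from itertools import combinations

def exposure(rank: int) -> float:
    """Position-based exposure with an assumed logarithmic discount."""
    return 1.0 / math.log2(1 + rank)

def group_exposure(ranks, group, g):
    """Mean exposure of the members of group g."""
    members = [r for r, gi in zip(ranks, group) if gi == g]
    return sum(exposure(r) for r in members) / len(members)

def exposure_difference(ranks, group):
    """Privileged (0) minus unprivileged (1) mean exposure; 0 means parity."""
    return group_exposure(ranks, group, 0) - group_exposure(ranks, group, 1)

def group_discordance(true_ranks, ranks, group, g):
    """Fraction of item pairs involving at least one member of group g whose
    relative order differs from the ground truth. Unweighted simplification,
    not the paper's weighted D^G_t(r)."""
    discordant = total = 0
    for i, j in combinations(range(len(ranks)), 2):
        if group[i] != g and group[j] != g:
            continue  # pair does not involve group g
        total += 1
        if (true_ranks[i] - true_ranks[j]) * (ranks[i] - ranks[j]) < 0:
            discordant += 1
    return discordant / total if total else 0.0

def error_difference(true_ranks, ranks, group):
    """Unprivileged (1) minus privileged (0) discordance; 0 means parity."""
    return (group_discordance(true_ranks, ranks, group, 1)
            - group_discordance(true_ranks, ranks, group, 0))

# Usage: the recovered ranking swaps an unprivileged item below a privileged one.
true_ranks = [1, 2, 3, 4]
group = [0, 1, 0, 1]
recovered = [1, 3, 2, 4]
print(exposure_difference(true_ranks, group), exposure_difference(recovered, group))
print(error_difference(true_ranks, recovered, group))
```

In this toy example the swap increases the privileged group's exposure advantage, while the unweighted error difference stays at zero because a single cross-group swap counts against both groups equally. A position-weighted distance such as the paper's could separate these cases.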