
Fairness in Ranking: Robustness through Randomization without the Protected Attribute

Andrii Kliachkin, Eleni Psaroudaki, Jakub Marecek, Dimitris Fotakis

TL;DR

The paper tackles fairness in ranking when protected attributes may be unavailable, addressing the challenge of robustness across multiple fairness notions. It introduces a randomized post-processing approach using Mallows noise to produce approximately $P$-fair rankings without demographic data, and an ILP to optimize DCG/NDCG under $(\vec{\alpha},\vec{\beta})$ fairness constraints when scores are known. Through extensive experiments on synthetic data and the German Credit dataset, the approach demonstrates robustness to unknown attributes while maintaining competitive ranking utility, illustrating the method's practical value for HR, advertising, and recommender systems. Overall, the work advances privacy-preserving, fairness-aware ranking by leveraging attribute-agnostic noise and exact constraint-based optimization.
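The Mallows noise mentioned above can be illustrated with a standard sampler. This is a sketch, not the authors' code: it assumes the common parameterization $P(\pi) \propto \exp(-\theta\, d_{KT}(\pi, \pi_0))$ with Kendall tau distance $d_{KT}$ and central ranking $\pi_0$, and draws samples with the repeated insertion model. The helper names `sample_mallows` and `kendall_tau_distance` are hypothetical.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def kendall_tau_distance(a: Sequence[T], b: Sequence[T]) -> int:
    """Count item pairs ordered differently in rankings a and b (O(n^2) for clarity)."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])


def sample_mallows(central: Sequence[T], theta: float, rng: random.Random) -> List[T]:
    """Draw one ranking from a Mallows model centred on `central` (hypothetical helper).

    Repeated insertion model: the i-th item of the central ranking (0-based) is
    inserted into one of the i + 1 slots of the partial ranking. Slot j (0 = top)
    places it above i - j earlier items, i.e. creates i - j inversions, so it is
    chosen with probability proportional to phi ** (i - j), where phi = exp(-theta).
    """
    phi = math.exp(-theta)
    ranking: List[T] = []
    for i, item in enumerate(central):
        weights = [phi ** (i - j) for j in range(i + 1)]
        slot = rng.choices(range(i + 1), weights=weights, k=1)[0]
        ranking.insert(slot, item)
    return ranking


if __name__ == "__main__":
    rng = random.Random(0)
    central = ["a", "b", "c", "d", "e"]
    for theta in (0.5, 1.0):  # the two theta values used in Figure 5
        sample = sample_mallows(central, theta, rng)
        print(theta, sample, kendall_tau_distance(sample, central))
```

Larger $\theta$ concentrates the samples around the central ranking, while $\theta = 0$ gives the uniform distribution over permutations. The sampler never reads group membership, which is what makes this kind of noise attribute-agnostic.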

Abstract

There has been great interest in fairness in machine learning, especially in relation to classification problems. In ranking-related problems, such as online advertising, recommender systems, and HR automation, much work on fairness remains to be done. Two complications arise: first, the protected attribute may not be available in many applications. Second, there are multiple measures of fairness of rankings, and optimization-based methods utilizing a single measure may produce rankings that are unfair with respect to other measures. In this work, we propose a randomized method for post-processing rankings that does not require the availability of the protected attribute. In an extensive numerical study, we show the robustness of our methods with respect to P-Fairness and their effectiveness with respect to Normalized Discounted Cumulative Gain (NDCG) from the baseline ranking, improving on previously proposed methods.
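NDCG with respect to the baseline ranking can be computed as below. This is a minimal sketch, assuming that the baseline scores serve directly as relevance gains with the common $1/\log_2(\text{position}+1)$ discount and no cutoff by default; the paper may use a different gain function or cutoff. The names `dcg` and `ndcg` are illustrative.

```python
import math
from typing import Hashable, Mapping, Optional, Sequence


def dcg(ranking: Sequence[Hashable], relevance: Mapping[Hashable, float],
        k: Optional[int] = None) -> float:
    """Discounted cumulative gain; position 1 has discount 1/log2(2) = 1."""
    top = ranking if k is None else ranking[:k]
    return sum(relevance[item] / math.log2(pos + 2) for pos, item in enumerate(top))


def ndcg(ranking: Sequence[Hashable], relevance: Mapping[Hashable, float],
         k: Optional[int] = None) -> float:
    """DCG of `ranking` divided by the DCG of the ideal (score-sorted) ranking."""
    ideal = sorted(relevance, key=relevance.__getitem__, reverse=True)
    best = dcg(ideal, relevance, k)
    return dcg(ranking, relevance, k) / best if best > 0 else 0.0


if __name__ == "__main__":
    scores = {"a": 3.0, "b": 2.0, "c": 1.0}  # baseline scores used as gains (assumption)
    print(ndcg(["a", "b", "c"], scores))     # baseline order gives 1.0
    print(ndcg(["c", "b", "a"], scores))     # reversed order scores lower
```

A perturbed ranking that stays close to the baseline keeps NDCG near 1, which is the sense in which utility is measured "from the baseline ranking".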

Paper Structure (24 sections, 13 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: Mallows distribution and Infeasible Index. Each subplot corresponds to a different value of the Infeasible Index of the central ranking, which is shown as a red line. The bar plots depict the mean Infeasible Index of the samples from the Mallows distribution centered on the initial ranking with two groups. Confidence intervals were obtained via bootstrapping ($n=1000$).
  • Figure 2: The Infeasible Index of the central ranking, constructed by sampling scores from a separate distribution for each of the two groups (see the section on the uniform-score experiment). The x-axis shows the difference in means between the score distributions of the two groups. Confidence intervals were obtained via bootstrapping ($n=1000$).
  • Figure 3: Mallows distribution and Infeasible Index. Each subplot corresponds to a difference in means between the score distributions of the two groups. We sample five individuals for each group: candidates in the first group are assigned scores $S_1 \sim \mathcal{U}(0,1)$, and those in the second group $S_2 \sim \mathcal{U}(\delta, 1 + \delta)$, where $\delta$ is the difference in means. The subplots depict the mean Infeasible Index of the samples from the Mallows distribution centered on the initial ranking. Confidence intervals were obtained via bootstrapping ($n=1000$).
  • Figure 4: Mallows distribution and NDCG. Each subplot corresponds to a difference in means between the score distributions of the two groups. We sample five individuals for each group: candidates in the first group are assigned scores $S_1 \sim \mathcal{U}(0,1)$, and those in the second group $S_2 \sim \mathcal{U}(\delta, 1 + \delta)$, where $\delta$ is the difference in means. The subplots depict the mean NDCG of the samples from the Mallows distribution centered on the initial ranking. Confidence intervals were obtained via bootstrapping ($n=1000$).
  • Figure 5: Rankings constructed with noisy representation constraints on the combined Age-Sex protected attribute, starting from an initial ranking that is weakly P-fair with respect to that attribute. The plots show the median percentage of positions satisfying P-fairness w.r.t. the Age-Sex protected attribute. Confidence intervals were obtained via bootstrapping ($n=1000$). In Subfigure (a), the Mallows parameter $\theta$ is set to $0.5$ and no noise is added to the constraints. In Subfigure (b), $\theta=1$ and no noise is added to the constraints. In Subfigure (c), $\theta=0.5$ and Gaussian noise $\xi\sim \mathcal{N}(0,1)$ is added to the constraints. In Subfigure (d), $\theta=1$ and Gaussian noise $\xi\sim \mathcal{N}(0,1)$ is added to the constraints.
  • ...and 2 more figures
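The synthetic setup in the captions of Figures 3 and 4 (five candidates per group, $S_1 \sim \mathcal{U}(0,1)$ and $S_2 \sim \mathcal{U}(\delta, 1+\delta)$) and the bootstrapped confidence intervals ($n=1000$) can be sketched as follows. This is an illustration under stated assumptions: the percentile bootstrap and the 95% level are assumed, and the statistic used here (share of second-group candidates in the top half of the central ranking) is a stand-in for the Infeasible Index and NDCG reported in the figures. All function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def central_ranking(delta: float, n_per_group: int, rng: random.Random) -> List[int]:
    """Group labels (0 or 1) of all candidates, sorted by descending score."""
    candidates = [(rng.uniform(0.0, 1.0), 0) for _ in range(n_per_group)]
    candidates += [(rng.uniform(delta, 1.0 + delta), 1) for _ in range(n_per_group)]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [group for _, group in candidates]


def bootstrap_ci(values: Sequence[float], n_boot: int = 1000, level: float = 0.95,
                 rng: Optional[random.Random] = None) -> Tuple[float, float, float]:
    """Sample mean plus a percentile-bootstrap confidence interval for it."""
    rng = rng or random.Random()
    # Resample with replacement n_boot times and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = boot_means[int(tail * (n_boot - 1))]
    upper = boot_means[int((1.0 - tail) * (n_boot - 1))]
    return statistics.fmean(values), lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    for delta in (0.0, 0.25, 0.5):
        # Stand-in statistic: share of group-1 candidates in the top 5 of 10.
        shares = [sum(central_ranking(delta, 5, rng)[:5]) / 5 for _ in range(200)]
        mean, lo, hi = bootstrap_ci(shares, rng=rng)
        print(f"delta={delta}: mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

As $\delta$ grows, the second group increasingly occupies the top of the central ranking, which is why the central ranking's fairness degrades along the x-axis of Figure 2.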

Theorems & Definitions (4)

  • Definition 1: $(\vec{\alpha}, \vec{\beta})$-$k$-fair ranking, Def. 2.4 of ChakrabortyD0S22
  • Definition 2: $(\vec{\alpha}, \vec{\beta})$-weak $k$-fair ranking, Def. 2.5 of ChakrabortyD0S22
  • Definition 3: Two-Sided Infeasible Index
  • Definition 4: Percentage of P-Fair Positions
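Definitions 3 and 4 are only named above. One plausible reading, stated here as an assumption rather than the paper's exact definition, is: for every prefix length $k$, each group $g$ must have between $\lfloor \alpha_g k \rfloor$ and $\lceil \beta_g k \rceil$ members in the top $k$. The two-sided Infeasible Index then counts the prefixes that break either bound, and the percentage of P-fair positions is the share of prefixes that break neither when $\alpha_g = \beta_g = p_g$, the target proportion of group $g$. The helper names below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Mapping, Sequence


def prefix_violations(groups: Sequence[Hashable], alpha: Mapping[Hashable, float],
                      beta: Mapping[Hashable, float]) -> List[bool]:
    """For each prefix length k = 1..n, flag whether some group breaks its bounds."""
    counts: Counter = Counter()
    flags: List[bool] = []
    for k, g in enumerate(groups, start=1):
        counts[g] += 1
        # Lower bound floor(alpha_h * k), upper bound ceil(beta_h * k) (assumed form).
        flags.append(any(
            counts[h] < math.floor(alpha[h] * k) or counts[h] > math.ceil(beta[h] * k)
            for h in alpha
        ))
    return flags


def two_sided_infeasible_index(groups: Sequence[Hashable], alpha: Mapping[Hashable, float],
                               beta: Mapping[Hashable, float]) -> int:
    """Number of prefixes that violate a lower or an upper representation bound."""
    return sum(prefix_violations(groups, alpha, beta))


def percent_p_fair_positions(groups: Sequence[Hashable], p: Mapping[Hashable, float]) -> float:
    """Share (in %) of prefixes where every group is within floor/ceil of p_g * k."""
    flags = prefix_violations(groups, p, p)
    return 100.0 * flags.count(False) / len(flags)


if __name__ == "__main__":
    p = {"A": 0.5, "B": 0.5}
    print(two_sided_infeasible_index(list("AAABBB"), p, p))  # 3
    print(percent_p_fair_positions(list("ABABAB"), p))       # 100.0
```

Because the check is two-sided, front-loading either group is penalized, not only under-representation of a minority group.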