Learn to be Fair without Labels: a Distribution-based Learning Framework for Fair Ranking

Fumian Chen, Hui Fang

TL;DR

The paper tackles fair ranking without fairness labels by introducing a distribution-based fair learning (DLF) framework that uses target exposure distributions $\epsilon^*$ to guide optimization. It separates fairness and relevance models, employs a differentiable KL-divergence loss between system exposure $\epsilon(\pi)$ and $\epsilon^*$, and blends the two with a tunable weight $\alpha$ to manage the fairness-relevance trade-off. Experiments on the Wikipedia data from the TREC fair ranking track show that DLF outperforms state-of-the-art methods like DELTR and FA*IR in fairness (AWRF) while retaining competitive relevance (nDCG), especially when fused with BM25. The approach leverages contextual textual features via Sentence-BERT to enhance fairness modeling and demonstrates stable gradient-based training without explicit fairness labels, offering a scalable and interpretable path for fair ranking in real-world retrieval systems.
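The two ingredients named above, a KL-divergence loss between system and target exposure and an $\alpha$-weighted blend of fairness and relevance scores, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the softmax-based exposure surrogate, the direction of the KL term, the min-max normalization before blending, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def soft_group_exposure(scores, membership):
    """Assumed surrogate for system exposure: softmax over document
    scores, aggregated per group.

    scores:     shape (n_docs,), model scores for one query
    membership: shape (n_docs, n_groups), each row sums to 1
    """
    shifted = scores - np.max(scores)  # subtract max for numerical stability
    doc_attention = np.exp(shifted) / np.sum(np.exp(shifted))
    return doc_attention @ membership  # shape (n_groups,)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; the direction
    (system first, target second) is an assumption of this sketch."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def min_max(x):
    """Rescale scores to [0, 1] so the two models are comparable."""
    span = np.max(x) - np.min(x)
    if span == 0:
        return np.zeros_like(x, dtype=float)
    return (x - np.min(x)) / span

def fuse_scores(fair_scores, rel_scores, alpha):
    """Blend fairness and relevance scores; larger alpha favors fairness."""
    return alpha * min_max(fair_scores) + (1.0 - alpha) * min_max(rel_scores)

# Toy query: four documents, two groups, equal target exposure.
membership = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
target = np.array([0.5, 0.5])
scores = np.array([2.0, 1.0, 0.5, 0.1])

loss = kl_divergence(soft_group_exposure(scores, membership), target)
print("fairness loss:", loss)
```

Because the loss depends only on the scores, the group membership, and the target distribution, no per-document fairness label is needed. In a gradient-based setup the same computation would be written in an autodiff library so the loss can be backpropagated through the scoring model.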

Abstract

Ranking algorithms, an essential component of retrieval systems, have been steadily improved in previous studies, especially with respect to relevance-based utilities. In recent years, a growing body of research has addressed fairness in rankings, driven by increasing concerns about potential discrimination and echo chambers. These approaches include traditional score-based methods, which allocate exposure to different groups using pre-defined scoring functions or selection strategies, and learning-based methods, which learn the scoring functions from data samples. Learning-based models are more flexible and achieve better performance than traditional methods. However, most learning-based models were trained and tested on outdated datasets where fairness labels are barely available. State-of-the-art models use relevance-based utility scores as a substitute for fairness labels when training their fairness-aware loss, but plugging in this substitute does not guarantee that the loss reaches its minimum. This inconsistency undermines the model's accuracy and performance, especially when learning is done by gradient descent. Hence, we propose a distribution-based fair learning framework (DLF) that does not require labels, replacing the unavailable fairness labels with target fairness exposure distributions. Experimental studies on the TREC fair ranking track dataset confirm that the proposed framework achieves better fairness performance while maintaining better control over the fairness-relevance trade-off than state-of-the-art fair ranking frameworks.

Paper Structure

This paper contains 21 sections, 13 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Value of contextual features and convergence analysis plot. We plot the proposed distribution-based fairness-aware loss during training with gradient descent for 20 epochs at a learning rate of $10e-3$. The model was trained separately on two feature sets: one with contextual features (red) and one without (black).
  • Figure 2: DLF+BM25: Fairness and relevance plot by $\alpha \in [0,1]$ with an interval of 0.01. Generally, increasing $\alpha$ results in higher fairness (AWRF) and lower relevance (nDCG), but the change is not linear and varies across queries.
  • Figure 3: DLF+BM25: Fairness-Relevance trade-off plot (Evaluation Queries 2021) by the preference parameter $\alpha$. Darker dots indicate larger $\alpha$. Values of $\alpha$ are from 0 to 1 with an interval of 0.01. The large gap on the upper left implies the existence of documents that contribute to fairness but dramatically harm relevance. It also shows the difficulty of managing the trade-off.
  • Figure 4: Fairness performance of DLF when re-ranking initial rankings of different lengths, based on the top $k \in [20,1000]$ positions retrieved by BM25 (Robertson1993). AWRF@20 is used to plot the fairness boundaries for initial rankings of different lengths. The top line in black is for the evaluation queries (2021), and the bottom line in red is for the evaluation queries (2022).
  • Figure 5: The relationship between the relevance of initial rankings and the fairness performance of DLF. We selected two different models, BM25 (Robertson1993) in red and RM3 (lv2009comparative) in black, with different retrieval parameters to obtain the samples of initial rankings.
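
The $\alpha$ sweep behind Figures 2 and 3 can be reproduced in spirit on toy data. The sketch below is illustrative only: the data are made up, nDCG uses a linear gain, and a KL-based exposure gap with a logarithmic position discount stands in for AWRF, whose exact definition is not given in this summary. All function names are hypothetical, and the cutoff `k` is assumed not to exceed the number of documents.

```python
import numpy as np

def min_max(x):
    """Rescale scores to [0, 1]; constant inputs map to zeros."""
    span = np.max(x) - np.min(x)
    return np.zeros_like(x, dtype=float) if span == 0 else (x - np.min(x)) / span

def ndcg_at_k(ranking, relevance, k):
    """nDCG@k with linear gain and log2 position discount."""
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum(relevance[ranking[:k]] * discounts)
    ideal = np.sum(np.sort(relevance)[::-1][:k] * discounts)
    return float(dcg / ideal) if ideal > 0 else 0.0

def exposure_gap(ranking, membership, target, k, eps=1e-12):
    """Stand-in fairness measure (NOT AWRF): KL divergence between the
    normalized group exposure of the top-k and the target distribution."""
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    exposure = discounts @ membership[ranking[:k]]
    exposure = exposure / exposure.sum()
    p = np.clip(exposure, eps, None)
    q = np.clip(target, eps, None)
    return float(np.sum(p * np.log(p / q)))

def sweep_alpha(fair_scores, rel_scores, relevance, membership, target, k, step=0.01):
    """Return (alpha, nDCG@k, exposure gap) for alpha from 0 to 1."""
    rows = []
    for alpha in np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1):
        fused = alpha * min_max(fair_scores) + (1.0 - alpha) * min_max(rel_scores)
        ranking = np.argsort(-fused, kind="stable")  # best score first
        rows.append((float(alpha),
                     ndcg_at_k(ranking, relevance, k),
                     exposure_gap(ranking, membership, target, k)))
    return rows

# Toy data: docs 0-3 belong to group A, docs 4-7 to group B.
relevance = np.array([3, 3, 2, 2, 1, 1, 0, 0], dtype=float)
rel_scores = relevance.copy()                      # a perfect relevance model
fair_scores = np.array([8, 6, 4, 2, 7, 5, 3, 1], dtype=float)  # alternates groups
membership = np.array([[1, 0]] * 4 + [[0, 1]] * 4, dtype=float)
target = np.array([0.5, 0.5])

results = sweep_alpha(fair_scores, rel_scores, relevance, membership, target, k=4)
for alpha, ndcg, gap in results[::25]:
    print(f"alpha={alpha:.2f}  nDCG@4={ndcg:.3f}  exposure gap={gap:.3f}")
```

On this toy query the two ends of the sweep behave as the captions describe: relevance is highest at $\alpha = 0$ and the exposure gap is smallest at $\alpha = 1$. The path between them moves in steps rather than linearly, because the ranking only changes when the fused scores of two documents cross.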