Inference-time Stochastic Ranking with Risk Control

Ruocheng Guo; Jean-François Ton; Yang Liu; Hang Li

TL;DR

This work tackles fairness in learning-to-rank by addressing two problems: exposure bias in deterministic rankers and the high training cost of stochastic PL-based methods. It proposes Inference-time Stochastic Ranking with Risk Control (ISRR), which builds a Generalized Plackett-Luce (GPL) model on top of pre-trained scoring functions and uses distribution-free risk control to guarantee a user-specified utility or fairness level at inference time. Using calibration data, ISRR obtains finite-sample guarantees on ranking performance: it selects the thresholds $\bm{\lambda}$ that define per-position candidate sets with either Hoeffding-Bentkus (HB) p-values or upper confidence bounds (UCB), and these thresholds interpolate between PL and deterministic ranking. Empirical results on Yahoo, MSLR-WEB30K, and Istella-S show that ISRR matches or exceeds the utility-fairness performance of existing stochastic methods while dramatically reducing training cost, and that it provides finite-sample guarantees on the chosen metrics, making it practical for real-world deployment.
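As a rough illustration of how per-position thresholds can interpolate between PL sampling and deterministic ranking, the sketch below samples one ranking from a thresholded PL model. The function name `sample_gpl_ranking`, the rule that the candidates at position $k$ are the remaining items with score at least $\lambda_k$, and the fallback to the top-scored remaining item when no item qualifies are all assumptions made for illustration; the paper's exact GPL definition may differ.

```python
import math
import random


def sample_gpl_ranking(scores, thresholds, rng=None):
    """Sample one ranking with len(thresholds) positions (hypothetical helper).

    Assumptions (not taken from the paper): at position k the candidate set
    holds the remaining items with score >= thresholds[k]; if it is empty,
    the top-scored remaining item is placed deterministically.
    """
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    ranking = []
    for lam in thresholds:
        if not remaining:
            break
        candidates = [i for i in remaining if scores[i] >= lam]
        if not candidates:
            # No item clears the threshold: behave like a deterministic ranker.
            pick = max(remaining, key=lambda i: scores[i])
        else:
            # PL step: softmax over candidate scores (max-shifted for stability).
            top = max(scores[i] for i in candidates)
            weights = [math.exp(scores[i] - top) for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


scores = [2.0, 0.5, 1.0, 1.9]
# Thresholds of +inf leave every candidate set empty: plain sort by score.
deterministic = sample_gpl_ranking(scores, [float("inf")] * 4)
# Thresholds of -inf admit every remaining item: ordinary PL sampling.
pl_sample = sample_gpl_ranking(scores, [float("-inf")] * 4, random.Random(0))
# An intermediate threshold shuffles only the two near-tied top items.
mixed = sample_gpl_ranking(scores, [1.5] * 4, random.Random(1))
```

Under these assumptions, raising the thresholds shrinks the candidate sets, so randomization is confined to items whose scores are close to the top.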

Abstract

Learning to Rank (LTR) methods are vital in online economies, affecting users and item providers. Fairness in LTR models is crucial to allocate exposure proportionally to item relevance. Widely used deterministic LTR models can lead to unfair exposure distribution, especially when items with the same relevance receive slightly different ranking scores. Stochastic LTR models, incorporating the Plackett-Luce (PL) ranking model, address fairness issues but suffer from high training cost. In addition, they cannot provide guarantees on utility or fairness, which can lead to dramatically degraded utility when optimized for fairness. To overcome these limitations, we propose Inference-time Stochastic Ranking with Risk Control (ISRR), a novel method that performs stochastic ranking at inference time with guaranteed utility or fairness given pretrained scoring functions from deterministic or stochastic LTR models. Comprehensive experimental results on three widely adopted datasets demonstrate that our proposed method achieves utility and fairness comparable to existing stochastic ranking methods with much lower computational cost. In addition, results verify that our method provides finite-sample guarantees on utility and fairness. This advancement represents a significant contribution to the field of stochastic ranking and fair LTR with promising real-world applications.
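To make the idea of a finite-sample guarantee concrete, here is a minimal sketch of threshold selection with Hoeffding-Bentkus p-values and fixed-sequence testing, in the style of the Learn-then-Test framework. It assumes a loss bounded in $[0,1]$ (such as $1-\text{NDCG@5}$), a single scalar threshold rather than a per-position vector, and candidate thresholds ordered from most to least conservative. The helper names are hypothetical, and this is not presented as ISRR's exact procedure.

```python
import math


def h1(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b), with 0 < b < 1."""
    out = 0.0
    if a > 0:
        out += a * math.log(a / b)
    if a < 1:
        out += (1 - a) * math.log((1 - a) / (1 - b))
    return out


def binom_cdf(m, n, p):
    """P(Bin(n, p) <= m), with each term computed in log space."""
    total = 0.0
    for k in range(0, min(m, n) + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def hb_p_value(risk_hat, n, alpha):
    """Hoeffding-Bentkus p-value for the null hypothesis 'true risk > alpha'."""
    hoeffding = math.exp(-n * h1(min(risk_hat, alpha), alpha))
    bentkus = math.e * binom_cdf(math.ceil(n * risk_hat), n, alpha)
    return min(1.0, hoeffding, bentkus)


def select_threshold(lambdas, empirical_risks, n, alpha, delta):
    """Fixed-sequence testing: accept thresholds until the first p-value > delta.

    `lambdas` must be ordered from most conservative to least conservative.
    Returns the last accepted threshold, or None if even the first one fails.
    """
    chosen = None
    for lam, risk_hat in zip(lambdas, empirical_risks):
        if hb_p_value(risk_hat, n, alpha) > delta:
            break
        chosen = lam
    return chosen


# Toy calibration summary: empirical risks measured on n = 500 queries.
best = select_threshold([0.9, 0.5, 0.1], [0.02, 0.05, 0.2], n=500, alpha=0.1, delta=0.05)
```

In this toy run the least conservative threshold has empirical risk above $\alpha$, so the walk stops and the middle threshold is returned.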

Paper Structure (25 sections, 1 theorem, 28 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Preliminaries
Methodology
Background: Distribution-free Risk Control
Generalized PL Ranking Model
Risk Control for Generalized PL Model
Threshold Selection via Distribution-free Risk Control.
Distribution-free Risk Control for Ranking.
Experiments
Experimental Setup
Experimental Results
GPL's Trade-off between Utility and Fairness (RQ1-2).
Utility Guarantee and Fairness Improvement (RQ3).
Running Time Results (RQ4).
Impact of Scaling Factor (RQ5).
...and 10 more sections

Key Result

lemma 1

Given a natural number $n$, let $Z_1, Z_2, \ldots, Z_n$ be real-valued independent and identically distributed random variables with cumulative distribution function $F(\cdot)$. Let $F_n$ denote the associated empirical distribution function defined by $F_n(z) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{Z_i \le z\}$ for $z \in \mathbb{R}$. Then $\forall \delta \in (0,1)$, with probability at least $1-\delta$,
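The hypotheses above match those of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. Assuming that is the result the lemma states (a reconstruction, not a quotation from the paper), its standard form with Massart's constant, where $F_n$ is the empirical distribution function of the sample, is

$$
\sup_{z \in \mathbb{R}} \left| F_n(z) - F(z) \right| \;\le\; \sqrt{\frac{\ln(2/\delta)}{2n}} .
$$

This follows from the tail bound $\Pr\left(\sup_{z} |F_n(z) - F(z)| > \epsilon\right) \le 2e^{-2n\epsilon^2}$ by setting the right-hand side equal to $\delta$ and solving for $\epsilon$.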

Figures (6)

Figure 1: Trade-off between utility and fairness of the proposed GPL ranking model. The x-axis is the threshold $\lambda$; the left y-axis is the risk $R_{util}=1-\text{NDCG@5}$ and the right y-axis is the disparity $R_{sq-fair}$. Det and Sto are the deterministic and the PL ranking models. The shaded region is one standard deviation. As $\lambda$ increases, GPL includes only items with higher scores, leading to lower risk and higher disparity. Results with other models are in Fig. \ref{fig:tradeoff_full} of Appendix \ref{sec:detailed_results}.
Figure 2: Distribution of NDCG@5 achieved by ISRR over $50$ runs with $\lambda$ selected by Hoeffding-Bentkus. Results for other datasets and for LightGBM can be found in Appendix \ref{sec:detailed_results}. The red (green) vertical line is the desired NDCG@5 level $1-\alpha = 0.9 U^*$. $U^*$ is the NDCG@5 of the pre-trained LTR model.
Figure 3: Impact of the scaling factor $\zeta$ on GPL's effectiveness in trading off utility and fairness. The x-axis is the threshold $\lambda$; the left y-axis is the risk $R_{util}=1-\text{NDCG@5}$ and the right y-axis is the disparity $R_{sq-fair}$. Det and Sto are the deterministic and the PL ranking models. The shaded region is one standard deviation.
Figure 4: Complete results for the trade-off between utility and fairness achieved by GPL. The x-axis is the threshold $\lambda$; the left y-axis is the risk $R_{util}=1-\text{NDCG@5}$ and the right y-axis is the disparity $R_{sq-fair}$. Det and Sto are the deterministic and the PL ranking models. The shaded region is one standard deviation. As $\lambda$ increases, GPL includes only items with higher scores, leading to lower risk and higher disparity.
Figure 5: Distribution of NDCG@5 achieved by ISRR over $50$ runs with $\lambda$ selected by Hoeffding-Bentkus. The red (green) vertical line is the desired NDCG@5 level $1-\alpha = 0.9 U^*$. $U^*$ is the NDCG@5 of the deterministic model.
...and 1 more figure
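The utility risk plotted in these figures is $1-\text{NDCG@5}$. As a small worked example of that quantity, the sketch below computes it for a single ranked list of graded relevance labels. The exponential gain $2^{rel}-1$ with a $\log_2$ position discount is one common NDCG convention and is assumed here, as is returning zero risk for a query with no relevant items; the paper may use a different variant.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(pos + 2)
        for pos, rel in enumerate(relevances[:k])
    )


def utility_risk(ranked_relevances, k=5):
    """1 - NDCG@k for one query; `ranked_relevances` follows the ranking order."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: a query with no relevant items incurs no risk
    return 1.0 - dcg_at_k(ranked_relevances, k) / ideal


perfect = utility_risk([3, 2, 1, 0, 0])      # already in ideal order
missed = utility_risk([0, 0, 0, 0, 0, 3])    # only relevant item falls outside the top 5
swapped = utility_risk([0, 3])               # relevant item pushed to position 2
```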

Theorems & Definitions (1)

lemma 1
