Stability and Multigroup Fairness in Ranking with Uncertain Predictions
Siddartha Devic, Aleksandra Korolova, David Kempe, Vatsal Sharan
TL;DR
This work introduces ranking functions that translate uncertainty-aware multiclass predictions into distributions over rankings, and rigorously analyzes their stability and fairness properties. It proves that randomness is essential for achieving meaningful stability and shows that Uncertainty Aware (UA) rankings are anonymous and stable, while also enjoying multigroup fairness guarantees when paired with multiaccurate or multicalibrated predictors. The results establish an interpolation between individual and group fairness through multicalibration, and they quantify the tradeoffs between stability, fairness, and utility, including a practical mixing approach between UA and utility-optimal rankings. Empirical evaluations on real datasets demonstrate that UA rankings are robust to predictive noise and to some extent superior in utility compared to baselines, supporting their practical deployment for fairer, more stable ranking under uncertainty.
Abstract
Rankings are ubiquitous across many applications, from search engines to hiring committees. In practice, many rankings are derived from the output of predictors. However, when predictors trained for classification tasks have intrinsic uncertainty, it is not obvious how this uncertainty should be represented in the derived rankings. Our work considers ranking functions: maps from individual predictions for a classification task to distributions over rankings. We focus on two aspects of ranking functions: stability to perturbations in predictions and fairness towards both individuals and subgroups. Not only is stability an important requirement for its own sake, but -- as we show -- it composes harmoniously with individual fairness in the sense of Dwork et al. (2012). While deterministic ranking functions cannot be stable aside from trivial scenarios, we show that the recently proposed uncertainty aware (UA) ranking functions of Singh et al. (2021) are stable. Our main result is that UA rankings also achieve multigroup fairness through successful composition with multiaccurate or multicalibrated predictors. Our work demonstrates that UA rankings naturally interpolate between group and individual level fairness guarantees, while simultaneously satisfying stability guarantees important whenever machine-learned predictions are used.
