Regularizing Fairness in Optimal Policy Learning with Distributional Targets
Anders Bredahl Kock, David Preinerstorfer
TL;DR
This work introduces a flexible framework for fair optimal policy learning when the target is a distributional welfare functional rather than the mean. The framework is built on the penalized objective $\Omega_{\lambda,\mathcal{F}}(\bm{\delta}) = (1-\lambda)\mathsf{T}(\langle\bm{\delta},\mathcal{F}\rangle) - \lambda \max_{z}\mathsf{S}(\langle\bm{\delta},\mathcal{F}\rangle_z,\langle\bm{\delta},\mathcal{F}\rangle)$, where $\mathsf{T}$ is the target functional and $\mathsf{S}$ measures how far the outcome distribution in protected subgroup $z$ deviates from the overall outcome distribution. The preference parameter $\lambda$ lets the decision maker trade off efficiency against fairness across protected groups, and it can be selected via data-driven strategies or budgeted fairness. The authors prove regret bounds and consistency for empirical success policies, and extend the methodology to non-discrete covariates, including an interpolation-based method for value function estimation. Numerical experiments, comprising toy examples and two empirical illustrations (Pennsylvania reemployment bonuses and an entrepreneurship program), show the practical trade-offs and guide parameter tuning. Overall, the paper provides a principled, general approach to incorporating a broad range of fairness notions into distributional policy targets learned from training data.
Abstract
A decision maker typically (i) incorporates training data to learn about the relative effectiveness of treatments, and (ii) chooses an implementation mechanism that implies an ``optimal'' predicted outcome distribution according to some target functional. Nevertheless, a fairness-aware decision maker may not be satisfied with achieving said optimality at the cost of being ``unfair'' to a subgroup of the population, in the sense that the outcome distribution in that subgroup deviates too strongly from the overall optimal outcome distribution. We study a framework that allows the decision maker to regularize such deviations, while allowing for a wide range of target functionals and fairness measures to be employed. We establish regret and consistency guarantees for empirical success policies with (possibly) data-driven preference parameters, and provide numerical results. Furthermore, we briefly illustrate the methods in two empirical settings.
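
To make the penalized objective $\Omega_{\lambda,\mathcal{F}}(\bm{\delta})$ from the TL;DR concrete, the following is a minimal sketch and not the authors' implementation. It assumes a finite number of treatments and subgroups, a policy $\bm{\delta}$ that is a single probability vector over treatments, and known cdfs tabulated on a common grid. The choices of $\mathsf{T}$ (the mean) and $\mathsf{S}$ (the Kolmogorov-Smirnov distance) are illustrative assumptions, as are all function and variable names (`mixture_cdfs`, `penalized_objective`, `F`, `p`, `grid`).

```python
import numpy as np

def mixture_cdfs(delta, F, p):
    """Return subgroup and overall outcome cdfs induced by policy delta.

    F has shape (Z, K, G): cdf of treatment k in subgroup z on a grid of G points.
    p has shape (Z,): subgroup shares, summing to one (an assumption of this sketch).
    """
    group = np.einsum("k,zkg->zg", delta, F)  # <delta, F>_z for each z
    overall = p @ group                        # <delta, F>
    return group, overall

def mean_from_cdf(cdf, grid):
    # Illustrative target functional T: mean of a distribution supported on grid.
    pmf = np.diff(cdf, prepend=0.0)
    return float(grid @ pmf)

def ks_distance(cdf_a, cdf_b):
    # Illustrative fairness measure S: sup-norm distance between two cdfs.
    return float(np.max(np.abs(cdf_a - cdf_b)))

def penalized_objective(delta, F, p, grid, lam, T=mean_from_cdf, S=ks_distance):
    # (1 - lam) * T(<delta, F>) - lam * max_z S(<delta, F>_z, <delta, F>)
    group, overall = mixture_cdfs(delta, F, p)
    penalty = max(S(g, overall) for g in group)
    return (1.0 - lam) * T(overall, grid) - lam * penalty

# Toy data (made up for illustration): binary outcomes, two subgroups, two treatments.
grid = np.array([0.0, 1.0])
F = np.array([
    [[0.5, 1.0], [0.1, 1.0]],  # subgroup 0: success prob. 0.5 (treatment 0), 0.9 (treatment 1)
    [[0.5, 1.0], [0.7, 1.0]],  # subgroup 1: success prob. 0.5 (treatment 0), 0.3 (treatment 1)
])
p = np.array([0.5, 0.5])
treat_0 = np.array([1.0, 0.0])
treat_1 = np.array([0.0, 1.0])

for lam in (0.0, 0.5):
    print(lam,
          round(penalized_objective(treat_0, F, p, grid, lam), 3),
          round(penalized_objective(treat_1, F, p, grid, lam), 3))
```

In this toy setup, treatment 1 has the higher overall mean (0.6 versus 0.5) but produces very different outcome distributions in the two subgroups. With $\lambda = 0$ the objective prefers treatment 1, whereas with $\lambda = 0.5$ the penalty reverses the ranking (0.25 for treatment 0 versus 0.15 for treatment 1), which is the efficiency-fairness trade-off that $\lambda$ controls.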
