Regularizing Fairness in Optimal Policy Learning with Distributional Targets

Anders Bredahl Kock, David Preinerstorfer

TL;DR

This work introduces a flexible framework for fair optimal policy learning when the target is a distributional welfare functional rather than the mean. By formulating a penalized objective $\Omega_{\lambda,\mathcal{F}}(\bm{\delta}) = (1-\lambda)\mathsf{T}(\langle\bm{\delta},\mathcal{F}\rangle) - \lambda \max_{z}\mathsf{S}(\langle\bm{\delta},\mathcal{F}\rangle_z,\langle\bm{\delta},\mathcal{F}\rangle)$, the decision maker (DM) can trade off efficiency against fairness across protected groups, with the preference parameter $\lambda$ selected either by a data-driven strategy or through a fairness budget. The authors prove regret bounds and consistency for empirical success policies, and extend the methodology to non-discrete covariates, including an interpolation-based method for value function estimation. Numerical experiments, comprising toy examples and two empirical illustrations (Pennsylvania reemployment bonuses and an entrepreneurship program), demonstrate the practical trade-offs and guide parameter tuning. Overall, the paper provides a principled, generalizable approach to incorporating broad fairness notions into distributional policy targets in observational settings.
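The penalized objective above can be made concrete with a minimal numerical sketch. Everything specific here is an illustrative assumption rather than the paper's setup: the target $\mathsf{T}$ is taken to be the median, the fairness measure $\mathsf{S}$ is taken to be the Kolmogorov (sup-norm) distance between cdfs, the policy ignores covariates, and the two-group, two-treatment toy cdfs and all function names are hypothetical.

```python
import numpy as np

def policy_cdf(delta, cdfs):
    """<delta, F>: mixture of treatment-specific cdfs with weights delta."""
    return np.asarray(delta) @ np.asarray(cdfs)

def median_functional(cdf, grid):
    """Illustrative target T (assumption): smallest grid point y with F(y) >= 1/2."""
    idx = min(int(np.searchsorted(cdf, 0.5)), len(grid) - 1)
    return float(grid[idx])

def sup_distance(cdf_a, cdf_b):
    """Illustrative fairness measure S (assumption): sup-norm distance of cdfs."""
    return float(np.max(np.abs(cdf_a - cdf_b)))

def penalized_objective(delta, group_cdfs, group_weights, grid, lam):
    """(1 - lam) * T(<delta,F>) - lam * max_z S(<delta,F>_z, <delta,F>)."""
    # Subgroup cdfs induced by the policy, one row per protected group z.
    per_group = np.array([policy_cdf(delta, g) for g in group_cdfs])
    # Overall cdf: group cdfs weighted by the group shares.
    overall = np.asarray(group_weights) @ per_group
    welfare = median_functional(overall, grid)
    unfairness = max(sup_distance(g, overall) for g in per_group)
    return (1.0 - lam) * welfare - lam * unfairness

# Toy setup (hypothetical): outcomes in [0, 1], two treatments, two groups.
grid = np.linspace(0.0, 1.0, 201)
group_cdfs = np.array([
    [grid, grid**2],  # group 0: treatment 1 shifts outcomes upward
    [grid, grid],     # group 1: treatment 1 has no effect
])
group_weights = np.array([0.5, 0.5])

def best_treatment_probability(lam, n_candidates=11):
    """Grid search over the probability of assigning treatment 1."""
    candidates = np.linspace(0.0, 1.0, n_candidates)
    values = [
        penalized_objective([1.0 - t, t], group_cdfs, group_weights, grid, lam)
        for t in candidates
    ]
    return float(candidates[int(np.argmax(values))])
```

With these toy inputs, `best_treatment_probability(0.2)` assigns treatment 1 with probability 1, whereas `best_treatment_probability(0.8)` assigns it with probability 0: a larger $\lambda$ puts more weight on the gap between subgroup and overall outcome distributions, which in this example is created entirely by treatment 1.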

Abstract

A decision maker typically (i) incorporates training data to learn about the relative effectiveness of treatments, and (ii) chooses an implementation mechanism that implies an ``optimal'' predicted outcome distribution according to some target functional. Nevertheless, a fairness-aware decision maker may not be satisfied with achieving said optimality at the cost of being ``unfair'' against a subgroup of the population, in the sense that the outcome distribution in that subgroup deviates too strongly from the overall optimal outcome distribution. We study a framework that allows the decision maker to regularize such deviations, while allowing for a wide range of target functionals and fairness measures to be employed. We establish regret and consistency guarantees for empirical success policies with (possibly) data-driven preference parameters, and provide numerical results. Furthermore, we briefly illustrate the methods in two empirical settings.

Paper Structure (40 sections, 20 theorems, 255 equations, 9 figures)

Key Result

Proposition 2.1

The following statements hold for $\mathcal{F} \sqsubset \mathscr{D}$ (cf. Assumption \ref{as:MAIN}):

Figures (9)

  • Figure 1: The figure plots the cdf $\langle \bm{\delta}, \mathcal{F}\rangle$ (left panel) and the cdf $\langle \bm{\delta}, \mathcal{F}\rangle_0$ (right panel) for different values of $\delta \in \{0, \frac{1}{4}, \frac{1}{2},\frac{3}{4}, 1\}$ and for $p = 3/4$. Lower values of $\delta$ lead to stochastically larger cdfs. Note that $\langle \bm{\delta}, \mathcal{F}\rangle_1$ can also be read off from the right panel due to symmetry, but the dependence on $\delta$ is now "reversed", cf. Equation \ref{eqn:subpcdfs}.
  • Figure 2: Objective function $\Omega_{\lambda, \mathcal{F}}(\bm{\delta})$ for $p = 3/4$ (left panel), and the corresponding maximum value of the objective function for $p=3/4$ in dependence on $\lambda$ (right panel).
  • Figure 3: Inferred probability of assigning Treatment 1 (y-axis) as a function of the preference parameter $\lambda$ (x-axis), over all 100 replications. Rows 1, 2, and 3 correspond to sample sizes $n = 100, 1000, 10000$, respectively; the left and right columns correspond to assignment mechanisms A1 and A2, respectively. Curves are linearly interpolated between the points $\{0, 1/49, 2/49, \hdots, 1\}$, at which the policy was actually estimated. The gray-scale coloring of the curves reflects their depth, with darker shades of gray corresponding to a stronger degree of typicality/centrality of the curve; see text for more explanation. The 10% most central curves are highlighted in blue, whereas the true argmax as a function of $\lambda$ is highlighted in red. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
  • Figure 4: Empirical value function (y-axis) as a function of the preference parameter $\lambda$ (x-axis), over all 100 replications. Rows 1, 2, and 3 correspond to sample sizes $n = 100, 1000, 10000$, respectively; the left and right columns correspond to assignment mechanisms A1 and A2, respectively. Curves are linearly interpolated between the points $\{0, 1/49, 2/49, \hdots, 1\}$, at which the policy was actually estimated. The gray-scale coloring of the curves reflects their depth, with darker shades of gray corresponding to a stronger degree of typicality/centrality of the curve; see text for more explanation. The 10% most central curves are highlighted in blue, whereas the true value function as a function of $\lambda$ is highlighted in red. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
  • Figure 5: Average regret (y-axis) as a function of the preference parameter $\lambda$ (x-axis) for sample sizes $n = 100, 1000, 10000$ and for assignment mechanisms A1 (left column) and A2 (right column). Curves are linearly interpolated between the points $\{0, 1/49, 2/49, \hdots, 1\}$, at which the policy was actually estimated. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
  • ...and 4 more figures

Theorems & Definitions (54)

  • Remark 2.1
  • Remark 2.2
  • Remark 2.3
  • Remark 2.4
  • Remark 2.5
  • Proposition 2.1
  • Remark 3.1: Data-driven preference parameters
  • Theorem 3.1
  • Theorem 3.2
  • Proposition 3.3
  • ...and 44 more