Regularizing Fairness in Optimal Policy Learning with Distributional Targets

Anders Bredahl Kock; David Preinerstorfer

TL;DR

This work introduces a flexible framework for fair optimal policy learning when the target is a distributional welfare functional rather than the mean. By formulating a penalized objective $\Omega_{\lambda,\mathcal{F}}(\bm{\delta}) = (1-\lambda)\mathsf{T}(\langle\bm{\delta},\mathcal{F}\rangle) - \lambda \max_{z}\mathsf{S}(\langle\bm{\delta},\mathcal{F}\rangle_z,\langle\bm{\delta},\mathcal{F}\rangle)$, the decision maker (DM) can trade off efficiency against fairness across protected groups, with the preference parameter $\lambda$ either chosen in a data-driven way or determined by a fairness budget. The authors prove regret bounds and consistency for empirical success policies, and extend the methodology to non-discrete covariates, including an interpolation-based method for estimating the value function. Numerical experiments, comprising toy examples and two empirical illustrations (Pennsylvania reemployment bonuses and an entrepreneurship program), demonstrate the practical trade-offs and guide parameter tuning. Overall, the paper provides a principled, generalizable approach to incorporating broad fairness notions into distributional policy targets in observational settings.
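To make the penalized objective concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a covariate-independent randomized policy $\bm{\delta}$ over two treatments, takes $\mathsf{T}$ to be the mean and $\mathsf{S}$ to be the Kolmogorov-Smirnov distance (the framework allows many other choices), and uses synthetic cdfs on $[0,1]$. All function and variable names (`penalized_objective`, `group_cdfs`, `group_probs`, and so on) are hypothetical.

```python
import numpy as np

def mean_from_cdf(cdf, grid):
    """Mean of a distribution supported on [grid[0], grid[-1]],
    computed as a + integral of (1 - F) with the trapezoid rule."""
    survival = 1.0 - cdf
    return grid[0] + np.sum(0.5 * (survival[1:] + survival[:-1]) * np.diff(grid))

def ks_distance(cdf_a, cdf_b):
    """Sup-distance between two cdfs evaluated on a common grid."""
    return np.max(np.abs(cdf_a - cdf_b))

def penalized_objective(delta, group_cdfs, group_probs, grid, lam):
    """(1 - lam) * T(<delta, F>) - lam * max_z S(<delta, F>_z, <delta, F>).

    delta:       (K,) treatment assignment probabilities (assumed covariate-free)
    group_cdfs:  (Z, K, G) outcome cdf of treatment k in protected group z
    group_probs: (Z,) population shares of the protected groups
    """
    subgroup = np.einsum("k,zkg->zg", delta, group_cdfs)  # <delta, F>_z
    overall = group_probs @ subgroup                      # <delta, F>
    welfare = mean_from_cdf(overall, grid)                # assumed T: mean
    unfairness = max(ks_distance(s, overall) for s in subgroup)  # assumed S: KS
    return (1.0 - lam) * welfare - lam * unfairness

# Synthetic example: treatment 0 is uniform in both groups (fair, lower mean);
# treatment 1 has a higher mean but group-dependent outcome distributions.
grid = np.linspace(0.0, 1.0, 101)
group_cdfs = np.array([[grid, grid**2], [grid, grid**3]])
group_probs = np.array([0.5, 0.5])

for lam in (0.0, 0.5, 1.0):
    candidates = np.linspace(0.0, 1.0, 21)  # probability of treatment 1
    values = [penalized_objective(np.array([1.0 - d, d]), group_cdfs,
                                  group_probs, grid, lam) for d in candidates]
    best = candidates[int(np.argmax(values))]
    print(f"lambda={lam:.1f}: best probability of treatment 1 = {best:.2f}")
```

The grid search over `candidates` merely stands in for the maximization over policies; in this toy setup, larger values of `lam` put more weight on the unfairness penalty and less on the welfare term.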

Abstract

A decision maker typically (i) incorporates training data to learn about the relative effectiveness of treatments, and (ii) chooses an implementation mechanism that implies an ``optimal'' predicted outcome distribution according to some target functional. Nevertheless, a fairness-aware decision maker may not be satisfied achieving said optimality at the cost of being ``unfair'' against a subgroup of the population, in the sense that the outcome distribution in that subgroup deviates too strongly from the overall optimal outcome distribution. We study a framework that allows the decision maker to regularize such deviations, while allowing for a wide range of target functionals and fairness measures to be employed. We establish regret and consistency guarantees for empirical success policies with (possibly) data-driven preference parameters, and provide numerical results. Furthermore, we briefly illustrate the methods in two empirical settings.

Paper Structure (40 sections, 20 theorems, 255 equations, 9 figures)

Introduction
Informal summary of our methods and results, and practical guidelines
Objective
Policies
Practical guideline
Setting
Observational structure and two assumptions
Decision rules and policies
Goal of the DM
Some notation
Distributions generated by rolling out a decision rule
Objective: Target functional penalized for unfairness
Main technical assumption concerning the functionals $\mathsf{T}$ and $\mathsf{S}$ and some preliminary observations
Some consequences
Regret
...and 25 more sections

Key Result

Proposition 2.1

The following statements hold for $\mathcal{F} \sqsubset \mathscr{D}$ (cf. Assumption as:MAIN):

Figures (9)

Figure 1: The cdf $\langle \bm{\delta}, \mathcal{F}\rangle$ (left panel) and the cdf $\langle \bm{\delta}, \mathcal{F}\rangle_0$ (right panel) for different values of $\delta \in \{0, \frac{1}{4}, \frac{1}{2},\frac{3}{4}, 1\}$ and for $p = 3/4$. Lower values of $\delta$ lead to stochastically larger cdfs. Note that $\langle \bm{\delta}, \mathcal{F}\rangle_1$ can also be read off from the right panel due to symmetry, but the dependence on $\delta$ is then reversed, cf. Equation \ref{eqn:subpcdfs}.
Figure 2: Objective function $\Omega_{\lambda, \mathcal{F}}(\bm{\delta})$ for $p = 3/4$ (left panel), and the corresponding maximum value of the objective function for $p=3/4$ as a function of $\lambda$ (right panel).
Figure 3: Inferred probability of assigning Treatment 1 (y-axis) as a function of the preference parameter $\lambda$ (x-axis), shown for all 100 replications. Rows 1, 2, and 3 correspond to sample sizes $n = 100, 1000, 10000$, respectively; the left and right columns correspond to assignment mechanisms A1 and A2, respectively. The policy was estimated at the points $\{0, 1/49, 2/49, \hdots, 1\}$, with linear interpolation in between. Curves are shaded in gray according to their depth, with darker shades corresponding to a stronger degree of typicality/centrality of the curve; see text for more explanation. The 10% most central curves are highlighted in blue, and the true argmax as a function of $\lambda$ is highlighted in red. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
Figure 4: Empirical value function (y-axis) as a function of the preference parameter $\lambda$ (x-axis), shown for all 100 replications. Rows 1, 2, and 3 correspond to sample sizes $n = 100, 1000, 10000$, respectively; the left and right columns correspond to assignment mechanisms A1 and A2, respectively. The policy was estimated at the points $\{0, 1/49, 2/49, \hdots, 1\}$, with linear interpolation in between. Curves are shaded in gray according to their depth, with darker shades corresponding to a stronger degree of typicality/centrality of the curve; see text for more explanation. The 10% most central curves are highlighted in blue, and the true value function as a function of $\lambda$ is highlighted in red. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
Figure 5: Average regret as a function of the preference parameter $\lambda$ (x-axis) for sample sizes $n = 100, 1000, 10000$ and assignment mechanisms A1 (left column) and A2 (right column). The policy was estimated at the points $\{0, 1/49, 2/49, \hdots, 1\}$, with linear interpolation in between. The vertical dashed line intersects the abscissa at $c(p) \approx 0.123$.
...and 4 more figures

Theorems & Definitions (54)

Remark 2.1
Remark 2.2
Remark 2.3
Remark 2.4
Remark 2.5
Proposition 2.1
Remark 3.1: Data-driven preference parameters
Theorem 3.1
Theorem 3.2
Proposition 3.3
...and 44 more
