Learning to be Fair: A Consequentialist Approach to Equitable Decision-Making

Alex Chohlas-Wood, Madison Coots, Henry Zhu, Emma Brunskill, Sharad Goel

TL;DR

The paper critiques axiomatic fairness in predictive decision-making and proposes a consequentialist fairness framework that foregrounds downstream outcomes. It develops a policy-optimization approach that elicits stakeholder preferences over outcomes and budgets, then computes utility-maximizing policies by solving a linear program, with an extension to online learning via budget-constrained contextual bandits. The authors provide sample-complexity bounds for learning under tabular and linear reward models and evaluate three adaptive learning strategies ($\varepsilon$-greedy, Thompson sampling, and UCB) in simulations inspired by a rideshare-to-court program, reporting improved utility and reduced spending disparities. The work offers a principled, data-driven method for balancing efficiency and equity in resource-constrained settings, and gives policymakers practical tools for operationalizing context-sensitive equity.
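To make the online-learning idea concrete, here is a minimal sketch of tabular $\varepsilon$-greedy exploration. It is not the authors' algorithm: it ignores the budget and the optimization problem solved at each step, and the function name `epsilon_greedy`, the contexts, and the appearance rates are all hypothetical values chosen for illustration.

```python
import random

def epsilon_greedy(true_rates, contexts, n_rounds=5000, epsilon=0.1, seed=0):
    """Estimate appearance rates per (context, action) with epsilon-greedy.

    true_rates maps (context, action) to a hypothetical appearance probability;
    action 0 means "no ride" and action 1 means "ride".
    """
    rng = random.Random(seed)
    actions = (0, 1)
    counts = {(c, a): 0 for c in contexts for a in actions}
    means = {(c, a): 0.0 for c in contexts for a in actions}
    for _ in range(n_rounds):
        c = rng.choice(contexts)  # a client arrives with some context
        if rng.random() < epsilon:
            a = rng.choice(actions)  # explore: pick an action at random
        else:
            a = max(actions, key=lambda act: means[(c, act)])  # exploit
        # Simulated binary outcome: 1.0 if the client appears in court.
        reward = 1.0 if rng.random() < true_rates[(c, a)] else 0.0
        counts[(c, a)] += 1
        # Incremental update of the running mean reward.
        means[(c, a)] += (reward - means[(c, a)]) / counts[(c, a)]
    return means, counts

# Made-up rates: rides help far-away clients much more than nearby ones.
rates = {("near", 0): 0.60, ("near", 1): 0.65,
         ("far", 0): 0.40, ("far", 1): 0.80}
means, counts = epsilon_greedy(rates, ["near", "far"])
```

Under these assumed rates, the learner should come to favor rides for the "far" context, because the estimated gap between the two actions there is large relative to sampling noise.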

Abstract

In an attempt to make algorithms fair, the machine learning literature has largely focused on equalizing decisions, outcomes, or error rates across race or gender groups. To illustrate, consider a hypothetical government rideshare program that provides transportation assistance to low-income people with upcoming court dates. Following this literature, one might allocate rides to those with the highest estimated treatment effect per dollar, while constraining spending to be equal across race groups. That approach, however, ignores the downstream consequences of such constraints, and, as a result, can induce unexpected harms. For instance, if one demographic group lives farther from court, enforcing equal spending would necessarily mean fewer total rides provided, and potentially more people penalized for missing court. Here we present an alternative framework for designing equitable algorithms that foregrounds the consequences of decisions. In our approach, one first elicits stakeholder preferences over the space of possible decisions and the resulting outcomes--such as preferences for balancing spending parity against court appearance rates. We then optimize over the space of decision policies, making trade-offs in a way that maximizes the elicited utility. To do so, we develop an algorithm for efficiently learning these optimal policies from data for a large family of expressive utility functions. In particular, we use a contextual bandit algorithm to explore the space of policies while solving a convex optimization problem at each step to estimate the best policy based on the available information. This consequentialist paradigm facilitates a more holistic approach to equitable decision-making.
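The rideshare example can be illustrated with a toy calculation. The sketch below is an assumption-laden stand-in, not the paper's method: it ranks clients by estimated effect per dollar and funds rides greedily, and it approximates a spending-parity constraint by splitting the budget evenly across groups. The client data, group names, and function names are hypothetical.

```python
from collections import defaultdict

def greedy_allocate(clients, budget):
    """Fund rides in descending order of effect per dollar until money runs out."""
    ranked = sorted(clients, key=lambda c: c["effect"] / c["cost"], reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c)
            spent += c["cost"]
    return chosen

def parity_allocate(clients, budget):
    """Crude parity stand-in: give each group an equal share of the budget."""
    groups = defaultdict(list)
    for c in clients:
        groups[c["group"]].append(c)
    per_group = budget / len(groups)
    chosen = []
    for members in groups.values():
        chosen.extend(greedy_allocate(members, per_group))
    return chosen

def summarize(chosen):
    """Count rides, expected added appearances, and spending per group."""
    spend = defaultdict(float)
    for c in chosen:
        spend[c["group"]] += c["cost"]
    return {
        "rides": len(chosen),
        "added_appearances": round(sum(c["effect"] for c in chosen), 6),
        "spend_by_group": dict(spend),
    }

# Hypothetical population: "far" clients cost more per ride than "near" clients.
clients = (
    [{"group": "near", "cost": 10.0, "effect": 0.2} for _ in range(4)]
    + [{"group": "far", "cost": 30.0, "effect": 0.3} for _ in range(4)]
)

unconstrained = summarize(greedy_allocate(clients, budget=70.0))
parity = summarize(parity_allocate(clients, budget=70.0))
```

With these made-up numbers, the unconstrained policy funds five rides while the even split funds four, mirroring the abstract's point that equal spending can mean fewer total rides when one group lives farther from court.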

Paper Structure

This paper contains 21 sections, 11 theorems, 74 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

The loss of utility due to using $\hat{\pi} = \mathop{\rm arg\,max}_{\pi} \hat{U}(\pi)$ is bounded by

Figures (9)

  • Figure 1: The map shows the geographic distribution of the client base of the Santa Clara County Public Defender Office. The star on the map marks the location of the main county courthouse, where most clients are required to appear for court appointments. The plot explores the consequence of following a policy that provides rides to those with the highest estimated treatment effect per dollar without parity constraints. This policy would result in higher average per-person spending for white individuals than for Vietnamese individuals. The red point shows that a hypothetical annual ride budget of \$50,000 would result in average per-person spending of \$6.86 for white individuals and \$4.54 for Vietnamese individuals.
  • Figure 2: The Pareto frontier for a stylized population model, showing the trade-off between appearances and spending per Black client. The vertical axis shows expected additional appearances relative to a policy that does not provide rideshare assistance to any clients. Under this model, common heuristics (e.g. maximizing appearances, and demanding demographic or error-rate parity) lead to sub-optimal policies.
  • Figure 3: The graphic was shown to survey participants to help them select their preferred ride allocation policy. In this hypothetical scenario, option B maximizes appearances, while option C corresponds to spending parity. The survey results show that both Democrats and Republicans prefer policies that spend roughly equal amounts on Black and white clients, but there is a wide range of preferences among members of both groups.
  • Figure 4: Mean regret, across 2,000 simulations, incurred by different learning approaches. We define regret here as the difference between the observed utility and the utility obtained by an oracle during the same experiment. Values are tightly estimated at each $i$, with the 95% confidence interval no more than 1.1 units off the estimate, so we omit uncertainty bands for this figure. We note that the three bandit approaches ($\varepsilon$-greedy, Thompson sampling, and UCB) incur substantially less regret than random assignment (RA). It is possible to reduce the regret incurred from RA by stopping randomization early, and following the optimal estimated policy from that point forward. However, these stop-early RA approaches produce worse policies than other approaches (Figure 5).
  • Figure 5: Mean performance, across 2,000 simulations, of optimal policies estimated with data available at each iteration $i$. Performance is defined as the additional utility obtained by a policy over a baseline of no treatment for all individuals, with 100% indicating this quantity for the oracle policy. Uncertainty bands represent 95% intervals around the mean. UCB and Thompson sampling generate policies that are better than random assignment (RA) at any given iteration $i$. In contrast, the $\varepsilon$-greedy approach and the stop-early versions of RA generate policies that are slower to (or may never) reach near-oracle performance.
  • ...and 4 more figures

Theorems & Definitions (20)

  • Lemma 1
  • proof
  • Theorem 1: Tabular Rewards
  • Theorem 2: Linear Rewards
  • Theorem 3
  • proof
  • Proposition 1
  • proof
  • Theorem 4: Restatement of Theorem 1
  • proof
  • ...and 10 more