What Constitutes a Less Discriminatory Algorithm?

Benjamin Laufer, Manish Raghavan, Solon Barocas

TL;DR

The paper addresses how to define and locate less discriminatory algorithms (LDAs) under disparate impact law, arguing that purely quantitative definitions fail without held-out data and proposing a reasonableness standard based on projected performance. It formalizes LDAs as a comparison between a baseline $h^0$ and a candidate $h'$, analyzes the mathematical structure of achievable accuracy-disparity trade-offs via a feasible set (a polygon in the utility-disparity plane), and proves that finding the exact least-discriminatory algorithm is NP-hard while providing approximation approaches. It also demonstrates, through empirical studies on the Adult and German Credit datasets, that simple randomized search procedures can yield out-of-sample disparity reductions with little or no loss in utility in some settings. Overall, the work bridges legal concepts with algorithmic design by offering a rigorous framework for searching for LDAs and clarifying the practical limits and strategies for such searches.

Abstract

Disparate impact doctrine offers an important legal apparatus for targeting discriminatory data-driven algorithmic decisions. A recent body of work has focused on conceptualizing one particular construct from this doctrine: the less discriminatory alternative, an alternative policy that reduces disparities while meeting the same business needs of a status quo or baseline policy. However, attempts to operationalize this construct in the algorithmic setting must grapple with some thorny challenges and ambiguities. In this paper, we attempt to raise and resolve important questions about less discriminatory algorithms (LDAs). How should we formally define LDAs, and how does this interact with different societal goals they might serve? And how feasible is it for firms or plaintiffs to computationally search for candidate LDAs? We find that formal LDA definitions face fundamental challenges when they attempt to evaluate and compare predictive models in the absence of held-out data. As a result, we argue that LDA definitions cannot be purely quantitative, and must rely on standards of "reasonableness." We then identify both mathematical and computational constraints on firms' ability to efficiently conduct a proactive search for LDAs, but we provide evidence that these limits are "weak" in a formal sense. By defining LDAs formally, we put forward a framework in which both firms and plaintiffs can search for alternative models that comport with societal goals.

Paper Structure

This paper contains 21 sections, 4 theorems, 4 equations, 4 figures, 3 tables.

Key Result

Proposition 1

Under the assumption that no two applicants are identical, we can find a decision rule that simultaneously achieves perfect accuracy on one dataset and zero disparity on another.
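One way to see why the distinctness assumption matters is the minimal sketch below. It is an illustrative construction, not necessarily the one used in the paper's proof. The two tiny datasets, the helper names (`build_rule`, `accuracy`, `selection_rate_gap`), and the choice to measure disparity as the gap in selection rates between groups are all assumptions made for the example. The rule memorizes the labels of dataset A and gives every other applicant the same decision, which is only possible because no applicant in B shares features with an applicant in A.

```python
def build_rule(dataset_a):
    """Memorize the true labels of dataset A; reject every unseen applicant."""
    lookup = {features: label for features, _group, label in dataset_a}

    def rule(features):
        # Unseen applicants all receive the same decision (0).
        return lookup.get(features, 0)

    return rule


def accuracy(rule, dataset):
    """Fraction of records whose decision matches the true label."""
    return sum(rule(x) == y for x, _g, y in dataset) / len(dataset)


def selection_rate_gap(rule, dataset):
    """Largest difference in positive-decision rates across groups."""
    rates = []
    for group in sorted({g for _x, g, _y in dataset}):
        decisions = [rule(x) for x, g, _y in dataset if g == group]
        rates.append(sum(decisions) / len(decisions))
    return max(rates) - min(rates)


# Hypothetical records: (features, group, true label). No features repeat.
dataset_a = [((1, 0), "a", 1), ((2, 1), "a", 0), ((3, 0), "b", 0), ((4, 1), "b", 1)]
dataset_b = [((5, 0), "a", 1), ((6, 1), "a", 1), ((7, 0), "b", 0), ((8, 1), "b", 0)]

rule = build_rule(dataset_a)
acc_a = accuracy(rule, dataset_a)
gap_b = selection_rate_gap(rule, dataset_b)
print(acc_a, gap_b)
```

A rule like this says nothing about how it would behave on new applicants, which is consistent with the abstract's point that in-sample comparisons alone cannot settle whether a model is an LDA.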

Figures (4)

  • Figure 1: Consider a given, finite population broken down by group membership and outcomes (a; left). If randomized decision rules are feasible, then a polygon depicts the convex set of feasible, in-sample utility and disparity values (b; center). If solutions are restricted to deterministic classifiers over the dataset (c; right), the polygon bounds the achievable values.
  • Figure 2: Results from a simple randomized search for a less discriminatory algorithm on the Adult dataset. The search procedure randomly samples with replacement from the training dataset and retrains a Random Forest classifier $n$ times, for $n \in \{2,\ldots,100\}$. From the $n$ candidate models, it selects the minimum-disparity classifier, as measured on a separate evaluation dataset. As $n$ increases, disparity decreases, on average, on held-out (out-of-sample) test data (a; left). In this case, utility does not diminish under this procedure (b; center). Confidence intervals are produced by repeating the procedure 2000 times. As the number of random draws increases, the probability of having selected the lowest-disparity model out-of-sample decreases (c; right), suggesting that conducting an effective search might require passing over models that end up having lower disparity.
  • Figure 3: Empirically observed achievable polygon using the $\texttt{Adult}$ and $\texttt{German Credit}$ evaluation datasets. In the case of Adult (left), a Random Forest classifier exhibits wide selection rate disparities. Randomly sampled alternatives exhibit small variations in utility and disparity. Analysis of performance on held-out test data suggests that selecting the best alternative can yield a statistically significant improvement in utility and reduction in disparity. In the case of German Credit (right), an initial classifier does not exhibit wide disparities, leaving a narrow region for LDA improvement. No statistically significant disparity reduction is observed on held-out test data.
  • Figure 4: Runtime (left, center) and accuracy performance (right) of exact and approximate algorithms for computing the least discriminatory alternative classifier, given oracle access to information about the population distribution. The exact algorithm is compared to three different $(1+\epsilon)$ approximation algorithms with varying $\epsilon$. A total of $250$ instances of the LDA problem were randomly generated, with varying numbers of data types, starting classifiers, and numbers of digits used to specify probability densities. For every generated instance, all four algorithms were run, and runtime and accuracy were recorded. As the size of the input increases (measured as the total number of digits after the decimal point used to specify $\sigma(x)$ and $\rho_g(x)$), the worst-case runtime of the exact algorithm explodes. However, polynomial-time approximations achieve a high hit rate, defined as the rate at which an LDA is identified successfully.
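
The randomized search described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' code. A toy one-feature threshold learner stands in for the Random Forest, and synthetic records stand in for the Adult dataset, so the example runs with the standard library alone. All function names and the data-generating process are hypothetical. What it keeps from the caption is the three-way split: candidates are trained on bootstrap resamples of the training data, the minimum-disparity candidate is chosen on a separate evaluation set, and the result is then checked on held-out test data.

```python
import random


def make_data(rng, size):
    """Synthetic (score, group, label) records; group 'a' has shifted scores."""
    rows = []
    for _ in range(size):
        group = rng.choice("ab")
        label = rng.random() < 0.5
        shift = 0.5 if group == "a" else 0.0
        score = rng.gauss(1.0 if label else 0.0, 1.0) + shift
        rows.append((score, group, int(label)))
    return rows


def fit_threshold(rows):
    """Toy learner (assumption): the threshold with the best training accuracy."""
    candidates = sorted(score for score, _g, _y in rows)
    return max(candidates, key=lambda t: sum((s >= t) == y for s, _g, y in rows))


def disparity(threshold, rows):
    """Absolute gap in selection rates between groups 'a' and 'b'."""
    rates = {}
    for group in "ab":
        decisions = [s >= threshold for s, g, _y in rows if g == group]
        rates[group] = sum(decisions) / len(decisions) if decisions else 0.0
    return abs(rates["a"] - rates["b"])


def randomized_search(train, evaluation, n, rng):
    """Retrain on n bootstrap resamples; keep the lowest-disparity model."""
    models = []
    for _ in range(n):
        resample = [rng.choice(train) for _ in train]  # sample with replacement
        models.append(fit_threshold(resample))
    # Selection uses the evaluation set only, never the test set.
    return min(models, key=lambda t: disparity(t, evaluation))


rng = random.Random(0)
train, evaluation, test = (make_data(rng, 200) for _ in range(3))

baseline = fit_threshold(train)
selected = randomized_search(train, evaluation, n=20, rng=rng)
print("baseline test disparity:", disparity(baseline, test))
print("selected test disparity:", disparity(selected, test))
```

Keeping the test set out of the selection step is what lets the final comparison speak to out-of-sample disparity. Whether the selected model actually improves on the baseline depends on the data and the learner, as the contrast between the Adult and German Credit results in Figure 3 suggests.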

Theorems & Definitions (19)

  • Proposition 1
  • Definition 1
  • Theorem 1
  • Definition 2: Full-information LDA
  • Theorem 2
  • proof
  • proof
  • Definition 3: Subset sum problem
  • Lemma 1
  • Claim 1
  • ...and 9 more