The Legal Duty to Search for Less Discriminatory Algorithms

Emily Black, Logan Koepke, Pauline Kim, Solon Barocas, Mingwei Hsu

TL;DR

The paper argues that many prediction tasks admit multiple equally accurate models, a phenomenon termed predictive multiplicity, which can yield different disparate impacts. LDAs are defined as equally performant but less discriminatory models, identified using an $\epsilon$-based equivalence. It links predictive multiplicity to disparate impact doctrine and proposes a duty to search for LDAs within civil-rights regimes, potentially via regulatory guidance. The Upstart Monitorship provides a practical demonstration that targeted searches can uncover LDAs, though adoption decisions reveal implementation tensions. The authors offer concrete steps for implementing the duty to search, discuss costs and caveats, and outline governance pathways to advance algorithmic fairness in regulatory practice.

Abstract

Work in computer science has established that, contrary to conventional wisdom, for a given prediction problem there are almost always multiple possible models with equivalent performance--a phenomenon often termed model multiplicity. Critically, different models of equivalent performance can produce different predictions for the same individual, and, in aggregate, exhibit different levels of impacts across demographic groups. Thus, when an algorithmic system displays a disparate impact, model multiplicity suggests that developers could discover an alternative model that performs equally well, but has less discriminatory impact. Indeed, the promise of model multiplicity is that an equally accurate, but less discriminatory algorithm (LDA) almost always exists. But without dedicated exploration, it is unlikely developers will discover potential LDAs. Model multiplicity and the availability of LDAs have significant ramifications for the legal response to discriminatory algorithms, in particular for disparate impact doctrine, which has long taken into account the availability of alternatives with less disparate effect when assessing liability. A close reading of legal authorities over the decades reveals that the law has on numerous occasions recognized that the existence of a less discriminatory alternative is sometimes relevant to a defendant's burden of justification at the second step of disparate impact analysis. Indeed, under disparate impact doctrine, it makes little sense to say that a given algorithmic system used by an employer, creditor, or housing provider is "necessary" if an equally accurate model that exhibits less disparate effect is available and possible to discover with reasonable effort. As a result, we argue that the law should place a duty of a reasonable search for LDAs on entities that develop and deploy predictive models in covered civil rights domains.
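The search the abstract calls for can be pictured as a filter over candidate models: keep those whose performance stays within a tolerance of the baseline, then prefer the ones with less disparity. The following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes accuracy as the performance measure, the absolute gap in selection rates between groups as the disparity measure, and a hypothetical tolerance `epsilon`; the names `Candidate`, `selection_rate_gap`, and `find_ldas` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """A candidate model, represented only by its predictions (hypothetical type)."""
    name: str
    predictions: Sequence[int]  # 1 = selected/approved, 0 = rejected


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of individuals whose prediction matches the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def selection_rate_gap(preds: Sequence[int], groups: Sequence[str]) -> float:
    """Largest minus smallest group selection rate (assumed disparity measure)."""
    rates = []
    for g in set(groups):
        member_preds = [p for p, gi in zip(preds, groups) if gi == g]
        rates.append(sum(member_preds) / len(member_preds))
    return max(rates) - min(rates)


def find_ldas(baseline: Candidate, candidates: List[Candidate],
              labels: Sequence[int], groups: Sequence[str],
              epsilon: float = 0.01) -> List[Candidate]:
    """Return candidates within epsilon of baseline accuracy and with a
    strictly smaller selection-rate gap, least disparate first."""
    base_acc = accuracy(baseline.predictions, labels)
    base_gap = selection_rate_gap(baseline.predictions, groups)
    ldas = [
        c for c in candidates
        if accuracy(c.predictions, labels) >= base_acc - epsilon
        and selection_rate_gap(c.predictions, groups) < base_gap
    ]
    return sorted(ldas, key=lambda c: selection_rate_gap(c.predictions, groups))


# Hypothetical data: ten applicants, five men ("m") and five women ("w").
labels = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
groups = ["m"] * 5 + ["w"] * 5
baseline = Candidate("baseline", [1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
alt = Candidate("alt", [1, 1, 0, 0, 0, 1, 1, 1, 0, 0])
worse = Candidate("worse", [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

ldas = find_ldas(baseline, [alt, worse], labels, groups)
print([c.name for c in ldas])
```

In this made-up data, `baseline` and `alt` make the same number of errors, but `alt` selects men and women at closer rates, so it is the only candidate returned. In practice the choice of performance metric, disparity metric, and tolerance would be set by the legal and business context rather than by the code.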

Paper Structure (20 sections, 2 figures)


Figures (2)

  • Figure 1: Left: An illustration of how ten different models can exhibit the same accuracy while giving different individual predictions on a hypothetical group of ten people. Right: An example of two multiplicitous models: they display equal accuracy (80% over all people), yet make different individual predictions, leading to a difference in discriminatory behavior. The graph to the left has a steep difference in selection rate between men and women, whereas the graph to the right does not. The darker region of the graph refers to places where the model predicts an individual to be creditworthy, and darker points correspond to individuals who are indeed creditworthy. The lighter region of the graph refers to areas where the model predicts an individual to be uncreditworthy, and lighter points correspond to individuals who are indeed uncreditworthy. Triangular points refer to women, and square points refer to men.
  • Figure 2: A simplified view of the AI pipeline, its key stages, and instances of design choices made per stage.
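
The idea in the left panel of Figure 1 can be reproduced with a toy construction. The sketch below is an assumption about how such an illustration might be built, not a reconstruction of the figure's actual data: the labels are hypothetical, and each of the ten models is assumed to misclassify exactly one, different person.

```python
# Hypothetical true outcomes for ten people (1 = creditworthy, 0 = not).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]


def flip_one(true_labels, i):
    """Build a model's predictions that are correct for everyone except person i."""
    preds = list(true_labels)
    preds[i] = 1 - preds[i]
    return preds


def accuracy(preds, true_labels):
    """Fraction of people whose prediction matches the true label."""
    return sum(p == y for p, y in zip(preds, true_labels)) / len(true_labels)


def disagreement(a, b):
    """Number of people on whom two models give different predictions."""
    return sum(x != y for x, y in zip(a, b))


# Ten models; model i is wrong only about person i.
models = [flip_one(labels, i) for i in range(len(labels))]

accuracies = [accuracy(m, labels) for m in models]
print(accuracies)                           # every model has the same accuracy
print(disagreement(models[0], models[1]))   # yet any two models disagree
```

Every model here scores the same accuracy, yet any two of them disagree on two people, and each person is misclassified by exactly one model. Aggregate performance alone therefore does not determine who receives an adverse prediction.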