Conformal Prediction Sets Can Cause Disparate Impact

Jesse C. Cresswell, Bhargava Kumar, Yi Sui, Mouloud Belbahri

TL;DR

Conformal prediction (CP) provides prediction sets with coverage $P[y \in \mathcal{C}(x)] \ge 1-\alpha$, but deploying these sets in human-in-the-loop decisions can yield disparate impact, measured by $\Delta_t = \max_{a,b \in \mathcal{G}} (\delta_{t,a}-\delta_{t,b})$, where $\delta_{t,a}$ is the increase in accuracy for group $a$ under treatment $t$ relative to a control population without model assistance. Through pre-registered randomized trials across three tasks, the paper shows that Equalized Coverage (Mondrian CP) often increases disparity relative to marginal CP, and that Equalized Set Size or Equalized Singleton Frequency correlates better with reduced unfairness. The authors analyze four factors (coverage, adoption, set size, and singleton frequency) to explain why set-based fairness diverges from coverage-based fairness. The work provides practical guidance for deploying CP in real-world, human-in-the-loop settings and argues that fairness metrics should emphasize outcome-oriented balance rather than coverage parity alone.
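The disparate impact measure above can be illustrated with a minimal sketch. It assumes that $\delta_{t,g}$ is the accuracy of group $g$ under treatment $t$ minus that group's accuracy in the control condition, as the Figure 1 caption describes. The function name `disparate_impact`, the group names, and the accuracy values are hypothetical and are not results from the paper.

```python
from typing import Dict


def disparate_impact(treatment_acc: Dict[str, float],
                     control_acc: Dict[str, float]) -> float:
    """Delta_t = max over groups a, b of (delta_a - delta_b).

    delta_g is the accuracy gain of group g under the treatment
    relative to the control. The maximum over ordered pairs is
    simply the largest gain minus the smallest gain.
    """
    deltas = {g: treatment_acc[g] - control_acc[g] for g in treatment_acc}
    return max(deltas.values()) - min(deltas.values())


# Hypothetical accuracies for illustration only.
control = {"group_a": 0.60, "group_b": 0.50}
treated = {"group_a": 0.75, "group_b": 0.55}

delta_t = disparate_impact(treated, control)
print(f"Delta_t = {delta_t:.2f}")
```

A value of zero means every group benefits equally from the treatment; larger values mean the benefit is more unevenly distributed.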

Abstract

Conformal prediction is a statistically rigorous method for quantifying uncertainty in models by having them output sets of predictions, with larger sets indicating more uncertainty. However, prediction sets are not inherently actionable; many applications require a single output to act on, not several. To overcome this limitation, prediction sets can be provided to a human who then makes an informed decision. In any such system it is crucial to ensure the fairness of outcomes across protected groups, and researchers have proposed that Equalized Coverage be used as the standard for fairness. By conducting experiments with human participants, we demonstrate that providing prediction sets can lead to disparate impact in decisions. Disquietingly, we find that providing sets that satisfy Equalized Coverage actually increases disparate impact compared to marginal coverage. Instead of equalizing coverage, we propose to equalize set sizes across groups which empirically leads to lower disparate impact.

Paper Structure (24 sections, 13 equations, 8 figures, 6 tables, 1 algorithm)

Figures (8)

  • Figure 1: We measure the per-group increase in accuracy from using prediction sets compared to the control population without model assistance. Disparate impact is the maximum difference in these increases between groups, and should be minimized for fairness (\ref{eq:disparate_impact}). Prediction sets do not benefit all groups equally, and sets with Equalized Coverage (Conditional) lead to the most unfair outcomes. Statistical analyses and significance are presented in \ref{sec:results}.
  • Figure 2: Illustration of how unfairness can arise in CP. Given a data distribution $\mathbb{P}$ with groups of differing difficulty, a model $f$ may have inherent bias. Using marginal CP can translate to lower coverage and larger sets for the harder group. To equalize coverage, conditional CP must increase set sizes on the harder class, and reduce them on the easier class. Since human accuracy correlates strongly with set size, not coverage, outcomes become more unfair with Equalized Coverage.
  • Figure 3: Main trial screen shown to participants for FACET with marginal conformal set treatment. The correct answer is given only after the participant responds.
  • Figure 4: Accuracy disparate impact $\Delta_t$ compared to the difference between the most and least improved groups across key factors for various datasets and treatments. From left to right: coverage, adoption, average set size, and singleton frequency.
  • Figure 5: Accuracy by group for FACET. Left: Human accuracy (Control) and model Top-1 Accuracy across groups. Right: Human accuracy across groups and treatments.
  • ...and 3 more figures
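The mechanism described in the Figure 2 caption can be sketched on synthetic data. The example below compares marginal split conformal prediction with a group-conditional (Mondrian) variant and reports coverage, average set size, and singleton frequency per group. Everything in it is an assumption made for illustration: the simulated two-group data, the score function $1 - \hat{p}(y \mid x)$ (a common choice, not necessarily the one used in the paper), and all function names.

```python
import math

import numpy as np


def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points: include every class
        return float("inf")
    return float(np.sort(scores)[k - 1])


def prediction_sets(probs, threshold):
    """Boolean mask (n, K): class k is in the set when 1 - p_k <= threshold."""
    return (1.0 - probs) <= threshold


def group_report(sets, labels, groups):
    """Coverage, average set size, and singleton frequency for each group."""
    report = {}
    for g in np.unique(groups):
        m = groups == g
        sizes = sets[m].sum(axis=1)
        covered = sets[m][np.arange(m.sum()), labels[m]]
        report[int(g)] = {
            "coverage": float(covered.mean()),
            "avg_size": float(sizes.mean()),
            "singleton_freq": float((sizes == 1).mean()),
        }
    return report


def simulate(n, rng, num_classes=10):
    """Hypothetical data: group 0 is easier (strong signal), group 1 harder."""
    groups = rng.integers(0, 2, size=n)
    labels = rng.integers(0, num_classes, size=n)
    signal = np.where(groups == 0, 4.0, 1.0)
    logits = rng.normal(size=(n, num_classes))
    logits[np.arange(n), labels] += signal
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True), labels, groups


rng = np.random.default_rng(0)
alpha = 0.1
cal_probs, cal_labels, cal_groups = simulate(4000, rng)
test_probs, test_labels, test_groups = simulate(4000, rng)

# Conformal score of each calibration point: 1 - probability of the true class.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]

# Marginal CP: one threshold shared by all groups.
tau = conformal_threshold(cal_scores, alpha)
marginal_sets = prediction_sets(test_probs, tau)

# Mondrian CP: one threshold per group, calibrated on that group only.
group_taus = {}
mondrian_sets = np.zeros_like(marginal_sets)
for g in (0, 1):
    group_taus[g] = conformal_threshold(cal_scores[cal_groups == g], alpha)
    mask = test_groups == g
    mondrian_sets[mask] = prediction_sets(test_probs[mask], group_taus[g])

marginal_report = group_report(marginal_sets, test_labels, test_groups)
mondrian_report = group_report(mondrian_sets, test_labels, test_groups)
print("marginal:", marginal_report)
print("mondrian:", mondrian_report)
```

In this toy setup the harder group receives the larger per-group threshold, so equalizing coverage widens the gap in set sizes between groups. This is the effect the caption links to more unfair outcomes, since human accuracy tracks set size rather than coverage.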