Partial Identification Approach to Counterfactual Fairness Assessment

Saeyoung Rho, Junzhe Zhang, Elias Bareinboim

TL;DR

The paper tackles non-identifiability in counterfactual fairness by introducing a Bayesian partial identification framework that bounds the counterfactual fairness measure $\mu$ from observational data. It combines causal discovery via Fast Causal Inference (FCI) with a Gibbs-based sampler (SampleCTF) to produce a $(1-\delta)$-level bound across candidate graphs. Theoretical results show that discrete exogenous variables with bounded cardinalities suffice to represent both the observational distribution and nested counterfactuals, enabling valid bounds without strong parametric assumptions; the authors validate the approach with simulations and apply it to COMPAS, revealing a substantial spurious effect of race (~25%) and a negative direct age effect on the score. The work provides a practical fairness auditing tool for opaque AI systems by delivering informative, data-driven bounds when full identifiability cannot be achieved.

Abstract

The wide adoption of AI decision-making systems in critical domains such as criminal justice, loan approval, and hiring processes has heightened concerns about algorithmic fairness. As we often only have access to the output of algorithms without insights into their internal mechanisms, it was natural to examine how decisions would alter when auxiliary sensitive attributes (such as race) change. This led the research community to come up with counterfactual fairness measures, but how to evaluate the measure from available data remains a challenging task. In many practical applications, the target counterfactual measure is not identifiable, i.e., it cannot be uniquely determined from the combination of quantitative data and qualitative knowledge. This paper addresses this challenge using partial identification, which derives informative bounds over counterfactual fairness measures from observational data. We introduce a Bayesian approach to bound unknown counterfactual fairness measures with high confidence. We demonstrate our algorithm on the COMPAS dataset, examining fairness in recidivism risk scores with respect to race, age, and sex. Our results reveal a positive (spurious) effect on the COMPAS score when changing race to African-American (from all others) and a negative (direct causal) effect when transitioning from young to old age.
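To make "not identifiable" concrete, here is a minimal toy sketch; it is not the paper's model or its fairness measure. It enumerates two hypothetical structural causal models over binary $X$ and $Y$ (the functions `f_y_model_a` and `f_y_model_b` are assumptions made up for illustration) that induce exactly the same observational distribution yet disagree completely on a simple counterfactual quantity, $P(Y_{x=1} \neq Y_{x=0})$. Data alone cannot tell the two apart, which is the situation in which one reports bounds instead of a point estimate.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


# Two hypothetical SCMs with X = U_X and independent fair-coin exogenous
# variables U_X, U_Y. They differ only in the mechanism that generates Y.
def f_y_model_a(x, u_y):
    return u_y        # Y ignores X entirely


def f_y_model_b(x, u_y):
    return x ^ u_y    # Y flips whenever X flips


def observational(f_y):
    """Exact joint distribution P(X, Y) induced by the mechanism f_y."""
    dist = Counter()
    for u_x, u_y in product([0, 1], repeat=2):
        x = u_x
        dist[(x, f_y(x, u_y))] += Fraction(1, 4)
    return dict(dist)


def prob_outcome_changes(f_y):
    """Counterfactual quantity P(Y_{x=1} != Y_{x=0}), computed per unit u_y."""
    return sum(
        (Fraction(1, 2) for u_y in [0, 1] if f_y(1, u_y) != f_y(0, u_y)),
        Fraction(0),
    )


same_data = observational(f_y_model_a) == observational(f_y_model_b)
print("same observational distribution:", same_data)
print("model A:", prob_outcome_changes(f_y_model_a))
print("model B:", prob_outcome_changes(f_y_model_b))
```

Both models assign probability 1/4 to every $(x, y)$ pair, so no amount of observational data separates them, while the counterfactual quantity takes one extreme value under model A and the other under model B.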

Paper Structure

This paper contains 35 sections, 2 theorems, 5 equations, 16 figures, 2 tables, 2 algorithms.

Key Result

Theorem 3.2

For an SCM $\mathcal{M}$, let $\mathcal{G}$ be its associated causal graph, $P(\boldsymbol{V})$ be its observational distribution, and $\mu$ be a nested counterfactual measure. Then there exists an alternative SCM $\mathcal{N}$ with exogenous cardinalities bounded as in Eq. (eq:u_bound) such that …

Figures (16)

  • Figure 1: Causal diagrams representing a standard fairness model containing a protected attribute $A$ (e.g., race), an outcome $Y$ (recidivism score), a confounder $Z$ (birthplace) and a mediator $W$ (prior criminal records).
  • Figure 2: Histograms for DE, IE, and SE, obtained from the simulation dataset. The black vertical line is the ground-truth value (labeled as TRUE) and the two red lines show the 95% confidence interval (2.5% excluded in each tail).
  • Figure 3: Inferred causal diagram for the COMPAS dataset using the FCI algorithm and domain knowledge.
  • Figure 4: Graphical model for each protected variable with two exogenous variables. Each exogenous variable has 17 states. $A$ denotes (a) race, (b) age, (c) sex, $W_1$ denotes charge degree, $W_2$ denotes prior counts, and $Y$ denotes the predicted COMPAS score.
  • Figure 5: Histograms of SE when $A=\text{Race}$ (left), DE when $A=\text{Age}$ (middle), and IE when $A=\text{Sex}$ (right). The two red lines show the 95% confidence interval (2.5% excluded in each tail).
  • ...and 11 more figures
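
The interval markers described in the histogram captions (2.5% cut from each tail) can be read as empirical quantiles of sampled values of the measure. The sketch below shows that computation under stated assumptions: `credible_interval` is a hypothetical helper, and the samples are stand-in draws from a Beta distribution, not output of the paper's sampler or any result from the paper.

```python
import random


def credible_interval(samples, delta=0.05):
    """Return the (delta/2, 1 - delta/2) empirical quantiles of `samples`."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(delta / 2), quantile(1 - delta / 2)


# Stand-in draws playing the role of sampled values of the measure
# (assumption: a real run would take these from the sampler's output).
rng = random.Random(0)
samples = [rng.betavariate(5, 15) for _ in range(10_000)]

lower, upper = credible_interval(samples, delta=0.05)
coverage = sum(lower <= s <= upper for s in samples) / len(samples)
print(f"interval: [{lower:.3f}, {upper:.3f}], empirical coverage: {coverage:.3f}")
```

Setting `delta=0.05` reproduces the 2.5%/2.5% split in the captions; a smaller `delta` widens the interval.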

Theorems & Definitions (7)

  • Definition 2.1: Counterfactual Effect
  • Definition 2.2: Direct, Indirect, Spurious Effects
  • Definition 3.1: C-Component (Tian and Pearl, 2002)
  • Theorem 3.2
  • proof
  • Theorem 3.3
  • proof