Targeted Learning for Data Fairness
Alexander Asemota, Giles Hooker
TL;DR
The paper treats fairness as a property of the data-generating process (data fairness) rather than solely a model- or algorithm-centered issue, and proposes Targeted Learning (TL) as a flexible, nonparametric framework for statistical inference on fairness. It derives efficient influence-function-based estimators for traditional and probabilistic demographic parity and equal opportunity, as well as for conditional mutual information (CMI), with double robustness properties for the probabilistic metrics. Through simulations and real-data analyses (Adult-Income and Law School), the authors demonstrate TL's ability to produce valid inference under model misspecification, reveal data-level disparities, and quantify variable importance in fairness metrics. The work highlights the potential and challenges of data-fairness inference, discusses connections to causal-inference concepts, and points to future directions for extending metrics, inference methods, and remediation strategies in fairness-critical decisions.
Abstract
Data and algorithms have the potential to produce and perpetuate discrimination and disparate treatment. As such, significant effort has been invested in developing approaches to defining, detecting, and eliminating unfair outcomes in algorithms. In this paper, we focus on performing statistical inference for fairness. Prior work in fairness inference has largely focused on inferring the fairness properties of a given predictive algorithm. Here, we expand fairness inference by evaluating fairness in the data generating process itself, referred to here as data fairness. We perform inference on data fairness using targeted learning, a flexible framework for nonparametric inference. We derive estimators for demographic parity, equal opportunity, and conditional mutual information. Additionally, we find that our estimators for probabilistic metrics are doubly robust. To validate our approach, we perform several simulations and apply our estimators to real data.
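To make the idea of influence-function-based inference on a data-fairness metric concrete, the following is a minimal sketch for the simplest case only: the traditional demographic parity difference P(Y=1 | A=1) - P(Y=1 | A=0), computed directly on observed outcomes Y and a binary protected attribute A. It is not the authors' targeted learning estimator; it uses the plug-in estimate with the standard influence function of a difference in conditional means to form a Wald-type confidence interval. The function name `demographic_parity_ci`, the simulated data, and the 1.96 critical value are illustrative assumptions.

```python
import numpy as np

def demographic_parity_ci(y, a, z=1.96):
    """Plug-in estimate and Wald interval for P(Y=1|A=1) - P(Y=1|A=0).

    Hypothetical helper for illustration; assumes binary y and a,
    with both groups present in the sample.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)
    n = y.size

    p1 = a.mean()            # estimated P(A=1)
    p0 = 1.0 - p1            # estimated P(A=0)
    mu1 = y[a == 1].mean()   # estimated P(Y=1 | A=1)
    mu0 = y[a == 0].mean()   # estimated P(Y=1 | A=0)
    psi = mu1 - mu0          # demographic parity difference

    # Estimated influence function of psi, evaluated at each observation.
    phi = (a == 1) / p1 * (y - mu1) - (a == 0) / p0 * (y - mu0)

    # Standard error from the empirical variance of the influence function.
    se = np.sqrt(np.mean(phi ** 2) / n)
    return psi, (psi - z * se, psi + z * se)

# Simulated data (assumption): true disparity is 0.6 - 0.5 = 0.1.
rng = np.random.default_rng(0)
a = rng.binomial(1, 0.4, size=5000)
y = rng.binomial(1, np.where(a == 1, 0.6, 0.5))

psi, (lo, hi) = demographic_parity_ci(y, a)
print(f"estimate={psi:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

An interval that excludes zero would suggest a disparity in the data itself, independent of any fitted predictive model. The probabilistic metrics described in the abstract additionally condition on covariates and require nuisance-model estimates, which this sketch deliberately omits.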
