Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

Kristof Meding; Thilo Hagendorff

TL;DR

This work analyzes how fairness metrics can be deliberately manipulated, formalizing two hacking modes: intra-metric (within a single metric) and inter-metric (across multiple metrics). It demonstrates these hacks with synthetic data and real MEPS data, showing how confidence intervals and metric choice can mask or exaggerate fairness. The authors advocate pre-registration, balanced datasets, comprehensive reporting (including effect sizes and confidence intervals), and sociotechnical contextualization to mitigate misuse. They highlight that fairness metrics embed normative assumptions and stress community-wide practices to reduce harm from biased AI systems.

Abstract

Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category of fairness hacking is inter-metric fairness hacking. Inter-metric fairness hacking is the search for a specific fair metric with given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper intends to serve as guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus reduce overall harm from ML applications.
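To make the intra-metric case concrete, the following is a minimal sketch, not the authors' code. It assumes a hypothetical classifier whose predictions are independent of every attribute, draws 1,000 randomly assigned binary attributes, and computes the statistical parity difference for each. The sample size, the seed, the function names, and the normal-approximation band are all assumptions made for illustration.

```python
import math
import random

def statistical_parity_difference(y_pred, attr):
    """P(y_pred = 1 | attr = 1) - P(y_pred = 1 | attr = 0)."""
    group1 = [y for y, a in zip(y_pred, attr) if a == 1]
    group0 = [y for y, a in zip(y_pred, attr) if a == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)

def simulate_random_attributes(n_samples=1000, n_attributes=1000, seed=0):
    """Hypothetical setup: predictions carry no information about any attribute."""
    rng = random.Random(seed)
    y_pred = [rng.getrandbits(1) for _ in range(n_samples)]
    diffs = []
    for _ in range(n_attributes):
        # Each attribute is a fresh random binary split of the same samples.
        attr = [rng.getrandbits(1) for _ in range(n_samples)]
        diffs.append(statistical_parity_difference(y_pred, attr))
    return y_pred, diffs

y_pred, diffs = simulate_random_attributes()
p = sum(y_pred) / len(y_pred)
# Normal-approximation 95% band under the null, assuming two roughly equal groups.
half_width = 1.96 * math.sqrt(p * (1 - p) * 4 / len(y_pred))
flagged = sum(abs(d) > half_width for d in diffs) / len(diffs)
print(f"share of attributes outside the 95% band: {flagged:.3f}")
print(f"most extreme difference: {max(diffs, key=abs):+.3f}")
```

Even though no attribute is related to the predictions, a small share of attributes falls outside the band by chance alone. Reporting only a conveniently chosen attribute, in either direction, is the kind of selective analysis the abstract compares to p-hacking.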

Paper Structure (18 sections, 1 equation, 8 figures, 1 table)

Introduction
Related work
P-hacking in the sciences
From p-hacking to fairness hacking
Methods
Results
Intra-metric fairness hacking as a variant of p-hacking
Inter-metric fairness hacking: inverse of the impossibility theorem
Fairness hacking in real-world datasets
Recommendations to avoid fairness hacking
Immediate Recommendations
Recommendations in the long run
Summary and outlook
Appendix
Figure 1 with equal opportunity and statistical parity
...and 3 more sections

Figures (8)

Figure 1: Intra-metric fairness hacking for error rate and statistical parity. Distribution (in percent) of the error rate (left) or statistical parity (center) difference between 1,000 randomly assigned attributes (with binary values) for the outcomes of our hypothetical ML algorithm. A value of zero corresponds to no bias against a group. Values to the left or to the right indicate a bias against groups. Upper horizontal lines indicate confidence intervals with different alpha levels. Right: Scatter plot of statistical parity against error rate. The numbers in the corners indicate which percentage of the data falls in the respective quadrant.
Figure 2: Inter-metric fairness hacking as the inverse of the impossible fairness theorem. a) Plotted is the difference between group 1 and group 0 of one binary attribute. The binary attribute is randomly distributed across groups. The hypothetical algorithm has 75% accuracy for both groups. Vertical lines indicate a Bonferroni-corrected confidence interval: alpha level = $\frac{0.05}{12}$. b) Same plot conventions as in a), but the accuracy difference is set to 0.7.
Figure 3: Fairness hacking in the wild for the Medical Expenditure Panel Survey data.
Figure 4: Intra-metric fairness hacking for equal opportunity and statistical parity. Plot conventions as in Figure 1.
Figure 5: Equal opportunity and statistical parity for all 134 binary attributes. Plot conventions as in Figure 1. The number of items belonging to the protected or unprotected group differs between actual attributes in the MEPS dataset (race, sex, etc.). Thus, we cannot calculate a single null hypothesis confidence interval.
...and 3 more figures
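The Figure 2 setup can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The corrected level 0.05/12 and the 75% accuracy for both groups come from the caption. The four metrics shown, the sample size, the seed, the helper names `compare` and `simulate`, and the pooled normal-approximation band are assumptions.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05 / 12  # Bonferroni-corrected level quoted in the Figure 2 caption
Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value

# (name, which rows count, what counts as a "hit"); an illustrative subset only.
METRICS = [
    ("accuracy", lambda t, p: True, lambda t, p: t == p),
    ("statistical parity", lambda t, p: True, lambda t, p: p == 1),
    ("true positive rate", lambda t, p: t == 1, lambda t, p: p == 1),
    ("false positive rate", lambda t, p: t == 0, lambda t, p: p == 1),
]

def compare(rows1, rows0, keep, hit, z=Z):
    """Return (group 1 rate - group 0 rate, half-width of the null band)."""
    h1 = [hit(t, p) for t, p in rows1 if keep(t, p)]
    h0 = [hit(t, p) for t, p in rows0 if keep(t, p)]
    pooled = (sum(h1) + sum(h0)) / (len(h1) + len(h0))
    half = z * math.sqrt(pooled * (1 - pooled) * (1 / len(h1) + 1 / len(h0)))
    return sum(h1) / len(h1) - sum(h0) / len(h0), half

def simulate(n=4000, accuracy=0.75, seed=0):
    """Hypothetical classifier with the same accuracy for both groups."""
    rng = random.Random(seed)
    rows = {0: [], 1: []}
    for _ in range(n):
        t = rng.getrandbits(1)                        # true label
        p = t if rng.random() < accuracy else 1 - t   # prediction
        rows[rng.getrandbits(1)].append((t, p))       # random binary attribute
    return {name: compare(rows[1], rows[0], keep, hit) for name, keep, hit in METRICS}

results = simulate()
for name, (diff, half) in results.items():
    verdict = "outside" if abs(diff) > half else "inside"
    print(f"{name:20s} diff={diff:+.3f}  band=+/-{half:.3f}  {verdict}")
```

Computing every metric on the same predictions and reporting all of them with a corrected band makes it harder to present only the one metric that happens to look fair, or unfair.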
