Fairness Interventions: A Study in AI Explainability
Thomas Souverain, Johnathan Nguyen, Nicolas Meric, Paul Égré
TL;DR
The paper investigates how fairness interventions in AI can be made explainable by examining not only the target fairness criterion but also the variables that constrain its realization. It introduces FairDream, an in-processing, reweighting-based fairness tool whose mechanism is made transparent to lay users, and which tends toward Equalized Odds rather than Demographic Parity on a census income task. In benchmark comparisons with GridSearch, the work shows that enforcing Demographic Parity can degrade predictive accuracy, whereas FairDream achieves competitive accuracy while reducing disparities under a fairness criterion that is conditional on the true labels. The authors argue for Equalized Odds as a normative, epistemologically grounded standard in settings where true labels are informative, and advocate clear explanations of how such corrections operate and respond to changes in the data.
Abstract
This paper presents a philosophical and experimental study of fairness interventions in AI classification, centered on the explainability of corrective methods. We argue that ensuring fairness requires not only satisfying a target criterion, but also explaining which variables constrain its realization. When corrections are used to mitigate advantage transparently, they must remain sensitive to the distribution of true labels. To illustrate this approach, we built FairDream, a fairness package whose mechanism is made transparent to lay users: it increases the weight the model gives to errors on disadvantaged groups. While a user may intend to achieve Demographic Parity with this correction method, experiments show that FairDream tends towards Equalized Odds, revealing a conservative bias inherent to the data environment. We clarify the relationship between these fairness criteria, analyze FairDream's reweighting process, and compare its trade-offs with closely related GridSearch models. Finally, we justify the normative preference for Equalized Odds via an epistemological interpretation of the results, drawing on their proximity to Simpson's paradox. The paper thus unites normative, epistemological, and empirical explanations of fairness interventions, to ensure transparency for users.
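
To make the contrast between the two criteria concrete, here is a minimal sketch using their standard textbook definitions: Demographic Parity compares positive prediction rates across groups, while Equalized Odds compares them separately for each true label. The function names and the toy data are hypothetical illustrations and are not taken from FairDream.

```python
def positive_rate(preds):
    """Share of predictions equal to 1 (assumed 0.0 when the list is empty)."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_gap(y_pred, group):
    """Largest difference across groups in P(y_pred = 1 | group)."""
    rates = [
        positive_rate([p for p, g in zip(y_pred, group) if g == name])
        for name in set(group)
    ]
    return max(rates) - min(rates)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest difference across groups in P(y_pred = 1 | group, y_true = y),
    taken over both true labels y in {0, 1}."""
    gaps = []
    for label in (0, 1):
        rates = [
            positive_rate(
                [p for t, p, g in zip(y_true, y_pred, group)
                 if g == name and t == label]
            )
            for name in set(group)
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy data: group "A" has a higher share of true positives than "B".
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = list(y_true)  # a classifier that makes no errors

print("Demographic Parity gap:", demographic_parity_gap(y_pred, group))
print("Equalized Odds gap:", equalized_odds_gap(y_true, y_pred, group))
```

On this toy data the error-free classifier has an Equalized Odds gap of 0.0 but a Demographic Parity gap of 0.5, because the groups differ in their distribution of true labels. This is the sense in which a correction that stays sensitive to the true labels can drift away from Demographic Parity.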
