Discrimination in machine learning algorithms
Roberta Pappadà, Francesco Pauli
TL;DR
This work tackles discrimination in data-driven decisions using causal-inference tools, with an emphasis on data preprocessing to detect and mitigate bias. It introduces a discrimination score based on Coarsened Exact Matching (CEM), $D_i$, and compares it with a kNN-based measure, $\delta_i$, applying CEM sequentially over random orderings of the variables to stabilize the estimates. The method is validated on real-world datasets (Adult, COMPAS, Custody) and through simulations that vary both the presence of discrimination and the conditioning variables, showing that the CEM-based measure can robustly detect discrimination and, in some settings, outperform the alternative. Overall, the approach provides a practical, auditing-oriented framework for assessing and reducing unfair treatment in high-stakes decisions.
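As a rough illustration of the matching idea behind CEM, the sketch below coarsens a covariate into bins, groups individuals into strata that share the same coarsened values, and compares outcome rates between protected and unprotected individuals within each matched stratum. This is not the paper's definition of $D_i$: the function names (`coarsen`, `stratum_gaps`), the cutpoints, and the toy records are all hypothetical, chosen only to show how coarsening and exact matching fit together.

```python
from collections import defaultdict

def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)

def stratum_gaps(rows, cutpoints, protected_key="protected", outcome_key="outcome"):
    """For each coarsened stratum containing both groups, return the mean
    outcome of the unprotected group minus that of the protected group."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        # The stratum key is the tuple of bin indices over all covariates.
        key = tuple(coarsen(row[name], cuts)
                    for name, cuts in sorted(cutpoints.items()))
        strata[key][row[protected_key]].append(row[outcome_key])
    gaps = {}
    for key, groups in strata.items():
        if groups[0] and groups[1]:  # keep only strata where a match exists
            gaps[key] = (sum(groups[0]) / len(groups[0])
                         - sum(groups[1]) / len(groups[1]))
    return gaps

# Hypothetical toy data: outcome 1 = favourable decision.
rows = [
    {"age": 25, "protected": 1, "outcome": 0},
    {"age": 28, "protected": 0, "outcome": 1},
    {"age": 45, "protected": 1, "outcome": 1},
    {"age": 47, "protected": 0, "outcome": 1},
    {"age": 70, "protected": 0, "outcome": 1},  # no protected match: dropped
]
cutpoints = {"age": [30, 60]}  # assumed bins: <30, 30-59, >=60

gaps = stratum_gaps(rows, cutpoints)
print(gaps)
```

A positive gap in a stratum means that, among individuals who look alike after coarsening, the unprotected group received the favourable outcome more often. Unmatched strata are discarded, which is the usual price of exact matching on coarsened covariates.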
Abstract
Machine learning algorithms are routinely used for business decisions that may directly affect individuals, for example when a credit-scoring algorithm refuses them a loan. It is therefore relevant, from both an ethical and a legal point of view, to ensure that these algorithms do not discriminate on the basis of sensitive attributes (such as sex or race); such discrimination may occur without the operator or the management intending it or being aware of it. Statistical tools and methods are therefore required to detect and eliminate such potential biases.
