Fairness in Machine Learning: A Survey
Simon Caton, Christian Haas
TL;DR
This survey addresses bias and fairness in ML, focusing on how to characterize, measure, and mitigate unfair outcomes across domains. It presents a two-level taxonomy that groups mitigation techniques into pre-processing, in-processing, and post-processing stages and, within those stages, 11 method families, and it extends the discussion beyond binary classification to regression, recommender systems, unsupervised learning, and NLP. The work compiles a broad set of fairness metrics (group and individual, including parity, calibration, and counterfactual notions) and surveys techniques ranging from data repair to adversarial learning and thresholding, highlighting practical challenges and platform support. It concludes with four broad dilemmas (performance trade-offs, competing fairness notions, contextual and policy tensions, and the fairness skills gap) and advocates more realistic data, causal tools, and accessible, governance-aligned tooling to advance fair ML in practice.
Abstract
As Machine Learning technologies become increasingly used in contexts that affect citizens, companies as well as researchers need to be confident that their application of these methods will not have unexpected social implications, such as bias relating to gender or ethnicity, or against people with disabilities. There is significant literature on approaches to mitigate bias and promote fairness, yet the area is complex and hard to penetrate for newcomers to the domain. This article seeks to provide an overview of the different schools of thought and approaches to mitigating (social) biases and increasing fairness in the Machine Learning literature. It organises approaches into the widely accepted framework of pre-processing, in-processing, and post-processing methods, subdividing these into a further 11 method areas. Although much of the literature emphasizes binary classification, a discussion of fairness in regression, recommender systems, unsupervised learning, and natural language processing is also provided, along with a selection of currently available open source libraries. The article concludes by summarising open challenges articulated as four dilemmas for fairness research.
