FairGridSearch: A Framework to Compare Fairness-Enhancing Models

Shih-Chi Ma; Tatiana Ermakova; Benjamin Fabian

TL;DR

FairGridSearch addresses the challenge of selecting fairness-enhancing models for binary classification by providing a grid-search-like framework that compares combinations of bias-mitigation methods, base estimators, classification thresholds, and evaluation metrics. It uses cross-validation and a cost-based best-model criterion, $C = \alpha \cdot (1 - \text{metric}_{\text{acc}}) + \beta \cdot |\text{metric}_{\text{fair}}|$, which trades off accuracy against fairness; the model with the lowest cost is recommended, and both $\alpha$ and $\beta$ are set to 1 in the experiments. Experiments on the Adult, COMPAS, and German Credit datasets show that metric choice, base estimator, and threshold significantly influence fairness outcomes, with no universal best approach across datasets. The work highlights the need to consider a broad set of factors beyond bias mitigation alone and provides a practical tool for systematic fair-model selection.
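The cost criterion above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the function names, and the metric values are hypothetical and chosen only to show how the weighted sum ranks models, assuming the accuracy metric lies in [0, 1] and the fairness metric is signed with 0 meaning perfectly fair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One evaluated model configuration (hypothetical container)."""
    name: str        # illustrative label, not from the paper
    accuracy: float  # accuracy metric in [0, 1], higher is better
    fairness: float  # signed fairness metric, 0 means perfectly fair


def cost(candidate: Candidate, alpha: float = 1.0, beta: float = 1.0) -> float:
    """C = alpha * (1 - metric_acc) + beta * |metric_fair|; lower is better."""
    return alpha * (1.0 - candidate.accuracy) + beta * abs(candidate.fairness)


def best_model(candidates, alpha: float = 1.0, beta: float = 1.0) -> Candidate:
    """Return the candidate with the lowest cost."""
    return min(candidates, key=lambda c: cost(c, alpha, beta))


# Made-up numbers for illustration only; they are not results from the paper.
candidates = [
    Candidate("baseline", accuracy=0.85, fairness=-0.20),
    Candidate("mitigated", accuracy=0.80, fairness=0.05),
]

print(best_model(candidates).name)
print(best_model(candidates, beta=0.0).name)
```

With $\alpha = \beta = 1$, a model that gives up some accuracy for a much smaller fairness gap can rank first; setting $\beta = 0$ reduces the criterion to accuracy alone.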

Abstract

Machine learning models are increasingly used in critical decision-making applications. However, these models are susceptible to replicating or even amplifying bias present in real-world data. While there are various bias mitigation methods and base estimators in the literature, selecting the optimal model for a specific application remains challenging. This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models. FairGridSearch enables experimentation with different model parameter combinations and recommends the best one. The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impacts of metric selection, base estimator choice, and classification threshold on model fairness. The results highlight the significance of selecting appropriate accuracy and fairness metrics for model evaluation. Additionally, different base estimators and classification threshold values affect the effectiveness of bias mitigation methods and the stability of fairness, respectively, but the effects are not consistent across all datasets. Based on these findings, future research on fairness in machine learning should consider a broader range of factors when building fair models, going beyond bias mitigation methods alone.
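To make the role of the classification threshold concrete, the following Python sketch sweeps a few thresholds over fixed prediction scores and reports one accuracy metric and one fairness metric per threshold. It is an illustration under stated assumptions, not the FairGridSearch code: balanced accuracy stands in for the accuracy metric, statistical parity difference (unprivileged minus privileged positive rate) is an assumed choice of fairness metric, and the scores, labels, and group memberships are toy values.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return 0.5 * (tp / positives + tn / negatives)


def statistical_parity_difference(y_pred, group):
    """P(pred=1 | unprivileged) - P(pred=1 | privileged).

    Assumption: group == 1 marks the privileged group, group == 0 the
    unprivileged group; a value of 0 means equal positive rates.
    """
    unpriv = [p for p, g in zip(y_pred, group) if g == 0]
    priv = [p for p, g in zip(y_pred, group) if g == 1]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def sweep_thresholds(scores, y_true, group, thresholds):
    """Return (threshold, balanced accuracy, parity difference) per threshold."""
    rows = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        rows.append((
            threshold,
            balanced_accuracy(y_true, y_pred),
            statistical_parity_difference(y_pred, group),
        ))
    return rows


# Toy data for illustration only; not taken from Adult, COMPAS, or German Credit.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]

for row in sweep_thresholds(scores, y_true, group, [0.5, 0.65, 0.75]):
    print(row)
```

Even on this small example, moving the threshold changes both metrics, and not in lockstep, which is why the threshold is worth treating as a tunable parameter alongside the base estimator and the bias mitigation method.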

Paper Structure (25 sections, 1 equation, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Work and Foundations
Related Work
Algorithmic Fairness
Bias Mitigation
FairGridSearch Framework
General Framework
Parameter Tuning
Base Estimator
Classification Threshold
Bias Mitigation
Best Model Criterion
Accuracy Metrics
Fairness Metrics
Exemplary Experiments
...and 10 more sections

Figures (7)

Figure 1: (NORM_MCC, BACC) is the only accuracy metric pair showing high positive correlations across all datasets.
Figure 2: Correlation between fairness metrics varies substantially across datasets.
Figure 3: Accuracy metrics respond differently to bias mitigators.
Figure 4: Efficacy of bias mitigation methods varies across different fairness metrics.
Figure 5: No single base estimator consistently outperforms the others.
...and 2 more figures
