FairML: A Julia Package for Fair Classification

Jan Pablo Burgard, João Vitor Pamplona

TL;DR

FairML.jl tackles fair classification by decomposing the learning pipeline into preprocessing, in-processing, and post-processing stages, each aimed at reducing disparate impact (DI) and disparate mistreatment (DM). It introduces a resampling-based preprocessing scheme to mitigate data imbalances, formulates fairness-constrained logistic regression (LR) and support vector machine (SVM) models (including mixed-effects extensions) for in-processing, and uses a cross-validated cut-off optimization in post-processing to balance accuracy and fairness. The approach supports both fixed and random effects, enabling fair learning in heterogeneous populations, and demonstrates, via extensive simulations, that combining phases yields larger reductions in DI and DM with manageable accuracy trade-offs. This Julia-based framework offers a flexible toolkit for practitioners to impose fairness constraints while leveraging existing MLJ.jl models, with potential impact on automated decision systems in finance, justice, and beyond.

Abstract

In this paper, we propose FairML.jl, a Julia package providing a framework for fair classification in machine learning. In this framework, the fair learning process is divided into three stages. Each stage aims to reduce unfairness, such as disparate impact and disparate mistreatment, in the final prediction. For the preprocessing stage, we present a resampling method that addresses unfairness arising from data imbalances. The in-processing phase consists of a classification method, which can be either one from the MLJ.jl package or a user-defined one. For this phase, we incorporate fair ML methods that can handle unfairness to a certain degree through their optimization process. For the post-processing stage, we discuss the choice of the cut-off value for fair prediction. With simulations, we show the performance of the single phases and their combinations.
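
To make the post-processing idea concrete, the following is a minimal sketch of how a cut-off value might be chosen to trade accuracy against disparate impact. It is written in Python purely for illustration and is not the FairML.jl API. All function names, the toy data, the `weight` parameter, and the use of the absolute difference in positive-prediction rates as the disparate-impact measure are assumptions made here; the package itself uses a cross-validated procedure, which this sketch omits.

```python
def positive_rate(scores, groups, group, cutoff):
    """Share of samples in `group` whose score reaches the cut-off."""
    members = [s for s, g in zip(scores, groups) if g == group]
    return sum(s >= cutoff for s in members) / len(members)


def disparate_impact_gap(scores, groups, cutoff):
    """Assumed DI measure: absolute difference in positive-prediction
    rates between sensitive groups 0 and 1 (0 means parity)."""
    rate0 = positive_rate(scores, groups, 0, cutoff)
    rate1 = positive_rate(scores, groups, 1, cutoff)
    return abs(rate0 - rate1)


def accuracy(scores, labels, cutoff):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [int(s >= cutoff) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def objective(scores, labels, groups, cutoff, weight):
    """Hypothetical trade-off: accuracy penalized by the DI gap."""
    return accuracy(scores, labels, cutoff) - weight * disparate_impact_gap(
        scores, groups, cutoff
    )


def choose_cutoff(scores, labels, groups, weight=0.5, grid=None):
    """Plain grid search over candidate cut-offs (no cross-validation)."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(grid, key=lambda c: objective(scores, labels, groups, c, weight))


# Toy data: predicted scores, true labels, and a binary sensitive attribute.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

best = choose_cutoff(scores, labels, groups, weight=0.5)
```

A larger `weight` favors parity between groups over raw accuracy; setting it to zero reduces the search to ordinary accuracy-maximizing threshold selection.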

Paper Structure

This paper contains 9 sections, 30 equations, 16 figures.

Figures (16)

  • Figure 1: Preprocessing results: First row: Comparison for logistic regression. Second row: Comparison for support vector machine. Left: Without preprocessing. Right: With preprocessing (R=1).
  • Figure 2: Preprocessing with multiple runs (R=5): Left: Logistic regression. Right: Support vector machine.
  • Figure 3: In-processing results: First row: Comparison for logistic regression. Second row: Comparison for support vector machine. Left: Disparate impact. Right: Disparate mistreatment.
  • Figure 4: Post-processing results for disparate impact: First row: Comparison for logistic regression. Second row: Comparison for support vector machine. Left: Only post-processing. Right: In-processing and post-processing.
  • Figure 5: Post-processing results for disparate mistreatment: First row: Comparison for logistic regression. Second row: Comparison for support vector machine. Left: Only post-processing. Right: In-processing and post-processing.
  • ...and 11 more figures