FairUDT: Fairness-aware Uplift Decision Trees

Anam Zahid, Abdur Rehman Ali, Shaina Raza, Rai Shahnawaz, Faisal Kamiran, Asim Karim

TL;DR

FairUDT addresses discrimination in ML by integrating uplift modeling with decision trees to identify discriminatory subgroups and then applying a targeted leaf relabeling pre-processing step. It builds a tree over two groups defined by a binary sensitive attribute and uses divergence-based splitting criteria (KL_gain or E_gain) that maximize the difference between the two groups' outcome distributions. A leaf relabeling step controlled by a tunable threshold $\sigma_t$ then neutralizes discrimination at discriminatory leaves. On the Adult, COMPAS, and German Credit datasets, FairUDT achieves favorable accuracy-discrimination tradeoffs and yields interpretable, sparser trees, highlighting its practical utility and transparency. The work introduces a novel coupling of uplift-based discrimination detection with dataset pre-processing and provides open-source code for reproducibility.
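The divergence-based splitting criteria come from uplift decision trees. The sketch below follows the common uplift-tree formulation, assuming it carries over to FairUDT. A split's gain is the sample-weighted divergence between the favored (treatment) and deprived (control) class distributions in the child nodes, minus that divergence at the parent. KL_gain uses KL divergence and E_gain uses squared Euclidean distance. The sketch assumes binary labels and omits any normalization term. The helper names (`class_dist`, `divergence`, `split_gain`) and the `eps` smoothing are hypothetical, and FairUDT's exact definitions may differ.

```python
import numpy as np

def class_dist(y, eps=1e-6):
    """Smoothed [P(y=0), P(y=1)] for a binary label array (eps avoids log(0))."""
    p1 = (y.sum() + eps) / (len(y) + 2 * eps)
    return np.array([1.0 - p1, p1])

def kl(p, q):
    """KL divergence D(p || q), used for the KL_gain variant."""
    return float(np.sum(p * np.log(p / q)))

def euclid(p, q):
    """Squared Euclidean distance, used for the E_gain variant."""
    return float(np.sum((p - q) ** 2))

def divergence(y, g, div):
    """Divergence between favored (g == 1, treatment) and deprived (g == 0, control) outcomes."""
    return div(class_dist(y[g == 1]), class_dist(y[g == 0]))

def split_gain(y, g, mask, div=kl):
    """Gain of splitting a node into the instances where mask is True vs. False."""
    n = len(y)
    conditional = 0.0
    for child in (mask, ~mask):
        if child.sum() == 0:
            continue  # an empty child contributes nothing
        conditional += child.sum() / n * divergence(y[child], g[child], div)
    return conditional - divergence(y, g, div)
```

A tree builder would evaluate `split_gain` for every candidate split at a node and keep the best one. Because the gain rewards children in which favored and deprived outcomes diverge, the resulting leaves tend to isolate subgroups where the two groups are treated differently.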

Abstract

Training data used for developing machine learning classifiers can exhibit biases against specific protected attributes. Such biases typically originate from historical discrimination or certain underlying patterns that disproportionately under-represent minority groups, such as those identified by their gender, religion, or race. In this paper, we propose a novel approach, FairUDT, a fairness-aware Uplift-based Decision Tree for discrimination identification. FairUDT demonstrates how the integration of uplift modeling with decision trees can be adapted to include fair splitting criteria. Additionally, we introduce a modified leaf relabeling approach for removing discrimination. We divide our dataset into favored and deprived groups based on a binary sensitive attribute, with the favored dataset serving as the treatment group and the deprived dataset as the control group. By applying FairUDT and our leaf relabeling approach to preprocess three benchmark datasets, we achieve an acceptable accuracy-discrimination tradeoff. We also show that FairUDT is inherently interpretable and can be utilized in discrimination detection tasks. The code for this project is available at https://github.com/ara-25/FairUDT.
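The relabeling rule itself is not spelled out in this summary. The sketch below uses an assumed rule purely to show the mechanics of threshold-based leaf relabeling:

- Within each leaf of a fitted tree, compute the uplift: the favored (treatment) group's positive rate minus the deprived (control) group's positive rate.
- Treat a leaf as discriminatory when that uplift exceeds $\sigma_t$.
- Relabel every instance in a discriminatory leaf to the leaf's overall majority class, so members of both groups end up with the same label.

The one-sided comparison, the majority-class policy, and the names `relabel_leaves` and `leaf_ids` are hypothetical. See the paper and repository for the actual procedure.

```python
import numpy as np

def relabel_leaves(y, g, leaf_ids, sigma_t):
    """Hypothetical leaf relabeling (assumed rule, not necessarily FairUDT's).

    y: binary labels; g: 1 = favored (treatment), 0 = deprived (control);
    leaf_ids: leaf index of each instance in a fitted uplift tree.
    Returns a relabeled copy of y; the input array is left unchanged.
    """
    y_new = y.copy()
    for leaf in np.unique(leaf_ids):
        in_leaf = leaf_ids == leaf
        fav = in_leaf & (g == 1)
        dep = in_leaf & (g == 0)
        if fav.sum() == 0 or dep.sum() == 0:
            continue  # uplift is undefined unless both groups are present
        uplift = y[fav].mean() - y[dep].mean()
        if uplift > sigma_t:
            # Assumed policy: the whole leaf gets its majority label (ties -> 1).
            y_new[in_leaf] = int(y[in_leaf].mean() >= 0.5)
    return y_new
```

Raising $\sigma_t$ relabels fewer leaves, trading less discrimination removal for fewer label changes. This is the knob swept along the x-axis of Figure 3.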

Paper Structure

This paper contains 41 sections, 20 equations, 3 figures, 8 tables, and 1 algorithm.

Figures (3)

  • Figure 1: End-to-end pipeline of FairUDT and leaf relabeling
  • Figure 2: ROC curves for all three datasets
  • Figure 3: LR classification results for the three datasets after pre-processing the data at different thresholds of $\sigma_t$. Error bands show the standard deviation across the 10 folds of cross-validation. The black dashed line shows the corresponding metric obtained by evaluating LR on the original dataset. The test set is sampled from the raw (unprocessed) data.
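Figure 3 plots mean metrics with standard-deviation error bands over 10-fold cross-validation. The sketch below shows one way such per-fold results could be aggregated. Accuracy is scored against raw-data labels, matching the caption's note that the test set comes from raw data. Several details are assumptions:

- The discrimination metric is not named in this caption. Statistical parity difference (the gap in positive-prediction rates between favored and deprived groups) stands in for it.
- The spread uses population standard deviation (NumPy's default `ddof=0`).
- The helper names are hypothetical.

```python
import numpy as np

def statistical_parity_difference(y_pred, g):
    """Assumed discrimination score: P(y_hat=1 | favored) - P(y_hat=1 | deprived)."""
    return float(y_pred[g == 1].mean() - y_pred[g == 0].mean())

def summarize_folds(fold_results):
    """fold_results: list of (y_true_raw, y_pred, g) tuples, one per CV fold.

    Returns (mean, std) pairs for accuracy and discrimination, i.e. the centre
    line and half-width of an error band.
    """
    acc = np.array([np.mean(yt == yp) for yt, yp, _ in fold_results])
    disc = np.array([statistical_parity_difference(yp, g) for _, yp, g in fold_results])
    return {"acc": (acc.mean(), acc.std()), "disc": (disc.mean(), disc.std())}
```

Running this once per $\sigma_t$ value, with LR trained on each relabeled training fold, would produce one point and band per threshold for each metric.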