Software Engineering Principles for Fairer Systems: Experiments with GroupCART

Kewen Peng, Hao Zhuo, Yicheng Yang, Tim Menzies

TL;DR

This work tackles algorithmic fairness in software-engineering contexts by introducing GroupCART, a fairness-aware tree-based ensemble that optimizes both target-prediction entropy and protected-attribute entropy during splits. By framing fairness and accuracy as a multi-objective optimization and using continuous domination to form a Pareto frontier, GroupCART generates multiple non-dominated configurations and aggregates them via ensemble voting, enabling customizable trade-offs between performance and fairness. The approach demonstrates on-par or superior predictive performance with significantly improved fairness metrics across several datasets, and it supports multiple protected attributes with smooth, tunable behavior. Practically, GroupCART offers a configurable, data-backed method to mitigate bias without data transformation, with publicly available code and a framework extensible to other tree-based methods.
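The frontier-then-vote idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it swaps in plain Pareto dominance for the continuous domination the summary mentions, assumes both objectives are "higher is better", and uses hypothetical helper names (`dominates`, `pareto_front`, `majority_vote`) and made-up scores.

```python
from collections import Counter


def dominates(a, b):
    """True if score tuple a is no worse than b on every objective and
    strictly better on at least one (assumes higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of score tuples that no other tuple dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def majority_vote(predictions):
    """Per-sample majority vote over a list of per-model prediction lists."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Made-up (performance, fairness) scores for four hypothetical configurations.
scores = [(0.85, 0.60), (0.80, 0.90), (0.78, 0.85), (0.83, 0.75)]
front = pareto_front(scores)  # configuration 2 is dominated by configuration 1

# Made-up binary predictions from the three surviving configurations.
votes = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])
```

Only the non-dominated configurations would contribute votes, which is one way to read "generates multiple non-dominated configurations and aggregates them via ensemble voting".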

Abstract

Discrimination-aware classification aims to make accurate predictions while satisfying fairness constraints. Traditional decision tree learners typically optimize for information gain in the target attribute alone, which can result in models that unfairly discriminate against protected social groups (e.g., gender, ethnicity). Motivated by these shortcomings, we propose GroupCART, a tree-based ensemble optimizer that avoids bias during model construction by optimizing not only for decreased entropy in the target attribute but also for increased entropy in protected attributes. Our experiments show that GroupCART achieves fairer models without data transformation and with minimal performance degradation. Furthermore, the method supports customizable weighting, offering a smooth and flexible trade-off between predictive performance and fairness based on user requirements. These results demonstrate that algorithmic bias in decision tree models can be mitigated through multi-task, fairness-aware learning. All code and datasets used in this study are available at: https://github.com/anonymous12138/groupCART.
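As a rough illustration of the split criterion described above, the sketch below scores a candidate binary split by rewarding lower target entropy and higher protected-attribute entropy in the child nodes. The linear weighting, the parameter `w`, and the function names are assumptions made for this example; the abstract does not specify the exact formula GroupCART uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_child_entropy(values, mask):
    """Size-weighted mean entropy of the two children induced by a boolean mask."""
    left = [v for v, m in zip(values, mask) if m]
    right = [v for v, m in zip(values, mask) if not m]
    n = len(values)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)


def split_score(target, protected, mask, w=0.5):
    """Hypothetical fairness-aware split score (higher is better).

    target_gain is the usual information gain on the target attribute;
    protected_mix is the remaining entropy of the protected attribute in
    the children, which is high when the split does not separate groups.
    w is an assumed trade-off weight in [0, 1].
    """
    target_gain = entropy(target) - weighted_child_entropy(target, mask)
    protected_mix = weighted_child_entropy(protected, mask)
    return (1 - w) * target_gain + w * protected_mix


target = [1, 1, 0, 0]
mask = [True, True, False, False]  # candidate split: first two rows go left

# This split also separates the protected groups perfectly, so it is penalized.
biased = split_score(target, ["a", "a", "b", "b"], mask)
# The same target split with groups mixed evenly in each child scores higher.
mixed = split_score(target, ["a", "b", "a", "b"], mask)
```

Setting `w=0` in this sketch recovers plain information gain, while larger values of `w` favor splits that keep protected groups mixed, mirroring the customizable weighting the abstract describes.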

Paper Structure

This paper contains 25 sections, 5 equations, 5 figures, 7 tables, 3 algorithms.

Figures (5)

  • Figure 1: Pareto frontier in the Adult dataset with protected attribute race.
  • Figure 2: Pareto frontier of the fairness-performance trade-off in three datasets, as presented by Cruz et al. (cruz2021promoting). Each dot represents one hyper-parameter configuration applied to one of the five ML algorithms selected by Cruz et al.
  • Figure 3: General flowchart of the GroupCART algorithm.
  • Figure 4: Results for RQ2. Better results have higher fairness and performance (i.e., they appear toward the top right).
  • Figure 5: Results for RQ4. The two figures on the left-hand side present disparate impact (DI) scores on two different protected attributes in the Adult dataset, and darker colors indicate better fairness.