Software Engineering Principles for Fairer Systems: Experiments with GroupCART
Kewen Peng, Hao Zhuo, Yicheng Yang, Tim Menzies
TL;DR
This work tackles algorithmic fairness in software-engineering contexts by introducing GroupCART, a fairness-aware, tree-based ensemble that, at each split, seeks to decrease entropy in the target attribute while increasing entropy in the protected attributes. GroupCART frames accuracy and fairness as a multi-objective optimization problem and uses continuous domination to form a Pareto frontier. It then aggregates the resulting non-dominated configurations by ensemble voting, which lets users tune the trade-off between predictive performance and fairness. Across several datasets, the approach achieves comparable or superior predictive performance together with significantly improved fairness metrics, and it supports multiple protected attributes with smooth, tunable behavior. In practice, GroupCART offers a configurable way to mitigate bias without transforming the training data; the code is publicly available, and the framework can be extended to other tree-based methods.
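The TL;DR compresses three mechanisms into one sentence: comparing configurations by continuous domination, keeping the non-dominated ones, and combining their predictions by vote. The sketch below illustrates those steps under stated assumptions, and is not taken from the GroupCART repository. It uses the Zitzler-style continuous-domination indicator common in multi-objective SE work, with objectives normalized to [0, 1] and a weight of +1 for objectives to maximize (e.g. accuracy) and -1 for objectives to minimize (e.g. an unfairness metric). It treats a configuration as non-dominated when no other configuration continuously dominates it, and it combines predictions by simple majority vote. The names `cdom_loss`, `cdominates`, `cdom_frontier`, and `majority_vote` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def cdom_loss(x: Sequence[float], y: Sequence[float], weights: Sequence[int]) -> float:
    """Continuous-domination loss of x relative to y (assumed Zitzler-style form).

    Objectives are assumed normalized to [0, 1]; weights[j] is +1 if objective j
    is maximized and -1 if it is minimized.
    """
    n = len(x)
    return -sum(math.exp(w * (xj - yj) / n) for xj, yj, w in zip(x, y, weights)) / n


def cdominates(x: Sequence[float], y: Sequence[float], weights: Sequence[int]) -> bool:
    """x continuously dominates y when loss(x, y) is strictly below loss(y, x)."""
    return cdom_loss(x, y, weights) < cdom_loss(y, x, weights)


def cdom_frontier(points: Sequence[Sequence[float]], weights: Sequence[int]) -> List[int]:
    """Indices of configurations that no other configuration continuously dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(cdominates(q, p, weights) for j, q in enumerate(points) if j != i)
    ]


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Per-sample majority label across models; ties go to the label seen first."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]
```

Because any ordinary (binary) Pareto domination is also a continuous domination under this definition, the filter never keeps more configurations than a plain Pareto filter. It can additionally drop trade-off points where a small loss in one objective is outweighed by a large gain in another.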
Abstract
Discrimination-aware classification aims to make accurate predictions while satisfying fairness constraints. Traditional decision tree learners typically optimize for information gain in the target attribute alone, which can result in models that unfairly discriminate against protected social groups (e.g., gender, ethnicity). Motivated by these shortcomings, we propose GroupCART, a tree-based ensemble optimizer that avoids bias during model construction by optimizing not only for decreased entropy in the target attribute but also for increased entropy in protected attributes. Our experiments show that GroupCART produces fairer models without any data transformation and with minimal performance degradation. Furthermore, the method supports customizable weighting, offering a smooth and flexible trade-off between predictive performance and fairness based on user requirements. These results demonstrate that algorithmic bias in decision tree models can be mitigated through multi-task, fairness-aware learning. All code and datasets used in this study are available at: https://github.com/anonymous12138/groupCART.
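The splitting idea in the abstract can be made concrete as a scoring rule for a candidate split. The sketch below is one plausible reading, not GroupCART's published criterion. It scores a binary split by the information gain on the target minus `lam` times the average information gain on the protected attributes. Because the parent node's entropy is fixed, penalizing protected-attribute information gain is equivalent to rewarding a higher weighted entropy of the protected attributes in the child nodes, i.e. children that stay mixed across groups. The linear combination, the averaging over protected attributes, and the names `fair_split_score` and `lam` are assumptions introduced for illustration; `lam = 0` recovers an ordinary information-gain split.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels: Sequence[Hashable], mask: Sequence[bool]) -> float:
    """Entropy reduction in a non-empty `labels` list when rows are split by `mask`."""
    left = [v for v, m in zip(labels, mask) if m]
    right = [v for v, m in zip(labels, mask) if not m]
    n = len(labels)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - children


def fair_split_score(
    y: Sequence[Hashable],
    protected: Sequence[Sequence[Hashable]],
    mask: Sequence[bool],
    lam: float,
) -> float:
    """Hypothetical fairness-aware split score (higher is better).

    Rewards purity in the target y and penalizes splits that separate the
    groups of each protected attribute, averaged over all protected attributes.
    """
    target_gain = info_gain(y, mask)
    protected_gain = sum(info_gain(p, mask) for p in protected) / len(protected)
    return target_gain - lam * protected_gain
```

In a full learner, a score like this would replace plain information gain when choosing the best threshold at each node, and varying `lam` would trace out different accuracy/fairness trade-offs.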
