Fair MP-BOOST: Fair and Interpretable Minipatch Boosting

Camille Olivia Little, Genevera I. Allen

TL;DR

Fair MP-Boost addresses the need for boosting methods that are simultaneously accurate, fair, and interpretable for tabular data. It introduces a double stochastic gradient boosting framework that adaptively learns minipatch sampling distributions over observations and features. A tradeoff parameter $\alpha$ balances accuracy against fairness, with the accuracy loss $\mathcal{L}_A$ and the fairness loss $\mathcal{L}_F$ guiding the updates. The method is intrinsically interpretable through its averaged sampling probabilities, which can be checked against feature importance metrics such as TreeFIS and FairTreeFIS, and it uses out-of-patch validation for early stopping and tuning. Empirical results on simulated data and real benchmarks (Adult and Law School) show that Fair MP-Boost can outperform competing bias-mitigation approaches in fairness while maintaining competitive accuracy, offering a practical, interpretable alternative for fair boosting in high-stakes domains.

Abstract

Ensemble methods, particularly boosting, have established themselves as highly effective and widely embraced machine learning techniques for tabular data. In this paper, we aim to leverage the robust predictive power of traditional boosting methods while enhancing fairness and interpretability. To achieve this, we develop Fair MP-Boost, a stochastic boosting scheme that balances fairness and accuracy by adaptively learning which features and observations to sample during training. Specifically, Fair MP-Boost sequentially samples small subsets of observations and features, termed minipatches (MP), according to adaptively learned feature and observation sampling probabilities. We devise these probabilities by combining loss functions or by combining feature importance scores, so that accuracy and fairness are addressed simultaneously. Hence, Fair MP-Boost prioritizes important and fair features along with challenging instances to select the most relevant minipatches for learning. The learned probability distributions also yield intrinsic interpretations of feature importance and important observations in Fair MP-Boost. Through empirical evaluation on simulated and benchmark datasets, we showcase the interpretability, accuracy, and fairness of Fair MP-Boost.
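The sampling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the update rule, so the convex combination `(1 - alpha) * accuracy + alpha * fairness` is an assumption, chosen so that a larger `alpha` prioritizes fairness as in the Figure 1 caption. The function names `mix_to_probs`, `weighted_sample`, and `sample_minipatch`, and all numeric scores, are hypothetical.

```python
import random


def mix_to_probs(acc_scores, fair_scores, alpha):
    """Blend two non-negative score vectors and normalize them to sum to 1.

    Assumed rule (not taken from the paper): (1 - alpha) * accuracy + alpha * fairness,
    so alpha = 0 uses only accuracy scores and alpha = 1 uses only fairness scores.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed = [(1.0 - alpha) * a + alpha * f for a, f in zip(acc_scores, fair_scores)]
    total = sum(mixed)
    if total <= 0.0:
        # No signal in either score vector: fall back to uniform sampling.
        return [1.0 / len(mixed)] * len(mixed)
    return [v / total for v in mixed]


def weighted_sample(probs, k, rng):
    """Draw k distinct indices one at a time, each proportionally to probs."""
    if k > sum(1 for p in probs if p > 0.0):
        raise ValueError("k exceeds the number of indices with positive probability")
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(k):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def sample_minipatch(obs_probs, feat_probs, n, m, rng):
    """Return (row indices, column indices) of one n-by-m minipatch."""
    rows = sorted(weighted_sample(obs_probs, n, rng))
    cols = sorted(weighted_sample(feat_probs, m, rng))
    return rows, cols


# Hypothetical per-observation losses and per-feature scores (illustrative only).
obs_acc_loss = [0.9, 0.1, 0.4, 0.2, 0.7, 0.3]
obs_fair_loss = [0.1, 0.8, 0.2, 0.6, 0.1, 0.5]
feat_acc_score = [0.5, 0.3, 0.1, 0.1]
feat_fair_score = [0.0, 0.4, 0.4, 0.2]

rng = random.Random(0)
obs_probs = mix_to_probs(obs_acc_loss, obs_fair_loss, alpha=0.9)
feat_probs = mix_to_probs(feat_acc_score, feat_fair_score, alpha=0.9)
rows, cols = sample_minipatch(obs_probs, feat_probs, n=3, m=2, rng=rng)
print("minipatch rows:", rows, "columns:", cols)
```

In a full boosting loop, a weak learner would be fit on the selected rows and columns and the two probability vectors would then be updated before the next draw; those updates are left out here because the abstract does not specify them.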

Paper Structure (9 sections, 4 equations, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: Average feature sampling probabilities after a 20-iteration burn-in phase of Fair MP-Boost on the simulated dataset with $\alpha = 0.1$ (top; accuracy prioritized) and $\alpha = 0.9$ (bottom; fairness prioritized). Fair MP-Boost correctly learns the features associated with the signal (blue and purple) when $\alpha = 0.1$, and the features associated with the signal but independent of the protected attribute (blue) when $\alpha = 0.9$; these results validate the feature interpretability of Fair MP-Boost.
  • Figure 2: Interpretation of the Adult dataset with Gender as the protected attribute using our Fair MP-Boost algorithm. Part A shows Fair MP-Boost average feature sampling probabilities when accuracy is prioritized (left) and when fairness is prioritized (right). Part B shows feature importance metrics using MDI (TreeFIS) and the fairness-based FairTreeFIS for these same Fair MP-Boost models. When accuracy is prioritized, Fair MP-Boost relies heavily on the feature Married, but this feature also strongly contributes to bias in the model (negative FairTreeFIS score). When fairness is prioritized, however, Fair MP-Boost relies less on Married and more on Capital Gain and Edu Num, which leads to improved accuracy and fairness. Overall, the feature sampling probabilities from Fair MP-Boost align with the interpretations in Part B, which validates our results.

Theorems & Definitions (1)

  • Definition 1