Explainable post-training bias mitigation with distribution-based fairness metrics
Ryan Franks, Alexey Miroshnikov, Konstandinos Kotsiopoulos
TL;DR
The paper addresses fairness in regulated ML settings by proposing a post-training framework that enforces distribution-based fairness constraints without retraining the underlying model. It develops a differentiable family of post-processed models ${\cal F}(f_*;w)$ and uses stochastic gradient descent with new global bias metrics to construct fairness-efficient frontiers. Three explainable encoder families (additive-model corrections, tree rebalancing, and explanation rebalancing) enable scalable, demographically blind bias mitigation while preserving interpretability. Empirical results on synthetic and real-world data trace the resulting bias-performance frontiers and illustrate how dataset properties shape the efficacy of each method. The approach offers a flexible, scalable route to fairer, explainable models in applications such as finance, where regulatory and transparency requirements are paramount.
Abstract
We develop a novel bias mitigation framework with distribution-based fairness constraints suitable for producing demographically blind and explainable machine-learning models across a wide range of fairness levels. This is accomplished through post-processing, allowing fairer models to be generated efficiently without retraining the underlying model. Our framework, which is based on stochastic gradient descent, can be applied to a wide range of model types, with a particular emphasis on the post-processing of gradient-boosted decision trees. Additionally, we design a broad family of global fairness metrics, along with differentiable and consistent estimators compatible with our framework, building on previous work. We empirically test our methodology on a variety of datasets and compare it with alternative post-processing approaches, including Bayesian search, optimal transport projection, and direct neural network training.
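To make the idea of post-processing with a distribution-based penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear additive correction $f_w(x) = f_*(x) + x^\top w$ as a stand-in for the family of post-processed models, and it assumes the Wasserstein-1 distance between the score distributions of two groups as the bias metric; the abstract does not specify either choice. The function names, the synthetic data, and all hyperparameters are hypothetical. The group label is used only during fitting, so the corrected model remains demographically blind at prediction time.

```python
import numpy as np

def w1_bias_and_grad(s0, s1, X0, X1):
    """Wasserstein-1 distance between two equal-size score samples, plus its
    (sub)gradient w.r.t. w for scores of the assumed form s = f_star(x) + x @ w."""
    i0, i1 = np.argsort(s0), np.argsort(s1)   # match order statistics
    diff = s0[i0] - s1[i1]
    bias = float(np.mean(np.abs(diff)))
    grad = (np.sign(diff)[:, None] * (X0[i0] - X1[i1])).mean(axis=0)
    return bias, grad

def postprocess(base, X, y, g, lam=1.5, lr=0.05, steps=500, batch=128, seed=0):
    """SGD on (squared error + lam * W1 bias); the trained model's scores
    `base` are held fixed, so the underlying model is never retrained."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    idx0, idx1 = np.flatnonzero(g == 0), np.flatnonzero(g == 1)
    for _ in range(steps):
        b0, b1 = rng.choice(idx0, batch), rng.choice(idx1, batch)
        b = np.concatenate([b0, b1])
        s = base[b] + X[b] @ w
        grad_mse = 2.0 * ((s - y[b])[:, None] * X[b]).mean(axis=0)
        _, grad_bias = w1_bias_and_grad(s[:batch], s[batch:], X[b0], X[b1])
        w -= lr * (grad_mse + lam * grad_bias)
    return w

# Synthetic data (hypothetical): feature 0 is shifted by group membership.
rng = np.random.default_rng(0)
n = 2000
g = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 2))
X[:, 0] += g
y = X.sum(axis=1) + 0.1 * rng.normal(size=n)
base = X.sum(axis=1)                           # stand-in for trained scores f_*(x)

bias_before, _ = w1_bias_and_grad(base[g == 0], base[g == 1], X[g == 0], X[g == 1])
w = postprocess(base, X, y, g)
scores = base + X @ w                          # prediction uses X only, not g
bias_after, _ = w1_bias_and_grad(scores[g == 0], scores[g == 1], X[g == 0], X[g == 1])
print(f"W1 bias before: {bias_before:.3f}, after: {bias_after:.3f}, w = {w}")
```

Rerunning `postprocess` over a grid of `lam` values and recording the resulting (bias, error) pairs would give a rough analogue of a bias-performance frontier under these assumptions.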
