Explainable post-training bias mitigation with distribution-based fairness metrics

Ryan Franks, Alexey Miroshnikov, Konstandinos Kotsiopoulos

TL;DR

The paper tackles fairness in regulated ML settings by proposing a post-training framework that enforces distribution-based fairness constraints without retraining the underlying model. It develops a differentiable family of post-processed models ${\cal F}(f_*;w)$ and deploys stochastic gradient descent with new global bias metrics to construct fairness-efficient frontiers. Three explainable encoder families—additive-model corrections, tree rebalancing, and explanation rebalancing—enable scalable, demographically blind bias mitigation while preserving interpretability. Empirical results on synthetic and real-world data show strong bias-performance frontiers and illustrate how dataset properties shape the efficacy of each method. This approach provides a flexible, scalable pathway to fairer, explainable models in finance-like applications where regulatory and transparency requirements are paramount.

Abstract

We develop a novel bias mitigation framework with distribution-based fairness constraints suitable for producing demographically blind and explainable machine-learning models across a wide range of fairness levels. This is accomplished through post-processing, allowing fairer models to be generated efficiently without retraining the underlying model. Our framework, which is based on stochastic gradient descent, can be applied to a wide range of model types, with a particular emphasis on the post-processing of gradient-boosted decision trees. Additionally, we design a broad family of global fairness metrics, along with differentiable and consistent estimators compatible with our framework, building on previous work. We empirically test our methodology on a variety of datasets and compare it with alternative post-processing approaches, including Bayesian search, optimal transport projection, and direct neural network training.
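As a rough illustration of the idea described in the abstract (a fixed trained model, a differentiable family of post-processed models, and stochastic gradient descent on a performance loss plus a distribution-based bias penalty), the following is a minimal sketch under stated assumptions. The synthetic data, the stand-in model `f_star`, the one-parameter correction in `f_post`, the penalty weight `lam`, and the sorted-sample $W_1$ estimate are all hypothetical choices made for this example; they are not the paper's encoder families or estimators.

```python
import random

random.seed(0)

def make_group(n, shift):
    """Synthetic rows (x1, x2, y); x1 is shifted by group, so f_star is biased."""
    rows = []
    for _ in range(n):
        x1 = shift + random.gauss(0, 1)
        x2 = random.gauss(0, 1)
        y = x1 + x2 + random.gauss(0, 0.1)
        rows.append((x1, x2, y))
    return rows

def f_star(x1, x2):
    """Stand-in for the trained model (assumption); it is never refit."""
    return x1 + x2

def f_post(row, w):
    """Hypothetical one-parameter post-processed model f(x; w)."""
    x1, x2, _ = row
    return f_star(x1, x2) + w * x1

def w1_bias(g0, g1, w):
    """Empirical W1 distance between group score samples of equal size."""
    s0 = sorted(f_post(r, w) for r in g0)
    s1 = sorted(f_post(r, w) for r in g1)
    return sum(abs(a - b) for a, b in zip(s0, s1)) / len(s0)

def grad(b0, b1, w, lam):
    """Subgradient of MSE + lam * W1 bias with respect to w on one minibatch."""
    batch = b0 + b1
    g_mse = sum(2 * (f_post(r, w) - r[2]) * r[0] for r in batch) / len(batch)
    # Pair rows by score rank; each pair contributes sign(gap) * d(gap)/dw.
    p0 = sorted(b0, key=lambda r: f_post(r, w))
    p1 = sorted(b1, key=lambda r: f_post(r, w))
    g_bias = 0.0
    for r0, r1 in zip(p0, p1):
        gap = f_post(r0, w) - f_post(r1, w)
        sign = (gap > 0) - (gap < 0)
        g_bias += sign * (r0[0] - r1[0])
    return g_mse + lam * g_bias / len(p0)

def post_process(g0, g1, lam, steps=400, lr=0.05, m=64):
    """Run SGD on w only, sampling m rows per group at each step."""
    w = 0.0  # w = 0 reproduces the original model
    for _ in range(steps):
        w -= lr * grad(random.sample(g0, m), random.sample(g1, m), w, lam)
    return w

group0, group1 = make_group(1000, 0.0), make_group(1000, 1.0)
w_fit = post_process(group0, group1, lam=1.5)
print("w:", w_fit)
print("bias before:", w1_bias(group0, group1, 0.0))
print("bias after: ", w1_bias(group0, group1, w_fit))
```

The group label is used only inside the training loss; `f_post` itself takes no group input, which is one way to read "demographically blind". Repeating the run over a range of `lam` values and recording the (bias, error) pairs would give a crude analogue of an efficient frontier.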

Paper Structure

This paper contains 42 sections, 14 theorems, 122 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Lemma 2.1

Let a model $f$ and the random variables $(X,G)$ be as in Definition 2.4. Let $f_t(x) = \mathbbm{1}_{\{f(x) > t\}}$ denote a derived classifier. The $W_1$-model bias can be expressed as follows:
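The lemma's displayed formula is not reproduced in this listing. As a hedged illustration of the kind of relationship it concerns, the sketch below checks numerically a standard one-dimensional fact: the Wasserstein-1 distance between two score distributions equals the integral over thresholds $t$ of the gap in positive rates of the classifiers $f_t$. The binary group attribute, the function names, and the Beta-distributed sample scores are assumptions made for this example, not taken from the paper.

```python
import random

def w1_by_quantiles(a, b):
    """W1 between two equal-size samples: mean gap between sorted values."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def w1_by_thresholds(a, b):
    """Integrate |P(a > t) - P(b > t)| over t.

    The integrand is piecewise constant between consecutive sample values,
    so the integral is an exact finite sum over those intervals.
    """
    points = sorted(set(a) | set(b))
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        # Positive rates of the classifier f_t for any t in [lo, hi).
        rate_a = sum(x > lo for x in a) / len(a)
        rate_b = sum(x > lo for x in b) / len(b)
        total += abs(rate_a - rate_b) * (hi - lo)
    return total

random.seed(0)
# Hypothetical model scores for the two groups G = 0 and G = 1.
scores0 = [random.betavariate(2, 5) for _ in range(200)]
scores1 = [random.betavariate(5, 2) for _ in range(200)]

print(w1_by_quantiles(scores0, scores1))
print(w1_by_thresholds(scores0, scores1))
```

The two printed values should agree up to floating-point error. The threshold form makes explicit that a model with small $W_1$ bias yields classifiers with small positive-rate gaps on average across thresholds.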

Figures (4)

  • Figure 1: Efficient frontiers for data model 1 and data model 2. All results are presented on their respective test datasets.
  • Figure 2: Efficient frontiers for UCI Adult, UCI Bank Marketing, and COMPAS datasets, evaluated on their test datasets.
  • Figure 3: All models evaluated during Bayesian search and random search for the UCI Adult, UCI Bank Marketing, and COMPAS datasets. The first row displays the results for the predictor rescaling experiments presented earlier using selected predictors (for COMPAS, all predictors were selected). The second row displays results for analogous predictor rescaling experiments for UCI Adult and UCI Bank Marketing using all predictors. All results are presented on their respective test datasets.
  • Figure 4: Efficient frontiers for UCI Adult, Bank Marketing, and COMPAS datasets, evaluated on their test datasets.

Theorems & Definitions (64)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Remark 2.1
  • Definition 2.4: Wasserstein-1 model bias [Miroshnikov2020]
  • Lemma 2.1
  • proof
  • Proposition 2.1
  • proof
  • Definition 2.5
  • ...and 54 more