General Post-Processing Framework for Fairness Adjustment of Machine Learning Models

Léandre Eberhard, Nirek Sharma, Filipp Shelobolin, Aalok Ganesh Shanbhag

TL;DR

This work addresses the tension between predictive performance and fairness constraints by introducing a post-processing fairness adjuster that offsets baseline predictions with a learned adjustment $g(X)$. The method separates fairness optimization from the training objective, enabling use with black-box models and data-driven fairness tuning without altering the original model or labels used for training. Theoretical analysis provides bounds comparing the adjuster to joint in-processing, including equivalence in linear regression and cross-entropy-like bounds for classification, while experiments on Adult, German, and COMPAS datasets demonstrate a near-identical fairness-accuracy tradeoff to adversarial debiasing with practical benefits in flexibility and interpretability. Overall, the framework offers a robust, adaptable, and auditable pathway to post-hoc fairness adjustments suitable for diverse models and datasets.

Abstract

As machine learning increasingly influences critical domains such as credit underwriting, public policy, and talent acquisition, ensuring compliance with fairness constraints is both a legal and ethical imperative. This paper introduces a novel framework for fairness adjustments that applies to diverse machine learning tasks, including regression and classification, and accommodates a wide range of fairness metrics. Unlike traditional approaches categorized as pre-processing, in-processing, or post-processing, our method adapts in-processing techniques for use as a post-processing step. By decoupling fairness adjustments from the model training process, our framework preserves model performance on average while enabling greater flexibility in model development. Key advantages include eliminating the need for custom loss functions, enabling fairness tuning using different datasets, accommodating proprietary models as black-box systems, and providing interpretable insights into the fairness adjustments. We demonstrate the effectiveness of this approach by comparing it to Adversarial Debiasing, showing that our framework achieves a comparable fairness/accuracy tradeoff on real-world datasets.

Paper Structure

This paper contains 15 sections, 5 theorems, 36 equations, 1 figure.

Key Result

Proposition 4.1

Let $L(\hat{y}, y) = \sum_{i=1}^n (\hat{y}_i - y_i)^2$ denote the Mean Squared Error (MSE) loss. For any perturbation vector $g \in \mathbb{R}^n$, the change in the MSE loss when $g$ is added to the prediction $\hat{y}$, given by

$$\Delta L = L(\hat{y} + g, y) - L(\hat{y}, y),$$

can be expressed as

$$\Delta L = \sum_{i=1}^n g_i^2 + 2 \sum_{i=1}^n g_i (\hat{y}_i - y_i).$$

Alternatively, it can be decomposed into:

$$\Delta L = L(\hat{y} + g, \hat{y}) + 2\, g^\top (\hat{y} - y),$$

where $L(\hat{y} + g, \hat{y}) = \sum_{i=1}^n g_i^2$ represents the MSE between $\hat{y}$ and $\hat{y} + g$.
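As a quick sanity check, expanding the square in $L(\hat{y} + g, y)$ term by term gives $L(\hat{y} + g, y) - L(\hat{y}, y) = \sum_{i=1}^n g_i^2 + 2 \sum_{i=1}^n g_i(\hat{y}_i - y_i)$, which follows directly from the definition of $L$ above. The sketch below verifies this numerically on synthetic data. The function names (`sse`, `loss_change`, `decomposed_change`) and the random data are illustrative assumptions, not part of the paper.

```python
import numpy as np


def sse(pred, target):
    """Loss L as defined in the proposition: sum of squared differences."""
    return float(np.sum((pred - target) ** 2))


def loss_change(y_hat, g, y):
    """Direct computation of L(y_hat + g, y) - L(y_hat, y)."""
    return sse(y_hat + g, y) - sse(y_hat, y)


def decomposed_change(y_hat, g, y):
    """Same quantity via L(y_hat + g, y_hat) + 2 * g^T (y_hat - y)."""
    return sse(y_hat + g, y_hat) + 2.0 * float(g @ (y_hat - y))


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)                      # targets
y_hat = y + rng.normal(scale=0.5, size=n)   # baseline predictions
g = rng.normal(scale=0.1, size=n)           # arbitrary perturbation

direct = loss_change(y_hat, g, y)
decomposed = decomposed_change(y_hat, g, y)
print(np.isclose(direct, decomposed))
```

The first term, $L(\hat{y} + g, \hat{y})$, depends only on the size of the adjustment, while the cross term depends on how the adjustment aligns with the baseline residuals $\hat{y} - y$.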

Figures (1)

  • Figure 1: 50-seed, 5-fold CV results for three datasets: German, Adult, and COMPAS. The panels show the fairness-accuracy tradeoff for Adversarial Debiasing and the fairness adjuster across the random CV splits. Each point represents the average of the metrics over the 5 folds for one of the two methods. We fit a regression line with confidence bounds to better visualize the results.

Theorems & Definitions (9)

  • Proposition 4.1: Performance loss of adjustment from optimal model
  • proof
  • Proposition 4.2
  • proof
  • Proposition 4.3: Equivalence in Linear Regression Case
  • proof
  • Proposition 4.4: Accuracy-Fairness Trade-Off
  • Proposition 4.5: Bounding Cross-Entropy for Adjuster vs. Adversarial Debiasing
  • proof