BiasGuard: Guardrailing Fairness in Machine Learning Production Systems

Nurit Cohen-Inger; Seffi Cohen; Neomi Rabaev; Lior Rokach; Bracha Shapira

TL;DR

BiasGuard addresses fairness in production ML with a post-processing guardrail: Test-Time Augmentation (TTA) backed by CTGAN generates synthetic samples conditioned on the inverted value of each instance's protected attribute. At inference, the method detects potential bias by comparing the prediction for the original instance with the prediction for its opposite-protected counterpart. When needed, it augments the instance with $\mathcal{T}$ synthetic samples and aggregates the predictions to reduce disparities between groups. Across five datasets, BiasGuard improves $EOD$ by about 31% at an average accuracy cost of about 0.09%, and it achieves better fairness than Threshold Optimizer and Reject Option with a smaller accuracy penalty. The approach is model-agnostic and needs no retraining, which makes it suitable for deployed systems. It does add inference-time overhead, which can be reduced by lowering the augmentation level or using hardware acceleration.

Abstract

As machine learning (ML) systems increasingly impact critical sectors such as hiring, financial risk assessments, and criminal justice, the imperative to ensure fairness has intensified due to potential negative implications. While much ML fairness research has focused on enhancing training data and processes, addressing the outputs of already deployed systems has received less attention. This paper introduces 'BiasGuard', a novel approach designed to act as a fairness guardrail in production ML systems. BiasGuard leverages Test-Time Augmentation (TTA) powered by a Conditional Tabular Generative Adversarial Network (CTGAN), a cutting-edge generative AI model, to synthesize data samples conditioned on inverted protected attribute values, thereby promoting equitable outcomes across diverse groups. This method aims to provide equal opportunities for both privileged and unprivileged groups while significantly enhancing the fairness metrics of deployed systems without the need for retraining. Our comprehensive experimental analysis across diverse datasets reveals that BiasGuard enhances fairness by 31% while only reducing accuracy by 0.09% compared to non-mitigated benchmarks. Additionally, BiasGuard outperforms existing post-processing methods in improving fairness, positioning it as an effective tool to safeguard against biases when retraining the model is impractical.

Paper Structure (29 sections, 3 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
Background and Related Work
Bias Detection Metrics
Bias Mitigation Methods
Test-Time Augmentation
GAN-based Augmentation for Tabular Data
Method
Formulation
Why BiasGuard Improves Fairness
Time Complexity
Experiments
Evaluation Metrics
Compared Methods
Implementation Details
Datasets
...and 14 more sections

Figures (3)

Figure 1: BiasGuard motivation. For every sample $x^{(i)}$ from the test set, synthetic data is generated conditioned on the opposite value of its protected attribute. The prediction is then balanced with the nearest samples.
Figure 2: An overview of the BiasGuard method. (1) A sample $x^{(i)}$ is taken from the test set. (2) CTGAN-based TTA generates synthetic data conditioned on the opposite protected value of $x^{(i)}$. (3) The black-box model returns predictions for the augmented samples. (4) The final prediction $\hat{y}_{final}$ is produced by aggregating the instance prediction with all the augmentation predictions.
Figure 3: Tradeoff between fairness (represented by EoD) and accuracy.
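
The four steps in Figure 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the binary 0/1 protected attribute, the label-disagreement trigger, and the mean aggregation are all assumptions made for illustration. A simple noise-based sampler also stands in for CTGAN so that the example runs with NumPy alone.

```python
import numpy as np


def make_flip_sampler(seed=0, scale=0.05):
    """Hypothetical stand-in for a CTGAN conditional sampler (NOT CTGAN itself).

    Returns a function that draws n noisy copies of x whose protected
    attribute (assumed binary, 0/1) is set to the opposite value.
    """
    rng = np.random.default_rng(seed)

    def sample(x, protected_idx, n):
        aug = np.tile(x, (n, 1)) + rng.normal(0.0, scale, size=(n, x.size))
        aug[:, protected_idx] = 1 - x[protected_idx]
        return aug

    return sample


def biasguard_predict(predict_proba, x, protected_idx, sample_opposite,
                      n_aug=5, threshold=0.5):
    """Guardrail around a black-box binary classifier (illustrative only).

    predict_proba: callable mapping an (n, d) array to n positive-class scores.
    Returns (label, score).
    """
    p_orig = float(predict_proba(x[None, :])[0])

    # Assumed bias check: does the label change when only the protected
    # attribute is inverted?
    x_flip = x.copy()
    x_flip[protected_idx] = 1 - x_flip[protected_idx]
    p_flip = float(predict_proba(x_flip[None, :])[0])
    if (p_orig >= threshold) == (p_flip >= threshold):
        return int(p_orig >= threshold), p_orig

    # Augment with opposite-protected samples and aggregate (assumed: mean).
    p_aug = predict_proba(sample_opposite(x, protected_idx, n_aug))
    p_final = float(np.mean(np.concatenate([[p_orig], p_aug])))
    return int(p_final >= threshold), p_final


def biased_model(X):
    """Toy model whose score depends only on the protected attribute (column 0)."""
    return np.where(X[:, 0] == 1, 0.9, 0.1)


def fair_model(X):
    """Toy model that ignores the protected attribute."""
    return X[:, 1].copy()


x = np.array([0.0, 0.3])
label, score = biasguard_predict(biased_model, x, 0, make_flip_sampler(), n_aug=4)
print(label, round(score, 2))  # 1 0.74
```

With the toy biased model, the original score of 0.1 is averaged with four opposite-protected scores of 0.9, so the aggregated score is 0.74 and the label flips. With `fair_model`, the two predictions agree and the original score is returned unchanged. Here `n_aug` plays the role of $\mathcal{T}$, and lowering it reduces the number of extra model calls per instance.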
