BiasGuard: Guardrailing Fairness in Machine Learning Production Systems
Nurit Cohen-Inger, Seffi Cohen, Neomi Rabaev, Lior Rokach, Bracha Shapira
TL;DR
BiasGuard addresses fairness in production ML with a post-processing guardrail that uses Test-Time Augmentation (TTA) with CTGAN to generate synthetic samples conditioned on inverted protected attribute values. At inference, the method detects potential bias by comparing the prediction for the original input with the prediction for its opposite-protected counterpart and, when needed, augments the input with $\mathcal{T}$ synthetic samples and aggregates the predictions to reduce disparities. Across five datasets, BiasGuard improves $EOD$ by about 31% relative to the non-mitigated baseline, with an average accuracy loss of only about 0.09%, and it outperforms Threshold Optimizer and Reject Option on fairness at a smaller accuracy cost. The approach is model-agnostic and deployment-friendly, safeguarding fairness in dynamic production settings without retraining. It does add inference-time overhead, which can be mitigated through configurable augmentation levels and hardware acceleration.
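The inference-time flow summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: every function name here is hypothetical, the protected attribute is assumed to be binary, the disagreement check and the mean aggregation are assumed rules, and a simple Gaussian jitter sampler stands in for the CTGAN generator so that the example depends only on NumPy.

```python
import numpy as np


def toy_predict_proba(X):
    """Toy deployed model (hypothetical): logistic score that leans on feature 0,
    which plays the role of the binary protected attribute."""
    w = np.array([2.0, 1.0])
    b = -1.5
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def make_jitter_sampler(protected_idx, scale=0.05, seed=0):
    """Stand-in for a conditional generator such as CTGAN (assumption).
    Returns n noisy copies of x_cond with the protected attribute held fixed."""
    rng = np.random.default_rng(seed)

    def sampler(x_cond, n):
        noise = rng.normal(0.0, scale, size=(n, x_cond.shape[0]))
        noise[:, protected_idx] = 0.0  # keep the conditioned value unchanged
        return x_cond[None, :] + noise

    return sampler


def guarded_predict(predict_proba, x, protected_idx, sampler, n_aug=10, threshold=0.5):
    """Return (label, augmented_flag) for a single instance x.

    Assumed rule: if the label changes when the protected attribute is
    inverted, draw n_aug synthetic samples conditioned on the inverted value
    and average their scores together with the original score.
    """
    p_orig = predict_proba(x[None, :])[0]

    x_flip = x.copy()
    x_flip[protected_idx] = 1 - x_flip[protected_idx]  # binary attribute assumed
    p_flip = predict_proba(x_flip[None, :])[0]

    label_orig = int(p_orig >= threshold)
    label_flip = int(p_flip >= threshold)
    if label_orig == label_flip:
        # Predictions agree, so no potential bias is flagged.
        return label_orig, False

    synthetic = sampler(x_flip, n_aug)
    scores = np.concatenate(([p_orig], predict_proba(synthetic)))
    return int(scores.mean() >= threshold), True
```

With this toy model, calling `guarded_predict(toy_predict_proba, np.array([0.0, 1.0]), 0, make_jitter_sampler(0))` flags the instance, because its label depends on the protected attribute, while an instance such as `np.array([1.0, 3.0])` passes through unchanged. The `n_aug` parameter corresponds to the configurable augmentation level $\mathcal{T}$, so lowering it trades some smoothing for less inference-time overhead.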
Abstract
As machine learning (ML) systems increasingly impact critical sectors such as hiring, financial risk assessment, and criminal justice, the need to ensure fairness has intensified because of the potential harm of biased decisions. While much ML fairness research has focused on improving training data and processes, addressing the outputs of already deployed systems has received less attention. This paper introduces 'BiasGuard', a novel approach designed to act as a fairness guardrail in production ML systems. BiasGuard leverages Test-Time Augmentation (TTA) powered by a Conditional Tabular Generative Adversarial Network (CTGAN), a generative model for tabular data, to synthesize data samples conditioned on inverted protected attribute values, thereby promoting equitable outcomes across diverse groups. The method aims to provide equal opportunities for both privileged and unprivileged groups while significantly improving the fairness metrics of deployed systems without the need for retraining. Our experimental analysis across diverse datasets shows that BiasGuard improves fairness by 31% while reducing accuracy by only 0.09% compared to non-mitigated benchmarks. Additionally, BiasGuard outperforms existing post-processing methods in improving fairness, positioning it as an effective tool to safeguard against bias when retraining the model is impractical.
