General Post-Processing Framework for Fairness Adjustment of Machine Learning Models
Léandre Eberhard, Nirek Sharma, Filipp Shelobolin, Aalok Ganesh Shanbhag
TL;DR
This work addresses the tension between predictive performance and fairness constraints by introducing a post-processing fairness adjuster that offsets baseline predictions with a learned adjustment $g(X)$. The method separates fairness optimization from the training objective, which allows it to be used with black-box models and to tune fairness on data without altering the original model or the labels used for training. Theoretical analysis provides bounds comparing the adjuster to joint in-processing, including equivalence in linear regression and cross-entropy-like bounds for classification. Experiments on the Adult, German, and COMPAS datasets show a fairness-accuracy tradeoff nearly identical to that of Adversarial Debiasing, with practical benefits in flexibility and interpretability. Overall, the framework offers an adaptable and auditable route to post-hoc fairness adjustment for diverse models and datasets.
Abstract
As machine learning increasingly influences critical domains such as credit underwriting, public policy, and talent acquisition, ensuring compliance with fairness constraints is both a legal and ethical imperative. This paper introduces a novel framework for fairness adjustments that applies to diverse machine learning tasks, including regression and classification, and accommodates a wide range of fairness metrics. Fairness methods are traditionally categorized as pre-processing, in-processing, or post-processing; our method adapts in-processing techniques for use as a post-processing step. By decoupling fairness adjustments from the model training process, our framework preserves model performance on average while enabling greater flexibility in model development. Key advantages include eliminating the need for custom loss functions, enabling fairness tuning on datasets other than the one used for training, accommodating proprietary models as black-box systems, and providing interpretable insights into the fairness adjustments. We demonstrate the effectiveness of this approach by comparing it to Adversarial Debiasing, showing that our framework achieves a comparable fairness/accuracy tradeoff on real-world datasets.
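To make the idea of a learned adjustment $g(X)$ added to fixed baseline predictions concrete, the following is a minimal sketch for the regression case. It is not the paper's implementation: the linear form of $g$, the squared-covariance fairness penalty, the weight `lam`, the function names, and the synthetic data are all assumptions chosen for illustration. The baseline is treated as a black box whose predictions `f_pred` are simply given.

```python
import numpy as np


def group_covariance(pred, a):
    """Sample covariance between predictions and a protected attribute."""
    return float(np.mean((a - a.mean()) * (pred - pred.mean())))


def fit_linear_adjuster(X, f_pred, y, a, lam=10.0):
    """Fit g(X) = [X, 1] @ w minimizing MSE(f + g, y) + lam * cov(f + g, a)^2.

    The baseline predictions f_pred stay fixed (black box). The covariance
    penalty is an illustrative stand-in for a fairness metric (assumption).
    """
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])  # add an intercept column
    a_c = a - a.mean()                     # centered protected attribute
    v = Xb.T @ a_c / n                     # gradient of the covariance w.r.t. w
    c0 = a_c @ f_pred / n                  # covariance of the baseline alone
    # The objective is quadratic in w, so set its gradient to zero and solve.
    H = Xb.T @ Xb / n + lam * np.outer(v, v)
    b = Xb.T @ (y - f_pred) / n - lam * c0 * v
    return np.linalg.solve(H, b)


def adjust(X, f_pred, w):
    """Return the adjusted prediction f(X) + g(X)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return f_pred + Xb @ w


# Synthetic data (hypothetical): the first feature correlates with a.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([rng.normal(size=n) + a, rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + 0.5 * a + rng.normal(scale=0.1, size=n)

# Stand-in for a black-box baseline: ordinary least squares on X.
Xb = np.hstack([X, np.ones((n, 1))])
f_pred = Xb @ np.linalg.lstsq(Xb, y, rcond=None)[0]

w = fit_linear_adjuster(X, f_pred, y, a, lam=50.0)
adjusted = adjust(X, f_pred, w)
print("baseline |cov|:", abs(group_covariance(f_pred, a)))
print("adjusted |cov|:", abs(group_covariance(adjusted, a)))
```

Because the baseline model is never retrained, only its predictions are needed, and the fitted weights `w` can be inspected directly to see which features drive the adjustment. Setting `lam=0` in this sketch recovers a zero adjustment when the baseline is already the least-squares fit, while larger values trade accuracy for a smaller covariance with the protected attribute.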
