Post-processing fairness with minimal changes
Federico Di Gennaro, Thibault Laugel, Vincent Grari, Xavier Renard, Marcin Detyniecki
TL;DR
The paper tackles the challenge of achieving fairness in predictive models through post-processing, without requiring sensitive attributes at test time. It introduces Ratio-Based Model Debiasing (RBMD), a model-agnostic approach that multiplies the logit of a biased classifier by a learned ratio r(X), producing the corrected score g(X) = σ(r(X) · f_logit(X)). The method combines a ratio penalty that keeps changes minimal with an adversarial objective that enforces Demographic Parity. It yields competitive accuracy-fairness trade-offs on the Law School and COMPAS datasets while altering fewer predictions than many baselines. This makes RBMD practically appealing for production systems where preserving validated predictions is crucial and sensitive attributes are unavailable during inference. The work also demonstrates interpretability benefits: surrogate models explain which instances are targeted by debiasing and how the changes propagate in the feature space.
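The corrected score g(X) = σ(r(X) · f_logit(X)) and the two-part objective can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The ratio penalty is assumed to be the mean squared deviation of r(x) from 1. The adversarial Demographic Parity term is replaced by a simpler stand-in: a direct penalty on the gap in mean scores between two groups. Ratios are given per instance rather than produced by a learned model r(X). The names `corrected_score`, `ratio_penalty`, `dp_gap`, `debiasing_loss` and the weight `lam` are hypothetical. Note that `corrected_score` takes only the logit and the ratio, not the sensitive attribute. The attribute enters only through the training loss.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def corrected_score(f_logit: float, r: float) -> float:
    """g(x) = sigma(r(x) * f_logit(x)); needs no sensitive attribute."""
    return sigmoid(r * f_logit)


def ratio_penalty(ratios: Sequence[float]) -> float:
    """Assumed penalty: mean of (r - 1)^2, which is zero when f is left unchanged."""
    return sum((r - 1.0) ** 2 for r in ratios) / len(ratios)


def dp_gap(scores: Sequence[float], groups: Sequence[int]) -> float:
    """|mean score in group 0 - mean score in group 1| (a soft Demographic Parity gap)."""
    g0 = [s for s, a in zip(scores, groups) if a == 0]
    g1 = [s for s, a in zip(scores, groups) if a == 1]
    if not g0 or not g1:
        raise ValueError("both groups must be non-empty")
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))


def debiasing_loss(f_logits: Sequence[float], ratios: Sequence[float],
                   groups: Sequence[int], lam: float = 1.0) -> float:
    """Hypothetical stand-in loss: fairness gap plus lam * ratio penalty.

    The sensitive attribute `groups` is used only here, i.e. at training time,
    and a direct gap penalty replaces the paper's adversarial objective.
    """
    scores = [corrected_score(z, r) for z, r in zip(f_logits, ratios)]
    return dp_gap(scores, groups) + lam * ratio_penalty(ratios)


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0]  # hypothetical black-box logits
    groups = [0, 0, 1, 1]           # hypothetical sensitive attribute
    for ratios in ([1.0] * 4, [1.0, 1.0, 0.5, 0.5], [0.0] * 4):
        print(ratios, round(debiasing_loss(logits, ratios, groups), 4))
```

The two extremes show the trade-off. Setting every ratio to 1 leaves f unchanged at zero ratio penalty. Setting every ratio to 0 maps all scores to 0.5 and removes the gap entirely, at the cost of a ratio penalty of 1. The weight `lam` chooses a point between them.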
Abstract
In this paper, we introduce a novel post-processing algorithm that is model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal changes between biased and debiased predictions, a property that, while highly desirable, is rarely prioritized as an explicit objective in the fairness literature. Our approach applies a multiplicative factor to the logit of the probability scores produced by a black-box classifier. We demonstrate the efficacy of our method through empirical evaluations, comparing its performance against four other debiasing algorithms on two datasets widely used in fairness research.
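"Minimal changes between biased and debiased predictions" can be quantified in more than one way. The sketch below assumes two hypothetical measures: the fraction of instances whose thresholded label changes, and the mean absolute shift in probability score. The function names and the 0.5 default threshold are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prediction_change_rate(f_logits: Sequence[float], ratios: Sequence[float],
                           threshold: float = 0.5) -> float:
    """Hypothetical measure: fraction of instances whose label differs between f and g."""
    changed = 0
    for z, r in zip(f_logits, ratios):
        before = sigmoid(z) >= threshold      # label from the biased model f
        after = sigmoid(r * z) >= threshold   # label from g = sigma(r * f_logit)
        if before != after:
            changed += 1
    return changed / len(f_logits)


def mean_score_shift(f_logits: Sequence[float], ratios: Sequence[float]) -> float:
    """Hypothetical measure: mean absolute change in probability score between f and g."""
    shifts = [abs(sigmoid(r * z) - sigmoid(z)) for z, r in zip(f_logits, ratios)]
    return sum(shifts) / len(shifts)
```

The multiplicative form has a simple consequence under this label definition. Because σ(r · z) ≥ 0.5 exactly when r · z ≥ 0, a positive ratio never flips a label at the 0.5 threshold. It only moves the score toward or away from 0.5. A label change at that threshold therefore requires r(x) ≤ 0, whereas the score shift registers any r(x) ≠ 1 on a nonzero logit.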
