Post-processing fairness with minimal changes

Federico Di Gennaro; Thibault Laugel; Vincent Grari; Xavier Renard; Marcin Detyniecki

TL;DR

The paper tackles the challenge of achieving fairness in predictive models via post-processing without requiring sensitive attributes at test time. It introduces Ratio-Based Model Debiasing (RBMD), a model-agnostic approach that multiplies the logit of a biased classifier by a learned ratio, producing a corrected score $g(X) = \sigma(r(X)\, f_{\text{logit}}(X))$. The method combines a ratio penalty to minimize changes with an adversarial objective to enforce Demographic Parity, yielding competitive accuracy-fairness trade-offs on Law School and COMPAS while altering fewer predictions than many baselines. This makes RBMD practically appealing for production systems where preserving validated predictions is crucial and sensitive attributes are unavailable during inference. The work also demonstrates interpretability benefits through surrogate models that explain which instances are targeted by debiasing and how changes propagate in the feature space.
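As a rough illustration of the corrected score described above, the sketch below applies a per-instance ratio to the logit of a biased probability and maps the result back through the sigmoid. It is not the authors' implementation: the helper names (`rbmd_score`, `changed_fraction`), the 0.5 decision threshold, and the hand-picked ratio values are assumptions made for the example, and the learned ratio network $r$ is replaced by fixed numbers.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-12):
    """Inverse of the sigmoid; p is clipped away from 0 and 1 (assumed eps)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def rbmd_score(p_biased, ratio):
    """Corrected score g = sigmoid(ratio * logit(p_biased)).

    `ratio` stands in for r(X); here it is a plain number, not a learned model.
    """
    return sigmoid(ratio * logit(p_biased))


def changed_fraction(p_biased, ratios, threshold=0.5):
    """Share of instances whose thresholded decision differs after correction.

    Hypothetical helper: the 0.5 threshold is an assumption of this sketch.
    """
    flips = 0
    for p, r in zip(p_biased, ratios):
        before = p >= threshold
        after = rbmd_score(p, r) >= threshold
        flips += int(before != after)
    return flips / len(p_biased)


# Illustrative inputs only (not data from the paper).
biased_scores = [0.8, 0.3, 0.6]
ratios = [1.0, 0.5, -1.0]
corrected = [rbmd_score(p, r) for p, r in zip(biased_scores, ratios)]
share_changed = changed_fraction(biased_scores, ratios)
```

In this sketch a ratio of 1 leaves the score unchanged, and any positive ratio preserves the sign of the logit, so the decision at a 0.5 threshold stays the same; only a negative ratio flips it. This gives one way to see why keeping the ratio close to 1 limits the number of changed predictions.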

Abstract

In this paper, we introduce a novel post-processing algorithm that is model-agnostic and does not require the sensitive attribute at test time. In addition, our algorithm is explicitly designed to enforce minimal changes between biased and debiased predictions, a property that, while highly desirable, is rarely prioritized as an explicit objective in the fairness literature. Our approach applies a multiplicative factor to the logit of the probability scores produced by a black-box classifier. We demonstrate the efficacy of our method through empirical evaluations, comparing its performance against four other debiasing algorithms on two widely used datasets in fairness research.

Paper Structure (23 sections, 6 equations, 9 figures, 5 tables)

Introduction and Context
Proposition
Post-processing as a supervised learning task
Ratio-Based Model Debiasing
Experiments
Experimental Setting
Results
Experiment 1: Accuracy and Fairness
Experiment 2: Number of changes
Experiment 3: Explaining prediction changes
Conclusion
Implementation details
Architecture of $r$ and Interpretability
Calibration
Loss and Hyperparameters
...and 8 more sections

Figures (9)

Figure 1: Fairness vs Accuracy trade-off on Law School (left) and COMPAS (right) datasets.
Figure 2: Mean $\mathcal{P}$ ($\pm \text{std}$) obtained by each method. "NaN" means that no model falls into this range of scores. Only the cells where RBMD had at least 2 model runs are kept.
Figure 3: F1-score of CART for different depths for similar values of fairness (Q3) and accuracy (Q2). There is no point for ROC in this cell, so it is not included in the plot.
Figure 4: Fairness vs Accuracy trade-off on Law School for different ratio architectures. Each dot corresponds to one model run.
Figure 5: Weights $w_1, \dots, w_d$ of Equation \ref{eq:linear ratio} when the ratio architecture is linear.
...and 4 more figures
