Unprocessing Seven Years of Algorithmic Fairness

André F. Cruz, Moritz Hardt

TL;DR

The paper investigates whether postprocessing remains the most effective way to enforce error-rate parity across demographic groups. It introduces unprocessing, which roughly corresponds to the inverse of postprocessing, so that methods built on different base models and different levels of constraint relaxation can be compared directly. It also presents an LP-based approach to relaxed equalized odds with open-source tooling. Across thousands of models on tabular ACS datasets, postprocessing the most accurate unconstrained predictor consistently matches or dominates every fairness intervention examined. The work underscores the importance of rigorous empirical evaluation in fairness research and provides practical tools to compare fairness methods on an equal footing.
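
To make "error-rate parity" concrete, the sketch below measures how far a binary classifier is from equalized odds. It assumes one common definition of the violation: the largest gap in true positive rate or false positive rate between any two groups. The function name `equalized_odds_violation` and the toy data are hypothetical, and the paper's own tooling may define the metric differently.

```python
import numpy as np

def equalized_odds_violation(y_true, y_pred, group):
    """Largest TPR or FPR gap between any two groups (0.0 means exact parity).

    Assumes binary labels and predictions in {0, 1}, and that every group
    contains at least one positive and one negative example.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)

    tprs, fprs = [], []
    for g in np.unique(group):
        in_group = group == g
        # TPR: share of the group's true positives that are predicted positive.
        tprs.append(y_pred[in_group & (y_true == 1)].mean())
        # FPR: share of the group's true negatives that are predicted positive.
        fprs.append(y_pred[in_group & (y_true == 0)].mean())

    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy data (hypothetical): two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
violation = equalized_odds_violation(y_true, y_pred, group)
```

A relaxed constraint then allows this violation to be at most some tolerance instead of requiring it to be zero, which is why two methods can only be compared fairly when they are evaluated at the same tolerance.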

Abstract

Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation.
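
The abstract only says that unprocessing roughly corresponds to the inverse of postprocessing. One way to read this, assumed here and not confirmed by the text above, is as postprocessing with the fairness constraint removed: for each group, choose the decision threshold on the model's scores that maximizes accuracy. The sketch below follows that reading; the names `best_threshold`, `unprocess`, and `predict`, and the toy data, are hypothetical.

```python
import numpy as np

def best_threshold(scores, y_true):
    """Threshold t that maximizes accuracy of the rule `score >= t` on one group."""
    # Every observed score is a candidate; +inf covers "predict all negative".
    candidates = np.concatenate([np.unique(scores), [np.inf]])
    accuracies = [np.mean((scores >= t).astype(int) == y_true) for t in candidates]
    return float(candidates[int(np.argmax(accuracies))])

def unprocess(scores, y_true, group):
    """Map each group value to its accuracy-maximizing threshold (no fairness constraint)."""
    scores, y_true, group = (np.asarray(a) for a in (scores, y_true, group))
    return {
        g: best_threshold(scores[group == g], y_true[group == g])
        for g in np.unique(group)
    }

def predict(scores, group, thresholds):
    """Apply the group-specific thresholds to produce binary predictions."""
    scores, group = np.asarray(scores), np.asarray(group)
    cutoffs = np.array([thresholds[g] for g in group])
    return (scores >= cutoffs).astype(int)

# Toy data (hypothetical): scores from some fairness-constrained model.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.5, 0.4, 0.1]
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

thresholds = unprocess(scores, y_true, group)
y_pred = predict(scores, group, thresholds)
accuracy = float(np.mean(y_pred == np.asarray(y_true)))
```

Under this reading, every model is first mapped to its own unconstrained counterpart, and the counterparts are then postprocessed to a common constraint level, so that differences in base-model quality and in relaxation level no longer confound the comparison.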

Paper Structure (23 sections, 15 equations, 26 figures)

Figures (26)

  • Figure 1: Test accuracy and constraint violation for 1000 models trained on the ACSIncome dataset (Ding et al., 2021), corresponding to a variety of preprocessing or inprocessing methods, as well as unconstrained learners. A red line shows the postprocessing Pareto frontier of the single model with highest accuracy (a GBM model).
  • Figure 2: Example illustrating unprocessing. Left: Initial unconstrained model $A$ postprocessed to $\tilde{A}$. Some contender model $B$ incomparable to $\tilde{A}$. Middle: We unprocess $B$ to get a new model $B^*$. Right: Postprocessing $B^*$ to the same constraint level as $B$ or $\tilde{A}$.
  • Figure 3: Real-data version of the illustrative plot shown in Figure 2 (results shown on ACSIncome test, models selected on validation). $A$ and $B$ are two arbitrary incomparable models (both Pareto-efficient), which are made comparable after unprocessing. Left: original (unaltered) results; Middle: results after unprocessing all models; Right: original (unaltered) results, together with the postprocessing curve for both $A^*$ and $B^*$. Additional model pairs are shown in the appendix.
  • Figure 4: Pareto frontier attained by each GBM-based algorithm, together with the Pareto frontier attained by postprocessing the GBM-based model with highest unprocessed validation accuracy, $m^*$. Results for the remaining ACS datasets are shown in a separate figure.
  • Figure 5: Mean time to fit GBM and GBM-based preprocessing and inprocessing algorithms on the ACSIncome (left plot) and ACSPublicCoverage (right plot) datasets, with $95\%$ confidence intervals. The time taken to run postprocessing is also shown for each algorithm as a stacked dark bar. Note the log scale: the EG inprocessing method takes one order of magnitude longer to fit than the base GBM.
  • ...and 21 more figures