fairret: a Framework for Differentiable Fairness Regularization Terms

Maarten Buyl; MaryBeth Defrance; Tijl De Bie

TL;DR

The paper tackles the integration of fairness into differentiable ML pipelines by introducing fairrets, a modular framework of differentiable fairness regularization terms founded on linear-fractional statistics. It presents two archetypes. Violation fairrets directly penalize violations of the fairness constraints. Projection fairrets penalize the distance between the model's output distribution and its projection onto the set of fair distributions. Both handle continuous sensitive values and multiple axes of discrimination. The resulting regularizers are differentiable and integrate easily with PyTorch, and the authors provide a PyTorch implementation and an empirical evaluation on several real-world datasets. Key findings indicate that projection fairrets often yield better fairness-performance trade-offs than violation-based methods, particularly for linear statistics. Linear-fractional notions such as predictive parity (PP) and treatment equality (TE) remain challenging. The framework offers a flexible, extensible path toward broader differentiable fairness definitions in practical ML systems.
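The sketch below illustrates a violation fairret, assuming demographic parity with one-hot sensitive groups. It makes two modelling choices. The per-group violation is v_k = γ(k; h)/γ̄(h) - 1, and the penalty is the squared Euclidean norm of v. Both choices and all function names are assumptions for illustration, not the paper's exact definitions or the fairret library's API. The gradient with respect to each score is written out by hand so the example runs without PyTorch. Its signs should match the pattern described for Figure 1: positive for the favoured group and negative for the disfavoured one.

```python
from typing import List, Sequence, Tuple


def group_positive_rates(
    h: Sequence[float], groups: Sequence[int], n_groups: int
) -> Tuple[List[float], float, List[int]]:
    """Return per-group mean scores gamma(k; h), overall mean gamma_bar(h), and group sizes."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for score, g in zip(h, groups):
        sums[g] += score
        counts[g] += 1
    per_group = [s / c for s, c in zip(sums, counts)]  # assumes every group is non-empty
    overall = sum(h) / len(h)
    return per_group, overall, counts


def violation_fairret(
    h: Sequence[float], groups: Sequence[int], n_groups: int
) -> Tuple[float, List[float]]:
    """Hypothetical violation fairret: squared norm of v_k = gamma(k; h) / gamma_bar(h) - 1.

    Returns the penalty and its analytic gradient with respect to every score h_i.
    """
    per_group, overall, counts = group_positive_rates(h, groups, n_groups)
    v = [gk / overall - 1.0 for gk in per_group]
    loss = sum(vk * vk for vk in v)
    n = len(h)
    grad = []
    for i in range(n):
        g_i = 0.0
        for k in range(n_groups):
            # d v_k / d h_i = [i in group k] / (n_k * gamma_bar) - gamma_k / (n * gamma_bar^2)
            dv = (1.0 / (counts[k] * overall)) if groups[i] == k else 0.0
            dv -= per_group[k] / (n * overall ** 2)
            g_i += 2.0 * v[k] * dv
        grad.append(g_i)
    return loss, grad


# Group 0 is favoured (mean score 0.8) over group 1 (mean score 0.2); the overall mean is 0.5.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
sensitive = [0, 0, 0, 1, 1, 1]
penalty, grad = violation_fairret(scores, sensitive, n_groups=2)
print(round(penalty, 4), [round(g, 4) for g in grad])
```

In a PyTorch pipeline, such a penalty would presumably be added to the task loss as λ·R and differentiated by autograd. The hand-written gradient here only makes the sign pattern visible: gradient descent lowers the favoured group's scores and raises the others.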

Abstract

Current fairness toolkits in machine learning only admit a limited range of fairness definitions and have seen little integration with automatic differentiation libraries, despite the central role these libraries play in modern machine learning pipelines. We introduce a framework of fairness regularization terms (fairrets) which quantify bias as modular, flexible objectives that are easily integrated in automatic differentiation pipelines. By employing a general definition of fairness in terms of linear-fractional statistics, a wide class of fairrets can be computed efficiently. Experiments show the behavior of their gradients and their utility in enforcing fairness with minimal loss of predictive power compared to baselines. Our contribution includes a PyTorch implementation of the fairret framework.

Paper Structure (44 sections, 3 theorems, 30 equations, 9 figures, 1 table)

Introduction
Contributions
Related Work
Fairness in Binary Classification
Partition Fairness
Beyond Partition Fairness
Continuous Sensitive Values
Multiple Axes of Discrimination
Fairness Regularization Terms
Violation FAIRRETs
Projection FAIRRETs
Analysis
Experiments
Setup
Results
...and 29 more sections

Key Result

Proposition 1

With $\gamma \in \Gamma$, the $c$-fixed fairness notion $\mathcal{F}_\gamma(c)$ enforces linear constraints:

$$\forall k: \mathbb{E}[S_k(\alpha(\mathbf{X}, Y, c) f(\mathbf{X}) + \beta(\mathbf{X}, Y, c))] = 0$$

where $\alpha(\mathbf{X}, Y, c) = \alpha_0(\mathbf{X}, Y) - c\,\alpha_1(\mathbf{X}, Y)$ and $\beta(\mathbf{X}, Y, c) = \beta_0(\mathbf{X}, Y) - c\,\beta_1(\mathbf{X}, Y)$.
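Here is a small numerical check of Proposition 1. It assumes the linear-fractional form γ(k; f) = E[S_k(α_0 f + β_0)] / E[S_k(α_1 f + β_1)], with expectations replaced by sample means. The coefficients depend only on the label Y. The encodings of the positive rate (α_0 = 1, β_0 = 0, α_1 = 0, β_1 = 1) and of precision (α_0 = Y, β_0 = 0, α_1 = 1, β_1 = 0) are our own illustrative choices, and the helper names are hypothetical. The linear term equals E[S_k(α_1 f + β_1)]·(γ(k; f) - c), so it vanishes exactly when γ(k; f) = c, provided that denominator is positive.

```python
from typing import Callable, Sequence

Coef = Callable[[int], float]  # coefficient as a function of the label y (X is ignored here)


def linear_fractional_stat(f: Sequence[float], y: Sequence[int], s_k: Sequence[float],
                           a0: Coef, b0: Coef, a1: Coef, b1: Coef) -> float:
    """gamma(k; f) = E[S_k (a0 f + b0)] / E[S_k (a1 f + b1)], using sample means."""
    num = sum(s * (a0(yi) * fi + b0(yi)) for fi, yi, s in zip(f, y, s_k))
    den = sum(s * (a1(yi) * fi + b1(yi)) for fi, yi, s in zip(f, y, s_k))
    return num / den  # the 1/n factors of both means cancel


def linear_constraint(f: Sequence[float], y: Sequence[int], s_k: Sequence[float],
                      a0: Coef, b0: Coef, a1: Coef, b1: Coef, c: float) -> float:
    """E[S_k (alpha f + beta)] with alpha = a0 - c * a1 and beta = b0 - c * b1."""
    total = sum(s * ((a0(yi) - c * a1(yi)) * fi + (b0(yi) - c * b1(yi)))
                for fi, yi, s in zip(f, y, s_k))
    return total / len(f)


# Illustrative encodings as (a0, b0, a1, b1).
POSITIVE_RATE = (lambda y: 1.0, lambda y: 0.0, lambda y: 0.0, lambda y: 1.0)  # E[S f] / E[S]
PRECISION = (lambda y: float(y), lambda y: 0.0, lambda y: 1.0, lambda y: 0.0)  # E[S Y f] / E[S f]

f = [0.9, 0.6, 0.2, 0.7]
y = [1, 0, 0, 1]
s_k = [1.0, 1.0, 1.0, 0.0]  # membership of group k (the last sample belongs to another group)

for name, coefs in [("positive rate", POSITIVE_RATE), ("precision", PRECISION)]:
    gamma = linear_fractional_stat(f, y, s_k, *coefs)
    at_gamma = linear_constraint(f, y, s_k, *coefs, c=gamma)         # ~0: constraint holds
    off_gamma = linear_constraint(f, y, s_k, *coefs, c=gamma + 0.1)  # nonzero: constraint violated
    print(name, round(gamma, 4), abs(at_gamma) < 1e-12, round(off_gamma, 4))
```

Fixing c is what makes the constraint linear in f. For precision, the statistic itself has f in both numerator and denominator, yet the c-fixed constraint is still a single linear equation in f.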

Figures (9)

Figure 1: The model $h$ was trained on the ACSIncome dataset without a fairret (i.e. $\lambda = 0$) and ends up with disparate positive rates $\gamma(0; h) > \overline{\gamma}(h) > \gamma(1; h)$ for the one-hot encoded sensitive variables $(S_0, S_1)$. These should be brought closer to the overall positive rate $\overline{\gamma}(h)$. We show the probability scores $h$ and the gradients of several fairrets $R_\gamma$ with respect to $h$. The gradients are normalized by dividing them by their maximum absolute value per fairret and per group. They are positive for samples with $S_0 = 1$, implying that their scores should decrease, and negative for samples with $S_1 = 1$, implying that their scores should increase.
Figure 2: Mean test set results with confidence ellipse for the standard error. Each marker is a separate combination of dataset, fairret, fairret strength, and statistic. Results in the lower right are optimal. Failed runs (with an AUROC far worse than the rest) are omitted.
Figure C.1: (left) Test set DP violation on the ACSIncome dataset, with an experiment setup similar to that of Fig. 2. Each bar corresponds to a separate training run with the $D_\textnormal{KL}$-projection fairret minimizing the DP violation. The configurations differ only in the maximum number of iterations used in the convex optimizations that compute the actual $D_\textnormal{KL}$-projections $f^*$ (see the section on Projection FAIRRETs). (right) The total training time of these runs, with standard error.
Figure C.2: Starting from the same setup as in Fig. 1, we show the probability scores of both $h$ (full line) and the projected distributions $f^*$ of each projection fairret (dotted lines). The $y$-axis shows the KDE densities of these scores, all on the same scale.
Figure C.3: Test set SmoothMax loss $R_\gamma(h)$ with $\gamma$ the positive rate statistic (enforcing the DP notion), computed for an unfair model $h$ trained with the same setup as in Fig. 1. Each loss is computed over the entire test dataset, but chunked into batches of different sizes. For smaller batch sizes, the mean of the per-batch SmoothMax losses overestimates the SmoothMax loss computed over all 39 133 samples at once.
...and 4 more figures
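Figures C.1 and C.2 refer to $D_\textnormal{KL}$-projections $f^*$ computed by an iterative convex optimization with a capped number of iterations. The sketch below illustrates the idea for demographic parity with one-hot groups only. It assumes c is fixed to the overall positive rate of h. It projects h by minimizing the summed Bernoulli divergence Σ_i D_KL(f_i ‖ h_i), subject to each group's mean score equalling c. Under these assumptions the minimizer shifts all logits of group k by one scalar λ_k, so each λ_k can be found by bisection, capped by `max_iter`. The resulting divergence serves as the penalty. The divergence direction, the bisection solver and all names are our own illustrative choices, not the paper's solver or the fairret library's API.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _bernoulli_kl(p: float, q: float) -> float:
    """D_KL(Bern(p) || Bern(q)) with the convention 0 * log 0 = 0; q must lie in (0, 1)."""
    out = 0.0
    if p > 0.0:
        out += p * math.log(p / q)
    if p < 1.0:
        out += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return out


def kl_projection_dp(h: Sequence[float], groups: Sequence[int], n_groups: int,
                     max_iter: int = 50, bound: float = 20.0) -> List[float]:
    """Hypothetical KL projection of scores h (strictly inside (0, 1)) onto the DP set.

    Minimizes sum_i D_KL(f_i || h_i) s.t. every (non-empty) group's mean f equals
    c = mean(h). The optimum adds one scalar lam_k to all logits of group k; lam_k is
    found by bisection on [-bound, bound], with max_iter capping the number of steps.
    """
    c = sum(h) / len(h)
    f_star = list(h)
    for k in range(n_groups):
        idx = [i for i, g in enumerate(groups) if g == k]
        logits = [_logit(h[i]) for i in idx]
        lo, hi = -bound, bound
        for _ in range(max_iter):
            lam = 0.5 * (lo + hi)
            mean = sum(_sigmoid(z + lam) for z in logits) / len(idx)
            if mean < c:
                lo = lam  # group mean still below c: shift logits further up
            else:
                hi = lam
        lam = 0.5 * (lo + hi)
        for i, z in zip(idx, logits):
            f_star[i] = _sigmoid(z + lam)
    return f_star


def projection_fairret(h: Sequence[float], groups: Sequence[int], n_groups: int,
                       max_iter: int = 50) -> Tuple[float, List[float]]:
    """Penalty = summed KL divergence from the projected scores f* to the scores h."""
    f_star = kl_projection_dp(h, groups, n_groups, max_iter)
    return sum(_bernoulli_kl(p, q) for p, q in zip(f_star, h)), f_star


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
sensitive = [0, 0, 0, 1, 1, 1]
penalty, f_star = projection_fairret(scores, sensitive, n_groups=2)
print(round(penalty, 4), [round(p, 3) for p in f_star])
```

Bisection works here because each group's mean projected score increases monotonically in λ_k. Lowering `max_iter` makes the projection cheaper but less exact. This loosely mirrors the iteration cap varied in Figure C.1, though that figure concerns the paper's own convex solver.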

Theorems & Definitions (26)

Definition 1
Example 1
Definition 2
Definition 3
Example 2
Example 3
Example 4
Definition 4
Definition 5
Proposition 1
...and 16 more
