fairret: a Framework for Differentiable Fairness Regularization Terms
Maarten Buyl, MaryBeth Defrance, Tijl De Bie
TL;DR
The paper tackles the integration of fairness into differentiable ML pipelines by introducing fairrets, a modular framework of differentiable fairness regularization terms founded on linear-fractional statistics. It presents two archetypes: violation fairrets, which penalize constraint violations directly, and projection fairrets, which are based on distributional projections onto fair sets. Both handle continuous sensitive values and multiple axes of discrimination. The resulting regularizers are strict and differentiable, and the authors provide a PyTorch implementation along with an empirical evaluation across several real-world datasets. Key findings indicate that projection fairrets often yield better fairness-performance trade-offs than violation fairrets, particularly for linear statistics, though linear-fractional notions such as predictive parity (PP) and treatment equality (TE) remain challenging. The framework offers a flexible, extensible path toward broader, differentiable fairness definitions in practical ML systems.
Abstract
Current fairness toolkits in machine learning only admit a limited range of fairness definitions and have seen little integration with automatic differentiation libraries, despite the central role these libraries play in modern machine learning pipelines. We introduce a framework of fairness regularization terms (fairrets) which quantify bias as modular, flexible objectives that are easily integrated in automatic differentiation pipelines. By employing a general definition of fairness in terms of linear-fractional statistics, a wide class of fairrets can be computed efficiently. Experiments show the behavior of their gradients and their utility in enforcing fairness with minimal loss of predictive power compared to baselines. Our contribution includes a PyTorch implementation of the fairret framework.
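To make the idea of a statistic-based regularization term concrete, the following is a minimal sketch, not the API of the authors' PyTorch package. It takes demographic parity as an example of a linear statistic (the per-group mean predicted probability) and sums the relative gaps between each group's statistic and the overall one. The function names `positive_rate` and `dp_violation`, and the exact form of the penalty, are assumptions made for illustration; the penalties defined in the paper may differ. Plain Python is used so the example runs without dependencies; the same arithmetic on tensors in an automatic differentiation library would yield gradients with respect to the predictions.

```python
def positive_rate(probs, weights):
    """Weighted mean of predicted probabilities (a linear statistic).

    `weights` holds each sample's membership in one sensitive group.
    Values need not be binary, so continuous sensitive values work too.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("group has zero total weight")
    return sum(w * p for w, p in zip(weights, probs)) / total


def dp_violation(probs, sens):
    """Hypothetical violation-style penalty for demographic parity.

    probs: predicted probabilities, one per sample.
    sens:  one row per sample, one column per sensitive group.
    Returns the sum over groups of |group_rate / overall_rate - 1|,
    which is zero exactly when all group rates equal the overall rate.
    """
    overall = sum(probs) / len(probs)
    n_groups = len(sens[0])
    penalty = 0.0
    for k in range(n_groups):
        column = [row[k] for row in sens]
        penalty += abs(positive_rate(probs, column) / overall - 1.0)
    return penalty


# Two groups of two samples each; group 0 receives higher scores.
probs = [0.9, 0.7, 0.2, 0.4]
sens = [[1, 0], [1, 0], [0, 1], [0, 1]]
unfair_penalty = dp_violation(probs, sens)
fair_penalty = dp_violation([0.5, 0.5, 0.5, 0.5], sens)
```

Here the group rates are 0.8 and 0.3 against an overall rate of 0.55, so `unfair_penalty` is about 0.909, while `fair_penalty` is 0. In a training loop, a term of this kind would be scaled by a strength hyperparameter and added to the usual loss.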
