Fair Supervised Learning Through Constraints on Smooth Nonconvex Unfairness-Measure Surrogates

Zahra Khatti, Daniel P. Robinson, Frank E. Curtis

TL;DR

This paper tackles fairness in supervised learning by shifting from regularization-based approaches to hard constraints that enforce specified unfairness bounds. It introduces smooth, bounded, nonconvex surrogates to approximate discontinuous unfairness measures and proves that small surrogate values imply small actual unfairness, with a bound that tightens as the surrogate is scaled. The proposed training formulation minimizes a standard loss plus a regularizer subject to hard constraints on the surrogate unfairness measures and is solved efficiently via sequential quadratic programming, enabling simultaneous enforcement of multiple unfairness criteria. Empirical results on the Dutch, Law, and ACSIncome datasets demonstrate tight control over disparate impact and related measures with minimal sacrifice in predictive accuracy, while highlighting the practical benefits and costs of constraint-based training relative to regularization-based methods.

Abstract

A new strategy for fair supervised machine learning is proposed. The main advantages of the proposed strategy as compared to others in the literature are as follows. (a) We introduce a new smooth nonconvex surrogate to approximate the Heaviside functions involved in discontinuous unfairness measures. The surrogate is based on smoothing methods from the optimization literature, and is new for the fair supervised learning literature. The surrogate is a tight approximation which ensures the trained prediction models are fair, as opposed to other (e.g., convex) surrogates that can fail to lead to a fair prediction model in practice. (b) Rather than rely on regularizers (that lead to optimization problems that are difficult to solve) and corresponding regularization parameters (that can be expensive to tune), we propose a strategy that employs hard constraints so that specific tolerances for unfairness can be enforced without the complications associated with the use of regularization. (c) Our proposed strategy readily allows for constraints on multiple (potentially conflicting) unfairness measures at the same time. Multiple measures can be considered with a regularization approach, but at the cost of having even more difficult optimization problems to solve and further expense for tuning. By contrast, through hard constraints, our strategy leads to optimization models that can be solved tractably with minimal tuning.
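The constrained formulation described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the synthetic data, the logistic model, the scaled sigmoid standing in for the paper's smoothed-step surrogate (which is not defined in this summary), the demographic-parity form of the constraint, and the use of SciPy's SLSQP solver as a generic SQP method rather than the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two features, binary sensitive attribute s, binary label y.
N = 400
s = rng.integers(0, 2, size=N)
X = rng.normal(size=(N, 2)) + 0.8 * s[:, None]   # features correlate with s
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=N) > 0.4).astype(float)
A = np.hstack([X, np.ones((N, 1))])              # append an intercept column


def loss(w, lam=1e-3):
    """Logistic loss plus an l2 regularizer (illustrative choice)."""
    t = A @ w
    return np.mean(np.logaddexp(0.0, t) - y * t) + lam * (w @ w)


def dp_surrogate(w, scale=10.0):
    """Smooth surrogate: gap in mean sigmoid(scale * score) between groups."""
    p = expit(scale * (A @ w))
    return p[s == 1].mean() - p[s == 0].mean()


def dp_actual(w):
    """Actual gap in positive-prediction rates using the hard decision score >= 0."""
    yhat = (A @ w >= 0)
    return yhat[s == 1].mean() - yhat[s == 0].mean()


delta = 0.05  # tolerance on the surrogate unfairness measure (assumed value)
constraints = [
    {"type": "ineq", "fun": lambda w: delta - dp_surrogate(w)},  # surrogate <= delta
    {"type": "ineq", "fun": lambda w: delta + dp_surrogate(w)},  # surrogate >= -delta
]

w0 = np.zeros(3)  # feasible start: the surrogate gap is zero at w = 0
unc = minimize(loss, w0, method="SLSQP")
con = minimize(loss, w0, method="SLSQP", constraints=constraints,
               options={"maxiter": 200})

print("unconstrained: loss=%.4f  actual gap=%.4f" % (unc.fun, dp_actual(unc.x)))
print("constrained:   loss=%.4f  surrogate gap=%.4f  actual gap=%.4f"
      % (con.fun, dp_surrogate(con.x), dp_actual(con.x)))
```

Because the surrogate constraint is nonconvex, a local solver such as SLSQP offers no global guarantee here; the sketch only shows how a tolerance `delta` enters the problem as a constraint instead of being tuned indirectly through a regularization weight.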

Paper Structure

This paper contains 26 sections, 3 theorems, 37 equations, 12 figures, 4 tables, 2 algorithms.

Key Result

Theorem 2.1

(derived from yao2023understanding) Suppose $\phi : \mathbb{R} \to [0,1]$, the function $\phi - \tfrac{1}{2}$ is symmetric about the origin, and for some $\gamma \in (0,\tfrac{1}{2})$ one has $\phi(t(x_i,s_i,w)) \in [0,\gamma] \cup [1-\gamma,1]$ for all $i \in [N]$. Suppose also that, for some $\epsilon$ … Then the actual empirical estimate of the violation of demographic parity has $|\bar{c}_{\textrm{dp}}$ …
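The saturation hypothesis in Theorem 2.1, $\phi(t(x_i,s_i,w)) \in [0,\gamma] \cup [1-\gamma,1]$, can be checked numerically for a given set of scores. The sketch below is illustrative only: the helper names and the sample scores are hypothetical, and the sigmoid is used as an example of a $\phi$ for which $\phi - \tfrac{1}{2}$ is symmetric about the origin.

```python
import numpy as np
from scipy.special import expit  # sigmoid


def saturation_gamma(phi_values):
    """Smallest gamma with every value in [0, gamma] or [1 - gamma, 1]."""
    p = np.asarray(phi_values, dtype=float)
    return float(np.max(np.minimum(p, 1.0 - p)))


def is_odd_about_half(phi, grid):
    """Check phi(t) - 1/2 = -(phi(-t) - 1/2) on a grid of points."""
    return bool(np.allclose(phi(grid) - 0.5, -(phi(-grid) - 0.5)))


# Hypothetical scores t(x_i, s_i, w) for three samples.
t = np.array([-0.5, 0.2, 1.0])

gamma_unscaled = saturation_gamma(expit(t))        # surrogate phi(t)
gamma_scaled = saturation_gamma(expit(10.0 * t))   # scaled surrogate phi(10 t)

print("gamma without scaling:", gamma_unscaled)
print("gamma with scaling:   ", gamma_scaled)
print("symmetry holds:       ", is_odd_about_half(expit, np.linspace(0.0, 5.0, 11)))
```

Scaling the argument pushes the surrogate values toward 0 or 1, so the smallest admissible $\gamma$ shrinks; the theorem requires $\gamma < \tfrac{1}{2}$.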

Figures (12)

  • Figure 1: On the left, graphs of the step/Heaviside function ($\mathbf{1}\{t\geq 0\}$), linear function ($\phi(t) = t$), sigmoid function ($\phi(t) = \sigma(t)$), and smoothed-step function ($\phi_\mu(t)$ defined in this section). On the right, graphs of scaled functions ($\sigma(10t)$ and $\phi_\mu(10t)$) to illustrate that scaling can make the functions more closely approximate the step function.
  • Figure 2: Levels of disparate impact actually achieved ($\hat{\delta}$) when prediction models are trained with constraints as in \ref{eq:di} for varying values of $\delta$ without scaling of the surrogate functions. Because the surrogate approximations are not tight, there are large gaps between the levels of disparate impact desired and the levels actually achieved. The graphs indicated by $\phi(t)$ show the values such that $c_{\textrm{di}}(w) \leq 0$, whereas the graphs indicated by $\hat{y}$ show the values such that $\bar{c}_{\textrm{di}}(w) \leq 0$.
  • Figure 3: Levels of disparate impact actually achieved ($\hat{\delta}$) when prediction models are trained with constraints as in \ref{eq:di} for varying values of $\delta$ with scaling of the surrogate functions. These results should be contrasted with those in Figure 2. In particular, scaling the surrogate functions leads to a much tighter correspondence between $\delta$ and $\hat{\delta}$ when the constraints are tight. The plot on the right shows the levels of disparate impact actually achieved when a constraint on a covariance surrogate (less than or equal to $\epsilon$) is imposed.
  • Figure 4: Training accuracy and constraint violation measures for smoothed-step, sigmoid, and covariance models on the Law dataset. The results indicate that fairness constraints can be satisfied without compromising prediction accuracy.
  • Figure 5: Comparison between the constraint-only (left) and regularization-only (right) approaches for enforcing fairness using the smoothed-step surrogate on the ACSIncome dataset. The constraint-only approach consistently meets the targeted threshold ($\delta$) with minimal impact on accuracy, while the regularization-only approach exhibits unpredictable outcomes and greater accuracy loss, underscoring an advantage of using explicit constraints.
  • ...and 7 more figures
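
The scaling effect described in the Figure 1 caption can be checked with a few lines of code. This sketch uses only the sigmoid, since the smoothed-step function $\phi_\mu$ is not defined in this summary; the evaluation grid, which excludes a small neighborhood of the origin, is an assumption made for illustration.

```python
import numpy as np
from scipy.special import expit  # sigmoid


def step(t):
    """Heaviside step function 1{t >= 0}."""
    return (t >= 0).astype(float)


def gap(k, t):
    """Largest deviation of sigmoid(k * t) from the step function on grid t."""
    return float(np.max(np.abs(expit(k * t) - step(t))))


# Evaluate away from the discontinuity at t = 0 (|t| >= 0.1).
t = np.concatenate([np.linspace(-3.0, -0.1, 200), np.linspace(0.1, 3.0, 200)])

for k in (1, 10, 100):
    print("scale k = %3d  max gap = %.6f" % (k, gap(k, t)))
```

The gap shrinks as the scale grows, which is the sense in which $\sigma(10t)$ in Figure 1 approximates the step function more closely than $\sigma(t)$.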

Theorems & Definitions (3)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 3.1