Fair and Accurate Regression: Strong Formulations and Algorithms

Anna Deza, Andrés Gómez, Alper Atamtürk

TL;DR

The paper trains regression models under exact fairness constraints by formulating the problem as a mixed-integer optimization (MIO) task that enforces demographic parity (DP) through a discretized DP measure. It derives a strong convex relaxation from an extended convexification of the single-observation and single-factor subproblems, whose convex hulls it describes exactly, and offers three practical approaches: the convex relaxation alone, the convex relaxation followed by coordinate descent, and full MIO solved by branch-and-bound. In synthetic and real-data experiments on least-squares and logistic regression, the methods achieve accuracy competitive with state-of-the-art approaches while significantly reducing training times, with the relaxation and coordinate-descent variants offering scalable performance. The work combines exact formulations with scalable heuristics, enabling principled fairness-accuracy trade-offs in practice and opening the way to extensions to other generalized linear models and fairness notions.

Abstract

This paper introduces mixed-integer optimization methods to solve regression problems that incorporate fairness metrics. We propose an exact formulation for training fair regression models. To tackle this computationally hard problem, we study the polynomially-solvable single-factor and single-observation subproblems as building blocks and derive their closed convex hull descriptions. Strong formulations obtained for the general fair regression problem in this manner are utilized to solve the problem with a branch-and-bound algorithm exactly or as a relaxation to produce fair and accurate models rapidly. Moreover, to handle large-scale instances, we develop a coordinate descent algorithm motivated by the convex-hull representation of the single-factor fair regression problem to improve a given solution efficiently. Numerical experiments conducted on fair least squares and fair logistic regression problems show competitive statistical performance with state-of-the-art methods while significantly reducing training times.
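
To make the notion of a discretized demographic parity measure concrete, the sketch below computes one plausible version of it: the largest gap, over a finite grid of thresholds, between the empirical prediction distributions of two groups. This is an illustrative assumption rather than the paper's definition. The function names `discretized_dp` and `uniform_thresholds`, the binary group encoding, and the uniform grid are all hypothetical, and the choice of 41 grid points only mirrors the subscript in $\widehat{\mathrm{DP}}_{41}$.

```python
from bisect import bisect_right


def discretized_dp(predictions, groups, thresholds):
    """Largest gap between the two groups' empirical CDFs over a threshold grid.

    Assumed definition (not taken from the paper):
        max_j | P(v <= b_j | group 0) - P(v <= b_j | group 1) |
    where v is a prediction and b_j ranges over `thresholds`.
    """
    by_group = {0: [], 1: []}
    for value, group in zip(predictions, groups):
        by_group[group].append(value)
    if not by_group[0] or not by_group[1]:
        raise ValueError("both groups must be non-empty")

    sorted0 = sorted(by_group[0])
    sorted1 = sorted(by_group[1])

    gap = 0.0
    for b in thresholds:
        # bisect_right counts the predictions that are <= b.
        cdf0 = bisect_right(sorted0, b) / len(sorted0)
        cdf1 = bisect_right(sorted1, b) / len(sorted1)
        gap = max(gap, abs(cdf0 - cdf1))
    return gap


def uniform_thresholds(lo, hi, count):
    """Hypothetical helper: `count` evenly spaced thresholds on [lo, hi]."""
    step = (hi - lo) / (count - 1)
    return [lo + j * step for j in range(count)]


if __name__ == "__main__":
    # Toy data: group 0 receives low predictions, group 1 high ones.
    preds = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    grps = [0, 0, 0, 1, 1, 1]
    grid = uniform_thresholds(0.0, 1.0, 41)
    print(discretized_dp(preds, grps, grid))
```

On the toy data the two groups are fully separated, so the measure reaches its maximum; identical prediction distributions would give zero. A model is fairer under this measure the closer the value is to zero, which is the quantity a fairness constraint would bound.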

Paper Structure

This paper contains 37 sections, 6 theorems, 46 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

$\mathrm{cl}(V)=\bar{V}$.

Figures (8)

  • Figure 1: Visual representation of the extended formulation of $X_i$. We introduce variables $p_j, j = 0, \ldots, \ell$, which model the piece of the prediction $v$ on the sub-intervals defined by $b_1, \ldots, b_\ell$.
  • Figure 2: Regularized fair logistic regression objective for the true problem, the strong convex relaxation, and the convex approximations of Zafar et al. (2017) and Wu et al. (2019) for varying regularization weights ($\lambda$).
  • Figure 3: Accuracy (MSE) vs. fairness $\left(\widehat{\mathrm{DP}}_{41}\right)$ on synthetic data. MICQO achieves the best trade-off between accuracy and fairness but requires considerable computation time, often hitting the imposed one-hour time limit. On the other hand, CD-relax produces solutions that approach the quality of those attained by MICQO within seconds.
  • Figure 4: Accuracy-fairness trade-off curves obtained by models trained using FR-Reduction and Relax on the Communities & Crime dataset. Each curve represents the mean over 10 trials, with a 95% confidence interval band on the relative RMSE. Relax and regularized variants produce least squares regression models that are competitive with or outperform the state-of-the-art in terms of out-of-sample performance with over $30\times$ improvement in runtimes.
  • Figure 5: Accuracy-fairness trade-off curves obtained by models trained using FR-Reduction, Relax, and CD-relax on the sub-sampled and full Law School dataset.
  • ...and 3 more figures

Theorems & Definitions (13)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Remark 1
  • Proposition 3: Validity
  • proof
  • Corollary 1: Conic relaxation for fair regression
  • Remark 2
  • Proposition 4
  • ...and 3 more