Fair and Accurate Regression: Strong Formulations and Algorithms
Anna Deza, Andrés Gómez, Alper Atamtürk
TL;DR
The paper tackles training regression models under exact fairness constraints by formulating the problem as a mixed-integer optimization (MIO) task that enforces demographic parity (DP) through a discretized DP measure. It derives a strong convex relaxation from an extended convexification of the subproblems, which is exact (a closed convex hull description) for the single-observation and single-factor cases, and offers three practical approaches: the convex relaxation alone, the convex relaxation combined with coordinate descent, and the full MIO solved with branch-and-bound. In extensive experiments on synthetic and real data for least-squares and logistic regression, the methods achieve accuracy competitive with state-of-the-art methods while markedly reducing training times, and the relaxation-based and coordinate-descent approaches scale to larger instances. The work advances fair regression by pairing rigorous, exact formulations with scalable heuristics, enabling principled fairness-accuracy trade-offs in practice, and points toward extensions to other generalized linear models and fairness notions.
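To make the mixed-integer structure concrete, here is a minimal sketch of a generic big-M model for fair least squares, under stated assumptions. It assumes the discretized DP measure is the difference, between two groups $G_0$ and $G_1$, in the fraction of observations whose prediction $x_i^\top\beta$ reaches a threshold $\tau$; that $M$ is a valid bound on $|x_i^\top\beta - \tau|$; and that $\delta > 0$ is a small margin standing in for a strict inequality. The symbols $\tau$, $M$, $\delta$, $\varepsilon$, $G_0$ and $G_1$ are introduced here for illustration only. The measure and the strong formulation developed in the paper may differ, and big-M models of this kind are known to have weak continuous relaxations.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\; z \in \{0,1\}^n} \quad & \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 \\
\text{s.t.} \quad & x_i^\top \beta - \tau \ge -M\,(1 - z_i), && i = 1,\dots,n, \\
& x_i^\top \beta - \tau \le M z_i - \delta, && i = 1,\dots,n, \\
& \left| \frac{1}{|G_1|} \sum_{i \in G_1} z_i - \frac{1}{|G_0|} \sum_{i \in G_0} z_i \right| \le \varepsilon .
\end{aligned}
```

The two big-M constraints force $z_i = 1$ when $x_i^\top\beta \ge \tau$ and $z_i = 0$ when $x_i^\top\beta \le \tau - \delta$ (predictions in the gap $(\tau - \delta, \tau)$ are excluded), and the absolute-value constraint is linearized as two linear inequalities. For logistic regression, the squared loss would be replaced by the logistic loss $\sum_i \log\left(1 + e^{-y_i x_i^\top \beta}\right)$ with $y_i \in \{-1, 1\}$.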
Abstract
This paper introduces mixed-integer optimization methods for solving regression problems that incorporate fairness metrics. We propose an exact formulation for training fair regression models. To tackle this computationally hard problem, we study the polynomially solvable single-factor and single-observation subproblems as building blocks and derive their closed convex hull descriptions. The strong formulations obtained in this manner for the general fair regression problem are used either within a branch-and-bound algorithm to solve the problem exactly or as a relaxation to produce fair and accurate models rapidly. Moreover, to handle large-scale instances, we develop a coordinate descent algorithm, motivated by the convex hull representation of the single-factor fair regression problem, that efficiently improves a given solution. Numerical experiments on fair least squares and fair logistic regression problems show statistical performance competitive with state-of-the-art methods while significantly reducing training times.
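The coordinate descent idea for fair least squares can be illustrated with a small Python sketch. It is not the authors' algorithm, which is derived from the convex hull representation; it is a simple enumeration-based update under the assumption that DP is measured as the gap, between two groups, in the fraction of predictions at or above a threshold tau. With all other coefficients fixed, each prediction crosses tau at most once as beta_j varies, so the DP gap is piecewise constant in beta_j with at most n breakpoints. The sketch evaluates the current value and, for each open interval between breakpoints, the unconstrained least-squares minimizer clipped slightly inside that interval, then keeps the feasible candidate with the lowest squared loss. Exact breakpoints are skipped, so an optimum attained exactly on a breakpoint can be missed. The names dp_gap, fair_coordinate_update and fair_coordinate_descent, and the parameters tau, eps and n_sweeps, are hypothetical.

```python
import numpy as np


def dp_gap(pred, group, tau=0.0):
    """Gap in the share of predictions >= tau between two groups (assumed DP measure)."""
    pos = np.asarray(pred) >= tau
    g = np.asarray(group, dtype=bool)
    return abs(pos[g].mean() - pos[~g].mean())


def fair_coordinate_update(X, y, beta, j, group, eps, tau=0.0):
    """Return a new value for beta[j] with the other coefficients held fixed."""
    x = X[:, j]
    base = X @ beta - x * beta[j]              # predictions without feature j
    r = y - base                               # residual that x * t must fit
    xx = x @ x
    if xx == 0.0:
        return beta[j]
    t_star = (x @ r) / xx                      # unconstrained least-squares minimizer
    nz = x != 0
    # Values of t at which some prediction base_i + x_i * t equals tau (sorted, unique).
    bps = np.unique((tau - base[nz]) / x[nz])
    edges = np.concatenate(([-np.inf], bps, [np.inf]))
    cands = [beta[j]]                          # current value keeps the loss non-increasing
    for lo, hi in zip(edges[:-1], edges[1:]):
        # The DP gap is constant on each open interval (lo, hi) and the loss is a convex
        # quadratic in t, so the best point is t_star clipped into the interval,
        # pulled slightly inside because the interval is open.
        pad = 1e-6 * (hi - lo) if np.isfinite(hi - lo) else 1e-9
        cands.append(float(np.clip(t_star, lo + pad, hi - pad)))
    best_t, best_loss = beta[j], np.inf
    for t in cands:
        if dp_gap(base + x * t, group, tau) <= eps:   # keep only fair candidates
            loss = np.sum((r - x * t) ** 2)
            if loss < best_loss:
                best_t, best_loss = t, loss
    return best_t


def fair_coordinate_descent(X, y, beta0, group, eps, tau=0.0, n_sweeps=20):
    """Cyclic coordinate descent keeping the DP gap <= eps; beta0 should be feasible."""
    beta = np.array(beta0, dtype=float)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            beta[j] = fair_coordinate_update(X, y, beta, j, group, eps, tau)
    return beta
```

Starting from a feasible point such as beta = 0 (every prediction equals tau, so both groups have the same positive rate), each update keeps the iterate feasible and never increases the loss, because the current coefficient is always among the candidates. Each coordinate update costs O(n^2) here since every candidate is checked from scratch; a practical implementation would sweep the sorted breakpoints and update the group counts incrementally.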
