Near-Optimal Solutions of Constrained Learning Problems
Juan Elenter, Luiz F. O. Chamon, Alejandro Ribeiro
TL;DR
The paper tackles constrained learning in non-convex settings and shows that dual ascent, when paired with sufficiently rich parametrizations, yields near-feasible and near-optimal solutions without needing randomization across iterates. It derives quantitative bounds that relate the distance to feasibility and optimality to the parametrization richness $\nu$ and the curvature properties of the unparametrized dual, $\mu_g$ and $\beta_g$, highlighting a trade-off governed by problem conditioning. A key contribution is the near-universality framework connecting parametrized and unparametrized problems, with a detailed proof sketch separating dual-variable perturbations from hypothesis-class perturbations. Experimental validation on counterfactual fairness tasks (e.g., COMPAS) demonstrates that last iterates can match randomized baselines in both accuracy and satisfaction of the fairness constraints, especially as model capacity grows, thereby bridging theory and practice in constrained learning.
Abstract
With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern application. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. We illustrate our findings in fair learning tasks.
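To make the dual ascent loop mentioned above concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. The problem (minimize $(\theta - 2)^2$ subject to $\theta - 1 \le 0$), the step size, and all function names are assumptions chosen for illustration. The toy problem is convex and one-dimensional, so it only shows the mechanics of alternating a Lagrangian minimization with a projected dual update; it does not reproduce the non-convex setting the paper analyzes.

```python
# Toy dual ascent sketch (illustrative assumption, not the paper's setup).
# Primal problem: minimize (theta - 2)^2  subject to  theta - 1 <= 0.
# Lagrangian:     L(theta, lam) = (theta - 2)^2 + lam * (theta - 1).


def lagrangian_minimizer(lam):
    """Closed-form argmin over theta of L(theta, lam) for this toy problem."""
    # Setting dL/dtheta = 2 * (theta - 2) + lam = 0 gives theta = 2 - lam / 2.
    return 2.0 - lam / 2.0


def constraint(theta):
    """Constraint value g(theta); the point is feasible when g(theta) <= 0."""
    return theta - 1.0


def dual_ascent(step=0.5, iters=200):
    """Alternate Lagrangian minimization with a projected dual ascent step."""
    lam = 0.0
    theta = lagrangian_minimizer(lam)
    for _ in range(iters):
        # Primal step: minimize the Lagrangian at the current dual variable.
        theta = lagrangian_minimizer(lam)
        # Dual step: the constraint value is a (sub)gradient of the dual
        # function; project onto lam >= 0 after the ascent step.
        lam = max(0.0, lam + step * constraint(theta))
    # Return the last iterate only, with no averaging or randomization.
    return theta, lam


theta_last, lam_last = dual_ascent()
print(f"last primal iterate: {theta_last:.6f}, last dual iterate: {lam_last:.6f}")
```

In this toy case the last primal iterate approaches the constrained optimum $\theta = 1$ and the dual variable approaches $\lambda = 2$, so no randomization over iterates is needed. Whether the same holds in non-convex problems is exactly the question the paper addresses through the richness of the parametrization.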
