Near-Optimal Solutions of Constrained Learning Problems

Juan Elenter, Luiz F. O. Chamon, Alejandro Ribeiro

TL;DR

The paper tackles constrained learning in non-convex settings and shows that dual ascent, when paired with sufficiently rich parametrizations, yields near-feasible and near-optimal solutions without needing randomization across iterates. It derives quantitative bounds that relate the distance to feasibility and optimality to the parametrization richness $\nu$ and the curvature properties of the unparametrized dual, $\mu_g$ and $\beta_g$, highlighting a trade-off governed by problem conditioning. A key contribution is the near-universality framework connecting parametrized and unparametrized problems, with a detailed proof sketch separating dual-variable perturbations from hypothesis-class perturbations. Experimental validation on counterfactual fairness tasks (e.g., COMPAS) demonstrates that last iterates can match randomized baselines in accuracy and fairness constraints, especially as model capacity grows, thereby bridging theory and practice in constrained learning.
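For orientation, the setup the summary refers to can be written in a generic form. This is only a sketch: the loss symbols $\ell_0, \ell_i$, the number of constraints $m$, the step size $\eta$, the "$\le 0$" constraint convention, and the subscript on $g_p$ are illustrative assumptions, not notation taken from this page.

```latex
\begin{align*}
  % Parametrized constrained learning problem (illustrative notation)
  P_p^{\star} &= \min_{\theta}\; \ell_0(f_{\theta})
      \quad \text{s.t.} \quad \ell_i(f_{\theta}) \le 0, \quad i = 1, \dots, m, \\
  % Lagrangian with dual variables \lambda \ge 0
  L(f_{\theta}, \lambda) &= \ell_0(f_{\theta}) + \sum_{i=1}^{m} \lambda_i\, \ell_i(f_{\theta}), \\
  % Dual function and a Lagrangian minimizer
  g_p(\lambda) &= \min_{\theta}\; L(f_{\theta}, \lambda),
      \qquad f_{\theta}(\lambda) \in \operatorname*{argmin}_{\theta}\; L(f_{\theta}, \lambda), \\
  % Projected dual ascent step: the constraint values act as a (sub)gradient of g_p
  \lambda^{(t+1)} &= \Big[ \lambda^{(t)} + \eta\, \ell\big(f_{\theta}(\lambda^{(t)})\big) \Big]_{+},
      \qquad \ell = (\ell_1, \dots, \ell_m).
\end{align*}
```

In this form, the question the paper studies is how far the constraint values of $f_{\theta}(\lambda)$ at an optimal dual variable can be from those of the solution of the unparametrized (functional) problem.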

Abstract

With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern application. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. We illustrate our findings in fair learning tasks.
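The mechanics of dual ascent mentioned in the abstract can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation: the function names, the step size, and the problem itself (minimize $(x-2)^2$ subject to $x - 1 \le 0$, which is convex and has a closed-form Lagrangian minimizer) are all assumptions chosen for brevity. It only shows the alternation between minimizing the Lagrangian and taking a projected ascent step on the dual variable, and how the "last iterate" and the "average over iterates" are read off.

```python
from typing import Callable, List, Tuple


def dual_ascent(
    lagrangian_minimizer: Callable[[float], float],
    constraint: Callable[[float], float],
    step_size: float = 0.5,
    num_steps: int = 200,
) -> Tuple[List[float], List[float]]:
    """Run projected dual ascent and return all primal and dual iterates."""
    lam = 0.0  # dual variable, kept non-negative
    primal_iterates: List[float] = []
    dual_iterates: List[float] = []
    for _ in range(num_steps):
        # Primal step: minimize the Lagrangian for the current dual variable.
        x = lagrangian_minimizer(lam)
        primal_iterates.append(x)
        dual_iterates.append(lam)
        # Dual step: ascend along the constraint value, then project onto lam >= 0.
        lam = max(0.0, lam + step_size * constraint(x))
    return primal_iterates, dual_iterates


# Hypothetical toy problem: minimize (x - 2)^2 subject to x - 1 <= 0.
def toy_minimizer(lam: float) -> float:
    # argmin_x (x - 2)^2 + lam * (x - 1), obtained by setting the derivative to zero.
    return 2.0 - lam / 2.0


def toy_constraint(x: float) -> float:
    return x - 1.0


xs, lams = dual_ascent(toy_minimizer, toy_constraint)

# Constraint violation of the final iterate versus the average over all iterates.
last_violation = max(0.0, toy_constraint(xs[-1]))
avg_violation = max(0.0, sum(toy_constraint(x) for x in xs) / len(xs))
print(f"last x = {xs[-1]:.6f}, last lambda = {lams[-1]:.6f}")
print(f"last violation = {last_violation:.2e}, average violation = {avg_violation:.2e}")
```

Because this toy problem is convex, the last iterate converges to the constrained optimum. In the non-convex learning setting the paper addresses, the primal iterates can oscillate between feasible and infeasible points, which is why the feasibility of the last iterate needs a separate analysis.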

Paper Structure

This paper contains 36 sections, 25 theorems, 116 equations, 2 figures, 1 algorithm.

Key Result

Proposition 3.1

Under Assumptions ass:cvx-ass:curvgu, any $f_{\theta}(\lambda^{\star}_p) \in \mathcal{F}^{\star}_{\theta}(\lambda^{\star}_p)$ approximates the constraint value of the solution $\phi^*$ of (Pu) as in:

Figures (2)

  • Figure 1: Feasibility of primal iterates in a constrained learning problem with fairness requirements. Left: Example of a hard constraint, which oscillates between feasibility and infeasibility, and an easy constraint, which remains feasible for all iterations. Right: After training accuracy has settled (around half of the training epochs), all but the last constraint are infeasible in 30-45% of the iterations. In fact, at least one constraint is violated in 85% of the iterations shown. We therefore cannot stop the algorithm at an arbitrary iteration and expect to obtain a feasible solution.
  • Figure 2: Left: The Unconstrained model performs better in terms of average test accuracy than both the Last and Randomized models. Middle: Both constrained models do better in terms of Counterfactual Fairness. The key point is that the Last iterate is never far from the Randomized one in terms of constraint violation. Right: As the richness of the parametrization increases, the maximum constraint violation (i.e., the size of the oscillations) decreases.

Theorems & Definitions (29)

  • Proposition 3.1
  • Lemma 3.1
  • Theorem 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Lemma 4.1
  • Proposition 4.1
  • Definition A.1
  • Definition A.2
  • Definition A.3
  • ...and 19 more