Learning with Statistical Equality Constraints

Aneesh Barthakur, Luiz F. O. Chamon

TL;DR

This work advances equality-constrained learning by deriving a generalization theory for problems with exact equality constraints and proposing a practical primal-dual algorithm based on solving a sequence of unconstrained subproblems. It shows that, under regularity and decomposability conditions, the constrained problem can be closely approximated by its empirical dual, with a generalization bound that separates duality-gap and dual-estimation error and depends on constraint sensitivity and sample size. The framework is demonstrated on fairness (demographic parity), boundary-value problems, and interpolating classifiers, revealing new formulation possibilities (e.g., prescribed-rate constraints) and advantages over penalty-based approaches. The results suggest that equality constraints can be integrated into ML training with principled guarantees and practical algorithms, enabling precise control over fairness, physics-informed constraints, and class-wise interpolation behavior.

Abstract

As machine learning applications grow increasingly ubiquitous and complex, they face an increasing set of requirements beyond accuracy. The prevalent approach to handle this challenge is to aggregate a weighted combination of requirement violation penalties into the training objective. To be effective, this approach requires careful tuning of these hyperparameters (weights), involving trial-and-error and cross-validation, which becomes ineffective even for a moderate number of requirements. These issues are exacerbated when the requirements involve parities or equalities, as is the case in fairness and boundary value problems. An alternative technique uses constrained optimization to formulate these learning problems. Yet, existing approximation and generalization guarantees do not apply to problems involving equality constraints. In this work, we derive a generalization theory for equality-constrained statistical learning problems, showing that their solutions can be approximated using samples and rich parametrizations. Using these results, we propose a practical algorithm based on solving a sequence of unconstrained, empirical learning problems. We showcase its effectiveness and the new formulations enabled by equality constraints in fair learning, interpolating classifiers, and boundary value problems.
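To make the phrase "a sequence of unconstrained, empirical learning problems" concrete, here is a minimal dual-ascent sketch on a toy problem. It is not the paper's algorithm: the linear model, the squared loss, the mean-prediction parity constraint between two groups, the synthetic data, and the names `dual_ascent`, `eta`, and `steps` are all assumptions made for illustration. Each iteration minimizes the Lagrangian over the model parameters in closed form (an unconstrained problem), then moves the multiplier along the constraint slack. Because the constraint is an equality, the multiplier is not clipped at zero and may take either sign.

```python
import numpy as np

def dual_ascent(X, y, g, eta=1.0, steps=200):
    """Toy dual ascent (illustrative assumption, not the paper's method).

    Problem: minimize mean((X @ theta - y) ** 2)
    subject to: mean prediction on group 1 == mean prediction on group 0.
    """
    n = X.shape[0]
    A = X.T @ X / n                      # empirical second-moment matrix
    b = X.T @ y / n
    # Constraint is linear in theta: c @ theta == 0.
    c = X[g == 1].mean(axis=0) - X[g == 0].mean(axis=0)
    lam = 0.0                            # multiplier, unrestricted in sign
    for _ in range(steps):
        # Unconstrained subproblem: minimize loss + lam * (c @ theta).
        theta = np.linalg.solve(A, b - 0.5 * lam * c)
        # Dual step: gradient of the dual function is the constraint slack.
        lam += eta * (c @ theta)
    return theta, lam

# Synthetic data (hypothetical): group membership shifts feature 0.
rng = np.random.default_rng(0)
n = 2000
g = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 3))
X[:, 0] += g
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=n)

theta, lam = dual_ascent(X, y, g)
c = X[g == 1].mean(axis=0) - X[g == 0].mean(axis=0)
theta_unconstrained = np.linalg.solve(X.T @ X / n, X.T @ y / n)
print("gap without constraint:", c @ theta_unconstrained)
print("gap after dual ascent: ", c @ theta)
print("final multiplier:      ", lam)
```

The fixed step size works here only because the toy dual function is a well-conditioned quadratic; a non-convex model would need an approximate primal solver and a tuned step size.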

Paper Structure

This paper contains 66 sections, 29 theorems, 189 equations, 13 figures, 2 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $N_{\text{min}} = \min \left\lbrace M_0, M_1 \ldots, M_I, N_1, \ldots, N_J \right\rbrace$ and assume that $R(\nu) < \infty$ and $P^\star_{\phi} > -\infty$. Under Assumptions ass:relint-ass:regularity there exist $\gamma^\star_{\phi} \in \text{Opt}(\text{D}_{\phi}), \gamma^\star \in \text{Opt}(\t

Figures (13)

  • Figure 1: Exact vs approximate fairness. (a) Comparison between \ref{['P:fair']} and the inequality relaxation described in Remark \ref{['R:eq_vs_ineq']} with parameter $\epsilon > 0$ (10 random splits). Mean accuracy (across splits) is reported for each method/tolerance. (b) Final (effective) dual variables for Algorithm \ref{['alg:primaldual']}. Since the inequality relaxation uses two constraints for each group (upper and lower bound), we show only the difference between upper and lower dual variables.
  • Figure 2: Prescribed rates. Solutions of \ref{['P:prescriptive']} for different $r_j$ (10 random splits). (a) Average rate of positive outcomes across population, annotated with the mean accuracy (across splits); (b) Rate disparity across different groups.
  • Figure 3: Relative $L^2$ error for convection BVP ($\beta=50$). The lines show the mean curve across 5 runs, and the shaded region indicates the max and min curves.
  • Figure 4: Comparison of classwise test errors and dual values for CIFAR-100 trained with \ref{['P:class']} (along with the best fit line). The correlation is $0.89$ indicating a strong linear relationship.
  • Figure 5: Prescribed rates. At $r_j=0.5$, the mean accuracy of the model is slightly better than that of the Exact DP solution. At the same time, for $r_j=0.5$, the model achieves a group disparity that is comparable to the Exact DP solution (see Figure \ref{['fig:prescriptive']}(b)) but at a different rate. Thus \ref{['P:prescriptive']} enables new tradeoffs between group disparity and accuracy that cannot be found by using \ref{['P:fair']}.
  • ...and 8 more figures

Theorems & Definitions (65)

  • Remark 2.1
  • Theorem 3.1
  • Remark 3.1: Comparison with results for inequality constraints
  • Remark 3.2: Functional strong duality
  • Theorem 4.1
  • Lemma B.1
  • proof
  • Definition B.1
  • Definition B.2
  • Lemma B.2
  • ...and 55 more