Learning with Statistical Equality Constraints
Aneesh Barthakur, Luiz F. O. Chamon
TL;DR
This work derives a generalization theory for learning problems with exact equality constraints and proposes a practical primal-dual algorithm that solves a sequence of unconstrained subproblems. It shows that, under regularity and decomposability conditions, the constrained problem can be closely approximated by its empirical dual, with a generalization bound that separates the duality gap from the dual estimation error and depends on constraint sensitivity and sample size. The framework is demonstrated on fairness (demographic parity), boundary-value problems, and interpolating classifiers, revealing new formulation possibilities (e.g., prescribed-rate constraints) and advantages over penalty-based approaches. The results suggest that equality constraints can be integrated into ML training with principled guarantees and practical algorithms, enabling precise control over fairness, physics-informed constraints, and class-wise interpolation behavior.
Abstract
As machine learning applications grow more ubiquitous and complex, they face a growing set of requirements beyond accuracy. The prevalent approach to this challenge is to add a weighted combination of requirement-violation penalties to the training objective. To be effective, this approach requires careful tuning of the weights (hyperparameters) through trial and error and cross-validation, which becomes impractical even for a moderate number of requirements. These issues are exacerbated when the requirements involve parities or equalities, as is the case in fairness and boundary value problems. An alternative technique uses constrained optimization to formulate these learning problems. Yet, existing approximation and generalization guarantees do not apply to problems involving equality constraints. In this work, we derive a generalization theory for equality-constrained statistical learning problems, showing that their solutions can be approximated using samples and rich parametrizations. Using these results, we propose a practical algorithm based on solving a sequence of unconstrained, empirical learning problems. We showcase its effectiveness and the new formulations enabled by equality constraints in fair learning, interpolating classifiers, and boundary value problems.
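To make the idea of "a sequence of unconstrained, empirical learning problems" concrete, the following is a minimal sketch of generic dual ascent on a toy problem. It is not the authors' algorithm. The synthetic data, the linear model, the constraint (equal mean prediction across two groups, in the spirit of demographic parity), and the names `dual_ascent`, `step`, and `iters` are all assumptions made for illustration. Each iteration minimizes the Lagrangian over the model weights with no constraint, then updates the multiplier by the constraint violation. Because the constraint is an equality, the multiplier may take either sign.

```python
import numpy as np

# Synthetic data (assumed for illustration): two groups, where group
# membership shifts the first feature and also affects the target.
rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, size=n)
feats = rng.normal(size=(n, 2))
feats[:, 0] += group
X = np.column_stack([feats, np.ones(n)])  # last column is an intercept
y = feats @ np.array([1.0, -0.5]) + 0.8 * group + 0.1 * rng.normal(size=n)

# Equality constraint h(w) = c @ w = 0: the mean prediction of a linear
# model must be the same in both groups.
c = X[group == 1].mean(axis=0) - X[group == 0].mean(axis=0)


def dual_ascent(X, y, c, step=0.5, iters=300):
    """Dual ascent for  min_w (1/n)||Xw - y||^2  subject to  c @ w = 0."""
    n = X.shape[0]
    H = X.T @ X
    Xty = X.T @ y

    def solve_primal(lam):
        # Unconstrained minimizer of (1/n)||Xw - y||^2 + lam * (c @ w),
        # obtained by setting the gradient to zero.
        return np.linalg.solve(H, Xty - 0.5 * n * lam * c)

    lam = 0.0  # multiplier of an equality constraint: no sign restriction
    for _ in range(iters):
        w = solve_primal(lam)
        lam += step * float(c @ w)  # gradient ascent on the dual function
    return solve_primal(lam), lam


w_ols = np.linalg.solve(X.T @ X, X.T @ y)  # unconstrained baseline
w_con, lam = dual_ascent(X, y, c)

gap_ols = float(c @ w_ols)
gap_con = float(c @ w_con)
print(f"group gap without constraint: {gap_ols:.4f}")
print(f"group gap with constraint:    {gap_con:.2e}")
print(f"final multiplier:             {lam:.4f}")
```

In contrast to a penalty formulation, the weight on the constraint here is not hand-tuned: the multiplier is adjusted automatically until the empirical constraint is met. The fixed step size works for this toy problem because the dual function is a concave quadratic with small curvature; other problems would likely need a different step size or an inexact primal solver.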
