Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules
Gonzalo E. Constante-Flores, Hao Chen, Can Li
TL;DR
This work addresses the problem of enforcing hard, input-dependent linear constraints on neural network outputs. It introduces a two-subnetwork architecture: a task network for prediction and a safe network that guarantees feasibility. The two outputs are merged via a data-dependent convex combination $f_\psi(x) = (1-\alpha_\psi(x)) f^{\mathrm{TN}}_\theta(x) + \alpha_\psi(x) f^{\mathrm{SN}}_\phi(x)$. Equality constraints are ensured by projection, and inequality constraints are enforced for all inputs by a safe network learned through decision rules. The authors prove a universal approximation result for continuous constraint-satisfying functions and provide tractable offline formulations (SDP/LP) for computing the safe network with linear decision rules. Empirical results on DC-OPF and portfolio optimization show zero constraint violations, competitive objective values, and inference latency under 3 ms, making the method substantially faster than iterative methods while maintaining feasibility. The framework offers a scalable, model-agnostic path to deploying constraint-compliant neural predictors in safety-critical domains, with clear directions for extending beyond linear and convex constraint families.
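To make the convex-combination idea concrete, here is a minimal sketch in plain Python. It assumes a single linear equality row $a^\top y = b$ handled by Euclidean projection, inequality constraints $G y \le h$, and a safe output that already satisfies both. The TL;DR writes the weight as $\alpha_\psi(x)$; for illustration we assume a closed-form ratio test that returns the smallest weight on the safe output that restores feasibility. This may differ from the paper's exact rule. All function names, the toy portfolio constraints (weights sum to 1, each between 0 and 0.6), and the stand-in network outputs are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def project_onto_hyperplane(y: Sequence[float], a: Sequence[float], b: float) -> Vector:
    """Euclidean projection of y onto the affine set {z : a . z = b} (one equality row)."""
    scale = (dot(a, y) - b) / dot(a, a)
    return [yi - scale * ai for yi, ai in zip(y, a)]


def mixing_weight(y_task: Sequence[float], y_safe: Sequence[float],
                  G: Sequence[Sequence[float]], h: Sequence[float]) -> float:
    """Smallest alpha in [0, 1] with G((1 - alpha) y_task + alpha y_safe) <= h.

    Assumes y_safe is feasible (G y_safe <= h). Each row is linear in alpha, so a
    violated row i needs alpha >= (g_i.y_task - h_i) / (g_i.y_task - g_i.y_safe).
    """
    alpha = 0.0
    for g, h_i in zip(G, h):
        lhs_task = dot(g, y_task)
        if lhs_task > h_i:  # row violated by the task-network output
            lhs_safe = dot(g, y_safe)
            alpha = max(alpha, (lhs_task - h_i) / (lhs_task - lhs_safe))
    return min(alpha, 1.0)


def combine(y_task: Sequence[float], y_safe: Sequence[float], alpha: float) -> Vector:
    """Convex combination (1 - alpha) * y_task + alpha * y_safe."""
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(y_task, y_safe)]


# Hypothetical toy portfolio: 3 weights, sum(w) = 1, 0 <= w_i <= 0.6.
n, cap = 3, 0.6
ones = [1.0] * n
G = [[-1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]   # -w_i <= 0
G += [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]   # w_i <= cap
h = [0.0] * n + [cap] * n

raw_task = [0.9, 0.4, -0.1]          # stand-in for f^TN(x); violates the constraints
y_task = project_onto_hyperplane(raw_task, ones, 1.0)
y_safe = [1.0 / n] * n               # stand-in for f^SN(x); feasible for this set
alpha = mixing_weight(y_task, y_safe, G, h)
y = combine(y_task, y_safe, alpha)
print(alpha, y)
```

Here the projected task output is roughly $[0.833, 0.333, -0.167]$, the binding row is the cap on the first weight, and the sketch returns $\alpha \approx 0.467$ with $y \approx [0.6, 0.333, 0.067]$. Because both points satisfy the equality and the feasible set is convex, any $\alpha$ at least this large keeps the combination feasible. A feasible task output yields $\alpha = 0$ and passes through unchanged.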
Abstract
Deep learning models are increasingly deployed in safety-critical tasks where predictions must satisfy hard constraints, such as physical laws, fairness requirements, or safety limits. However, standard architectures lack built-in mechanisms to enforce such constraints, and existing approaches based on regularization or projection are often limited to simple constraints, are computationally expensive, or lack feasibility guarantees. This paper proposes a model-agnostic framework for enforcing input-dependent linear equality and inequality constraints on neural network outputs. The architecture combines a task network trained for prediction accuracy with a safe network trained using decision rules from the stochastic and robust optimization literature to ensure feasibility across the entire input space. The final prediction is a convex combination of the two subnetworks, guaranteeing constraint satisfaction during both training and inference without iterative procedures or runtime optimization. We prove that the architecture is a universal approximator of constrained functions and derive computationally tractable formulations based on linear decision rules. Empirical results on benchmark regression tasks show that our method consistently satisfies constraints while maintaining competitive accuracy and low inference latency.
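The abstract states that the safe network comes from linear decision rules with tractable formulations. The following minimal sketch shows why robust feasibility of such a rule can be checked exactly. It relies on assumptions that are not taken from the paper: only the right-hand side depends on the input (constraints $G y \le h + H x$), the input set is a box $[\ell, u]$, and the safe network is the linear rule $y(x) = W x + w_0$. Under these assumptions, the worst case of each constraint row is attained coordinate-wise at the box endpoints. All names (`worst_case_slack`, `G`, `H`, `W`, `w0`) and the 1-D example are hypothetical. The function only verifies a given rule; it does not train one.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def worst_case_slack(G, H, h, W, w0, lo, hi) -> float:
    """Return max_i max_{lo <= x <= hi} [G (W x + w0) - H x - h]_i.

    A value <= 0 certifies that the linear decision rule y(x) = W x + w0 satisfies
    G y <= h + H x for every x in the box [lo, hi].
    """
    GW = matmul(G, W)
    worst = []
    for i in range(len(G)):
        coeffs = [GW[i][j] - H[i][j] for j in range(len(lo))]   # row i of (G W - H)
        const = sum(G[i][k] * w0[k] for k in range(len(w0))) - h[i]
        # A linear function over a box is maximized coordinate-wise at an endpoint.
        worst.append(const + sum(max(c * l, c * u) for c, l, u in zip(coeffs, lo, hi)))
    return max(worst)


# Hypothetical 1-D example: input x in [0, 1]; output y must satisfy
# y >= x (written -y <= 0 - x) and y <= 2.
G = [[-1.0], [1.0]]
H = [[-1.0], [0.0]]
h = [0.0, 2.0]
lo, hi = [0.0], [1.0]

print(worst_case_slack(G, H, h, [[1.0]], [0.5], lo, hi))  # y = x + 0.5  -> -0.5 (feasible)
print(worst_case_slack(G, H, h, [[2.0]], [0.5], lo, hi))  # y = 2x + 0.5 -> 0.5 (violates y <= 2 at x = 1)
```

When $W$ and $w_0$ are decision variables, each term $\max(c_{ij}\ell_j, c_{ij}u_j)$ can be replaced by an auxiliary variable $t_{ij}$ with $t_{ij} \ge c_{ij}\ell_j$ and $t_{ij} \ge c_{ij}u_j$. Since $c_{ij}$ is affine in $W$, every resulting constraint is linear, which is one way an LP can arise. This is a standard robust-optimization construction, shown here only for illustration; the paper's own LP and SDP formulations may rely on different uncertainty sets or constraint structures.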
