HardNet: Hard-Constrained Neural Networks with Universal Approximation Guarantees

Youngjae Min, Navid Azizan

TL;DR

HardNet appends a differentiable, closed-form enforcement layer to a neural network so that its outputs satisfy hard, input-dependent inequality constraints by construction, while preserving universal approximation capabilities. It provides HardNet-Aff for affine constraints and HardNet-Cvx for convex constraints, supports end-to-end training, and enforces multiple constraints efficiently. The framework is validated on learning with piecewise constraints, learning optimization solvers with guaranteed feasibility, and safety-critical control, where it guarantees feasibility while remaining competitive in performance. This work enables reliable, constraint-compliant learning in safety-critical domains without sacrificing model capacity or trainability.

Abstract

Incorporating prior knowledge or specifications of input-output relationships into machine learning models has attracted significant attention, as it enhances generalization from limited data and yields conforming outputs. However, most existing approaches use soft constraints by penalizing violations through regularization, which offers no guarantee of constraint satisfaction, especially on inputs far from the training distribution--an essential requirement in safety-critical applications. On the other hand, imposing hard constraints on neural networks may hinder their representational power, adversely affecting performance. To address this, we propose HardNet, a practical framework for constructing neural networks that inherently satisfy hard constraints without sacrificing model capacity. Unlike approaches that modify outputs only at inference time, HardNet enables end-to-end training with hard constraint guarantees, leading to improved performance. To the best of our knowledge, HardNet is the first method that enables efficient and differentiable enforcement of more than one input-dependent inequality constraint. It allows unconstrained optimization of the network parameters using standard algorithms by appending a differentiable closed-form enforcement layer to the network's output. Furthermore, we show that HardNet retains neural networks' universal approximation capabilities. We demonstrate its versatility and effectiveness across various applications: learning with piecewise constraints, learning optimization solvers with guaranteed feasibility, and optimizing control policies in safety-critical systems.
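As a rough illustration of what a closed-form enforcement layer for affine constraints can look like, the following sketch corrects a raw network output so that $A(x)\,y \le b(x)$ holds. It assumes a one-sided constraint, a matrix $A(x)$ with full row rank, and the correction $y = f_\theta(x) - A^{+}\,\mathrm{ReLU}(A f_\theta(x) - b)$ with $A^{+} = A^\top (A A^\top)^{-1}$. The function name `hardnet_aff` and the numbers are illustrative and not taken from the paper's code; in practice `A` and `b` would be recomputed for each input `x`.

```python
import numpy as np

def hardnet_aff(f_x, A, b):
    """Enforce A @ y <= b on a raw output f_x (assumes A has full row rank)."""
    A_pinv = A.T @ np.linalg.inv(A @ A.T)      # right pseudoinverse, so A @ A_pinv = I
    violation = np.maximum(A @ f_x - b, 0.0)   # ReLU of the per-constraint violation
    return f_x - A_pinv @ violation            # shift only along violated constraints

# Two constraints on a 3-dimensional output: y0 + y1 <= 1 and y2 <= 0.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.0])
f_x = np.array([2.0, 1.0, -0.5])   # violates the first constraint only

y = hardnet_aff(f_x, A, b)
print(y)          # corrected output
print(A @ y - b)  # all entries are <= 0 after enforcement
```

In this example the violated constraint ends up exactly on its boundary, while the value of the already-satisfied constraint, `(A @ y)[1]`, is unchanged. Because the correction uses only matrix products and a ReLU, it is differentiable almost everywhere and can sit at the end of a network during training.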

Paper Structure

This paper contains 31 sections, 15 theorems, 71 equations, 7 figures, 6 tables.

Key Result

Theorem 1

Let $\rho\in\mathcal{C}(\mathbb{R},\mathbb{R})$ and $\mathcal{K}\subseteq\mathbb{R}$ be a compact set. Then, depth-two neural networks with $\rho$ activation function universally approximate $\mathcal{C}(\mathcal{K},\mathbb{R})$ if and only if $\rho$ is nonpolynomial.
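To unpack the statement, "universally approximate" can be read in the usual sup-norm sense. The formulation below is a standard one and assumes that "depth-two" means a single hidden layer; the symbols $g$, $N$, $a_i$, $w_i$, and $c_i$ are introduced here for illustration only.

$$
\forall f\in\mathcal{C}(\mathcal{K},\mathbb{R}),\ \forall \varepsilon>0,\ \exists\, g(x)=\sum_{i=1}^{N} a_i\,\rho(w_i x + c_i)\ \text{ such that }\ \sup_{x\in\mathcal{K}}\,|f(x)-g(x)|<\varepsilon .
$$

The "only if" direction is the easier one to see: if $\rho$ is a polynomial of degree $d$, every such $g$ is itself a polynomial of degree at most $d$, and on an interval this finite-dimensional family cannot approximate, for example, $x^{d+1}$ to arbitrary accuracy.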

Figures (7)

  • Figure 1: Schematic of HardNet. Its differentiable enforcement layer allows unconstrained end-to-end optimization of the network parameters using standard algorithms while guaranteeing satisfaction of input-dependent constraints by construction. The layer can be applied to any neural network.
  • Figure 2: Illustration of input-dependent constraints and projections performed by HardNet. A target function $f\!:\!\mathbb{R}\!\rightarrow\!\mathbb{R}^2$ satisfies hard constraints $f(x)\!\in\!\mathcal{C}(x)$ for each $x\!\in\!\mathbb{R}$. The feasible set $\mathcal{C}(x)$ is visualized as the gray area for two sample inputs $x_1$ and $x_2$. While the function $f_\theta$ closely approximates $f$, it violates the constraints. HardNet-Aff projects the violated output onto the feasible set in parallel to the boundaries of the satisfied constraints. In contrast, the minimum $\ell^2$-norm optimization in \ref{eq:projection_ineq} projects the output orthogonally to the closest boundary.
  • Figure 3: Learned functions at the initial (left) and final (right) epochs with the piecewise constraints. The models are trained on the samples indicated with circles, with their MSE from the true function shown in parentheses. HardNet-Aff adheres to the constraints from the start of the training and generalizes better than the baselines as it enforces constraints even in the out-of-distribution (OOD) regime. On the other hand, the baselines violate the constraints throughout the training.
  • Figure 4: Simulated trajectories from a random initial state, with costs shown in parentheses. HardNet-Aff avoids the obstacles while obtaining a low cost value. Even though the soft-constrained method and DC3 appear to avoid obstacles and achieve smaller costs than the other collision-free trajectories, they violate the safety constraints (which are more conservative than hitting the obstacles).
  • Figure 5: Visualization of 100 gradient descent steps for training a HardNet-Aff model on a single datapoint (first row) and two datapoints (second row) from the same initialization, using two different learning rates (0.01 and 0.1). With the smaller learning rate, training on a single datapoint results in a zero gradient due to the enforcement layer (top left). However, when training on both datapoints, the vanishing gradient for the first datapoint is mitigated by the nonzero gradient from the second datapoint (bottom left). Also, using the larger learning rate enables the model to avoid the vanishing gradient issue, even when trained on the single datapoint (top right).
  • ...and 2 more figures
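The zero gradient mentioned in the Figure 5 caption can be reproduced in a toy setting. The sketch below is not the paper's experiment: it assumes the affine enforcement $y = f - A^{+}\,\mathrm{ReLU}(A f - b)$ with full-row-rank $A$, a squared-error loss, and hand-picked values. The helper names `enforce` and `enforce_jacobian` are hypothetical. The point is that the layer's Jacobian, $I - A^{+}\,\mathrm{diag}(\mathbb{1}[A f > b])\,A$, removes the component of the loss gradient that is normal to a violated constraint.

```python
import numpy as np

def enforce(f_x, A, b):
    """Closed-form correction f - A^+ ReLU(A f - b); assumes A has full row rank."""
    A_pinv = A.T @ np.linalg.inv(A @ A.T)
    return f_x - A_pinv @ np.maximum(A @ f_x - b, 0.0)

def enforce_jacobian(f_x, A, b):
    """Jacobian of enforce w.r.t. f_x: I - A^+ diag(1[A f > b]) A."""
    A_pinv = A.T @ np.linalg.inv(A @ A.T)
    active = (A @ f_x > b).astype(float)          # 1 for violated constraints, else 0
    return np.eye(f_x.size) - A_pinv @ (active[:, None] * A)

A = np.array([[1.0, 0.0]])       # single constraint y0 <= 0
b = np.array([0.0])
f_x = np.array([1.0, 0.5])       # raw output violates the constraint
target = np.array([-1.0, 0.5])   # feasible target strictly inside the feasible set

y = enforce(f_x, A, b)                                  # output lands on the boundary
loss = np.sum((y - target) ** 2)                        # nonzero: the fit is not perfect
dloss_dy = 2.0 * (y - target)                           # gradient w.r.t. the enforced output
dloss_df = enforce_jacobian(f_x, A, b).T @ dloss_dy     # chain rule back to the raw output

print(loss, dloss_df)
```

Here the loss is nonzero, yet the gradient reaching the raw output is exactly zero, so a gradient step from this single datapoint would not move the network. A second datapoint whose constraint is inactive, or whose error has a component tangent to the boundary, would contribute a nonzero gradient to the shared parameters.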

Theorems & Definitions (18)

  • Theorem 1: Universal Approximation Theorem for Shallow Networks
  • Theorem 2: Universal Approximation Theorem for Deep Networks
  • Example 1
  • Remark 3
  • Remark 4
  • Proposition 5
  • Proposition 5: Parallel Projection
  • Theorem 6
  • Corollary 7
  • Theorem 8
  • ...and 8 more