Sample-Specific Output Constraints for Neural Networks

Mathis Brosowsky, Olaf Dünkel, Daniel Slieter, Marius Zöllner

TL;DR

The proposed ConstraintNet is a scalable neural network architecture that constrains the output space independently in each forward pass. It is end-to-end trainable with almost no overhead in the forward and backward pass, which makes it promising for applying neural networks in safety-critical environments.

Abstract

Neural networks reach state-of-the-art performance in a variety of learning tasks. However, a lack of understanding of their decision-making process makes them appear as black boxes. We address this and propose ConstraintNet, a neural network with the capability to constrain the output space in each forward pass via an additional input. The prediction of ConstraintNet is provably within the specified domain. This enables ConstraintNet to explicitly exclude unintended or even hazardous outputs, while the final prediction is still learned from data. We focus on constraints in the form of convex polytopes and show the generalization to further classes of constraints. ConstraintNet can be constructed easily by modifying existing neural network architectures. We highlight that ConstraintNet is end-to-end trainable with no overhead in the forward and backward pass. For illustration purposes, we model ConstraintNet by modifying a CNN and construct constraints for facial landmark prediction tasks. Furthermore, we demonstrate the application to a follow object controller for vehicles as a safety-critical application. We submitted an approach and system for the generation of safety-critical outputs of an entity based on ConstraintNet to the German Patent and Trademark Office under the official registration mark DE10 2019 119 739.

Paper Structure

This paper contains 11 sections, 18 equations, 5 figures, 1 algorithm.

Figures (5)

  • Figure 1: Approach to construct ConstraintNet for a class of constraints $\mathfrak{C}\! =\! \{ \mathcal{C}(s)\! \subset\! \mathcal{Y} | s \! \in\! \mathcal{S} \}$. A final layer $\phi$ without learnable parameters maps the output of the previous layers $z\!=\!h_{\theta}(x, g(s))$ onto the constrained output space $\mathcal{C}(s)$, depending on the constraint parameter $s$. The previous layers $h_{\theta}$ receive a representation $g(s)$ of $s$ as an additional input alongside the data point $x$. This enables ConstraintNet to deal with different constraints for the same $x$.
  • Figure 2: Construction of ConstraintNet by extending a CNN. For illustration purposes, we show a nose landmark prediction on an image $x$ with an output constraint in the form of a triangle, i.e. a convex polytope with three vertices $\{v^{(i)}(s)\}_{i=1}^3$. The constraint parameter $s$ specifies the chosen constraint and, in this case, consists of the concatenated vertex coordinates. A tensor representation $g(s)$ of $s$ is concatenated to the output of an intermediate convolutional layer and extends the input of the next layer. Instead of creating the final output for the nose landmark with a 2-dimensional dense layer, a 3-dimensional intermediate representation $z$ is generated. The constraint guard layer $\phi$ applies a softmax function $\sigma$ to $z$ and weights the three vertices of the triangle with the softmax outputs. This guarantees a prediction $\hat{y}$ within the specified triangle.
  • Figure 3: Top left: Landmark predictions for the nose, left eye and right eye are confined to a bounding box around the face. Top right: In addition to the bounding box constraint, relations between landmarks are introduced, namely that the eyes are above the nose and the left eye is to the left of the right eye. Bottom: The nose landmark is constrained to a domain in the form of a triangle (left) or a sector of a circle (right).
  • Figure 4: Confining landmark predictions for the nose $(\hat{x}_n, \hat{y}_n)$, the left eye $(\hat{x}_{le}, \hat{y}_{le})$ and the right eye $(\hat{x}_{re}, \hat{y}_{re})$ to a bounding box with boundaries $l^{(x)}, u^{(x)}, l^{(y)}, u^{(y)}$, and enforcing that the eyes are above the nose ($\hat{y}_{le},\hat{y}_{re} \! \le \! \hat{y}_n$) and that the left eye is to the left of the right eye ($\hat{x}_{le} \!\le\! \hat{x}_{re}$), is equivalent to constraining three output parts independently: $\hat{y}^{(1)}\!=\!\hat{x}_{n}$ to the line segment in a), $\hat{y}^{(2)}\!=\!(\hat{x}_{le},\hat{x}_{re})$ to the triangle in b) and $\hat{y}^{(3)}\!=\!(\hat{y}_n, \hat{y}_{le}, \hat{y}_{re})$ to the pyramid in c).
  • Figure 5: The follow object controller (FOC) in a vehicle (ego-vehicle) is only active when another vehicle (target-vehicle) is ahead. Sensors measure the velocity of the ego-vehicle $v_{ego}$ as well as the relative position (distance) $x_{rel}$, the relative velocity $v_{rel}$ and the relative acceleration $a_{rel}$ of the target-vehicle w.r.t. the coordinate system of the ego-vehicle. The FOC receives at least these sensor measurements as input and attempts to keep the distance to the target-vehicle $x_{rel}$ close to a velocity-dependent distance $x_{rel,set}( v_{ego})$ while taking comfort and safety aspects into account. The output of the FOC is a demanded acceleration $a_{ego,dem}$.
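
The constraint guard layer described in Figure 2, and the part-wise constraints of Figure 4, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `softmax`, `constraint_guard` and `landmark_vertices`, the example coordinates, and the stand-in values for $z$ are hypothetical. The vertex lists for the line segment, triangle and pyramid are derived here from the inequalities stated in the Figure 4 caption.

```python
import math
from typing import List, Sequence, Tuple

Vertex = Tuple[float, ...]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def constraint_guard(z: Sequence[float],
                     vertices: Sequence[Vertex]) -> List[float]:
    """Return the softmax-weighted sum of the polytope vertices.

    z holds one value per vertex. Because the weights are non-negative
    and sum to 1, the result is a convex combination of the vertices
    and therefore lies inside the polytope.
    """
    if len(z) != len(vertices):
        raise ValueError("z needs exactly one entry per vertex")
    weights = softmax(z)
    dim = len(vertices[0])
    return [sum(w * v[d] for w, v in zip(weights, vertices))
            for d in range(dim)]


def landmark_vertices(lx: float, ux: float, ly: float, uy: float):
    """Vertex sets for the three output parts of Figure 4 (assumed layout)."""
    # Part 1: x_n on the line segment [lx, ux].
    segment = [(lx,), (ux,)]
    # Part 2: (x_le, x_re) with lx <= x_le <= x_re <= ux.
    triangle = [(lx, lx), (lx, ux), (ux, ux)]
    # Part 3: (y_n, y_le, y_re) with ly <= y_le, y_re <= y_n <= uy.
    pyramid = [(ly, ly, ly), (uy, ly, ly), (uy, uy, ly),
               (uy, ly, uy), (uy, uy, uy)]
    return segment, triangle, pyramid


# Hypothetical triangle constraint for a nose landmark (pixel coordinates).
nose_triangle = [(40.0, 60.0), (80.0, 60.0), (60.0, 90.0)]
z = [0.3, -1.2, 2.0]  # stand-in for the network output h_theta(x, g(s))
nose = constraint_guard(z, nose_triangle)
print("nose prediction inside the triangle:", nose)

# Part-wise constraints: each output part gets its own guard layer call.
segment, triangle, pyramid = landmark_vertices(10.0, 110.0, 20.0, 140.0)
x_n = constraint_guard([0.5, -0.5], segment)
x_le, x_re = constraint_guard([1.0, 0.2, -0.7], triangle)
y_n, y_le, y_re = constraint_guard([0.1, 0.4, -0.3, 0.9, 0.0], pyramid)
print("x_le <= x_re:", x_le <= x_re, "| eyes above nose:", y_le <= y_n and y_re <= y_n)
```

The design point this sketch is meant to show is that the guard layer has no learnable parameters: changing the constraint only changes the vertex list passed in, so the same $z$ yields a valid prediction for any polytope with the matching number of vertices.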