Least Squares Training of Quadratic Convolutional Neural Networks with Applications to System Theory

Zachary Yetman Van Egmond, Luis Rodrigues

TL;DR

The least squares network is shown to have a significantly reduced training time with minimal compromises on prediction accuracy alongside the advantages of having an analytic input-output equation.

Abstract

This paper provides a least squares formulation for the training of a 2-layer convolutional neural network using quadratic activation functions, a 2-norm loss function, and no regularization term. Using this method, an analytic expression for the globally optimal weights is obtained alongside a quadratic input-output equation for the network. These properties make the network a viable tool in system theory by enabling further analysis, such as the sensitivity of the output to perturbations in the input, which is crucial for safety-critical systems such as aircraft or autonomous vehicles. The least squares method is compared to previously proposed strategies for training quadratic networks and to a back-propagation-trained ReLU network. The proposed method is applied to a system identification problem and a GPS position estimation problem. The least squares network is shown to have a significantly reduced training time with minimal compromises on prediction accuracy alongside the advantages of having an analytic input-output equation. Although these results only apply to 2-layer networks, this paper motivates the exploration of deeper quadratic networks in the context of system theory.
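The core idea, that a quadratic input-output map is linear in its weights and can therefore be fitted globally by ordinary least squares, can be illustrated with a minimal sketch. This is not the paper's convolutional formulation: it assumes a dense quadratic form $y = \bar{x}^T Z \bar{x}$ with $\bar{x} = [x; 1]$, and the function names (`quad_features`, `fit_quadratic`, `predict`, `input_gradient`) are hypothetical, chosen only for illustration.

```python
import numpy as np

def quad_features(X):
    """Map each row x to the upper-triangular entries of [x, 1][x, 1]^T."""
    n_samples, n = X.shape
    Xb = np.hstack([X, np.ones((n_samples, 1))])  # append 1 for linear and bias terms
    iu, ju = np.triu_indices(n + 1)
    return Xb[:, iu] * Xb[:, ju], (iu, ju)

def fit_quadratic(X, y):
    """Least squares fit of a symmetric Z such that y ~ [x, 1]^T Z [x, 1]."""
    Phi, (iu, ju) = quad_features(X)
    # The model is linear in the entries of Z, so this is a linear least
    # squares problem and lstsq returns a global minimizer.
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    n1 = X.shape[1] + 1
    Z = np.zeros((n1, n1))
    Z[iu, ju] = theta
    return (Z + Z.T) / 2  # symmetrize; the quadratic form is unchanged

def predict(Z, X):
    """Evaluate the quadratic input-output equation on each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.einsum("ij,jk,ik->i", Xb, Z, Xb)

def input_gradient(Z, x):
    """Sensitivity of the output to the input: gradient of the form w.r.t. x."""
    xb = np.append(x, 1.0)
    return (2 * Z @ xb)[:-1]  # drop the component for the constant 1

# Synthetic, noise-free data generated from a known symmetric Z_true.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Z_true = (A + A.T) / 2
X = rng.normal(size=(200, 3))
y = predict(Z_true, X)

Z_hat = fit_quadratic(X, y)
print(np.allclose(Z_hat, Z_true))
```

Because the fitted model is an explicit quadratic form, the sensitivity of the output to an input perturbation is available in closed form through `input_gradient`, which is the kind of analysis the abstract refers to for safety-critical systems.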

Paper Structure

This paper contains 10 sections, 6 theorems, 38 equations, 6 figures, 1 table.

Key Result

Lemma 1

(bartan2021neural) For fixed $a\neq0$, $b$, $c$, regularization term $\beta\geq0$, and convex loss function $l(\cdot)$, the non-convex problem (eqn_NNOptim) has the same global optimal solution as the convex problem (eqn_QNNConvex) when the number of neurons satisfies $m\geq m^*$, where $m^*$ is determined by $Z_{+}^*,Z_{-}^*\in\mathbb{R}^{(n+1)\times(n+1)}$, the solutions to the optimization problem (eqn_QNNConvex).

Figures (6)

  • Figure 1: Visualization of $\bar{Z}^1$ and $\bar{Z}^2$, for $f=5$, stride 1, and $n=12$. White squares represent zero entries, colored squares show the number of overlaps of different $Z^{k,1}$ and $Z^{k,2}$ to form the elements of $\bar{Z}^1$ and $\bar{Z}^2$.
  • Figure 2: Visualization of $\bar{Z}^1$ and $\bar{Z}^2$ for a 2D convolution with a $3\times 3$ filter on a $5\times5$ piece of 2D data. White squares represent zero entries, colored squares show the number of overlaps of different $Z^{k,1}$ and $Z^{k,2}$ to form the elements of $\bar{Z}^1$ and $\bar{Z}^2$.
  • Figure 3: Entries of $xx^T$ with diagonals marked
  • Figure 4: System identification comparison between different networks
  • Figure 5: Synthetic drone trajectory with starting position indicated in red
  • ...and 1 more figure

Theorems & Definitions (12)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • proof
  • Theorem 1
  • proof
  • Remark 1: Number of weights
  • Remark 2: Higher dimension convolutions
  • Corollary 1.1
  • ...and 2 more