On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

TL;DR

Minimizers exist in the loss landscape of shallow residual ReLU ANNs. The proof works in a closure of the search space that contains all functions generated by ANNs together with additional discontinuous generalized responses. These generalized responses are shown to be suboptimal, so the minimum is attained in the original function class.

Abstract

In this article, we show existence of minimizers in the loss landscape for residual artificial neural networks (ANNs) with multi-dimensional input layer and one hidden layer with ReLU activation. Our work contrasts earlier results in [D. Gallon, A. Jentzen, and F. Lindner, preprint, arXiv:2211.15641, 2022] and [P. Petersen, M. Raslan, and F. Voigtlaender, Found. Comput. Math., 21 (2021), pp. 375-444] which showed that in many situations minimizers do not exist for common smooth activation functions even in the case where the target functions are polynomials. The proof of the existence property makes use of a closure of the search space containing all functions generated by ANNs and additional discontinuous generalized responses. As we will show, the additional generalized responses in this larger space are suboptimal so that the minimum is attained in the original function class.

Paper Structure

This paper contains 4 sections, 7 theorems, 121 equations, and 2 figures.

Key Result

Theorem 1.1

Let $p \in (1,\infty)$, $d_{\mathrm{in}} \in \mathbb{N}$, $d \in \mathbb{N}_{0}$, let $f \colon \mathbb{R}^{d_{\mathrm{in}}} \to \mathbb{R}$ and $h \colon \mathbb{R}^{d_{\mathrm{in}}} \to [0,\infty)$ be continuous, assume that $h^{-1}((0,\infty))$ is a bounded and convex set, and let ... where $\mathcal{W}_d$ is as in eq:structurized and where $\mathfrak{N}^{\mathbb W}$ is as in eq:r...
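To make the setting of Theorem 1.1 more concrete, the following is a minimal NumPy sketch of a one-hidden-layer residual ReLU network and a weighted error of the kind the theorem refers to. Several parts are assumptions, because the definitions behind eq:structurized and the error functional are not reproduced in the statement above: the network is taken to be a linear skip term plus $d$ ReLU neurons, the error is taken to be a weighted $L^p$-type integral $\int |f(x) - \mathfrak{N}^{\mathbb W}(x)|^p \, h(x) \, dx$, and the integral is approximated by a Riemann sum. The function names `residual_relu_net` and `weighted_lp_error` are hypothetical and do not come from the paper.

```python
import numpy as np


def residual_relu_net(x, skip_w, hidden_w, hidden_b, out_w, out_b):
    """Shallow residual ReLU network (assumed form, not taken from the paper).

    x:        (n, d_in) array of inputs
    skip_w:   (d_in,)   weights of the linear skip connection
    hidden_w: (d, d_in) inner weights of the d ReLU neurons
    hidden_b: (d,)      inner biases
    out_w:    (d,)      outer weights
    out_b:    float     outer bias
    """
    pre_activation = x @ hidden_w.T + hidden_b            # shape (n, d)
    relu_part = np.maximum(pre_activation, 0.0) @ out_w   # shape (n,)
    return x @ skip_w + relu_part + out_b


def weighted_lp_error(f, h, net, grid, cell_volume, p=2.0):
    """Riemann-sum approximation of the assumed error
    integral |f(x) - net(x)|^p h(x) dx over the points in `grid`."""
    residual = np.abs(f(grid) - net(grid)) ** p
    return float(np.sum(residual * h(grid)) * cell_volume)


# Toy problem with d_in = 1: the target |x| equals 2*relu(x) - x,
# so one ReLU neuron plus the skip connection represents it exactly.
grid = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
cell_volume = 2.0 / 200


def f(x):
    return np.abs(x[:, 0])


def h(x):
    # Continuous, nonnegative weight whose positivity set (-1, 1)
    # is bounded and convex, as the theorem requires of h.
    return np.maximum(1.0 - x[:, 0] ** 2, 0.0)


def exact_net(x):
    return residual_relu_net(
        x,
        skip_w=np.array([-1.0]),
        hidden_w=np.array([[1.0]]),
        hidden_b=np.array([0.0]),
        out_w=np.array([2.0]),
        out_b=0.0,
    )


def zero_net(x):
    return np.zeros(x.shape[0])


err_exact = weighted_lp_error(f, h, exact_net, grid, cell_volume)
err_zero = weighted_lp_error(f, h, zero_net, grid, cell_volume)
```

In this toy case the infimum of the error is attained by an actual network. The theorem asserts that, under its hypotheses, such a minimizing parameter always exists in $\mathcal{W}_d$, whereas the earlier results cited in the abstract show that this can fail for common smooth activation functions.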

Figures (2)

  • Figure 1: Visualization of the minimization task in Example \ref{exa:counter}. There exists a generalized response of dimension $2$ but no neural network $\mathbb W \in \mathcal{W}_d$ ($d \in \mathbb{N}$) attaining zero error.
  • Figure 2: Visualization of the minimization task in Example \ref{exa:counter2}. There exists a network $\mathbb W \in \mathcal{W}_3$ with $\mathfrak{N}^{\mathbb W} = f$ (blue). However, the realization function attaining minimal error in each of the classes $\mathcal{W}_2$, $\mathcal{W}_1$, and $\mathcal{W}_0$ is $\mathfrak{N}^{\mathbb W} = 0$ (red).

Theorems & Definitions (20)

  • Theorem 1.1: Existence of minimizer -- residual ANNs
  • Theorem 1.2: Existence of minimizers -- general loss functions and fully-connected residual ANNs
  • Example 1.3: Regression problem
  • Definition 3.1
  • Remark 3.2
  • Remark 3.3: Asymptotic ANN representations for generalized responses
  • Definition 3.4
  • Proposition 3.5
  • proof
  • Lemma 4.1
  • ...and 10 more