Theoretical Analysis of the Advantage of Deepening Neural Networks

Yasushi Esaki, Yuta Nakahara, Toshiyasu Matsushima

TL;DR

Two new criteria are proposed for understanding the advantage of deepening neural networks. They show that adding layers improves the expressivity of deep neural networks more effectively than adding units to each layer.

Abstract

We propose two new criteria for understanding the advantage of deepening neural networks. Understanding this advantage requires knowing the expressivity of the functions that deep neural networks can compute: without sufficient expressivity, a network cannot perform well even when learning succeeds. The proposed criteria help here because they evaluate expressivity independently of the efficiency of learning. The first criterion measures how accurately deep neural networks can approximate the target function; it is motivated by the fact that the goal of deep learning is to approximate the target function with a deep neural network. The second criterion characterizes the linear regions of the functions computable by deep neural networks; it is motivated by the fact that deep neural networks with piecewise linear activation functions are themselves piecewise linear. Using the two criteria, we show that adding layers is more effective than adding units to each layer for improving the expressivity of deep neural networks.
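To make the linear-region viewpoint concrete, the following is a minimal sketch, not the authors' construction or criteria. It uses the standard one-dimensional "sawtooth" example: a tent map built from two ReLU units, composed `depth` times, and it counts the linear pieces on $[0,1]$ by detecting slope changes on a uniform grid. The function names (`tent`, `deep_sawtooth`, `linear_pieces`, `shallow_max_pieces`) and the grid size are assumptions introduced for illustration only.

```python
def relu(x):
    return max(0.0, x)


def tent(x):
    # One hidden layer with two ReLU units; maps [0, 1] onto [0, 1].
    # Equals 2x on [0, 0.5] and 2 - 2x on [0.5, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    # A ReLU network with `depth` hidden layers of two units each.
    for _ in range(depth):
        x = tent(x)
    return x


def linear_pieces(f, n_grid=1024):
    # Lengths of the maximal linear pieces of f on [0, 1], found by
    # comparing finite-difference slopes on a uniform grid.
    # Assumes every breakpoint of f lies on a grid point.
    xs = [i / n_grid for i in range(n_grid + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n_grid for i in range(n_grid)]
    lengths, start = [], 0
    for i in range(1, n_grid):
        if abs(slopes[i] - slopes[i - 1]) > 1e-6:
            lengths.append((i - start) / n_grid)
            start = i
    lengths.append((n_grid - start) / n_grid)
    return lengths


def shallow_max_pieces(units):
    # In one dimension, each ReLU unit of a single hidden layer adds
    # at most one breakpoint, so there are at most units + 1 pieces.
    return units + 1


for depth in (1, 3, 5):
    pieces = linear_pieces(lambda x: deep_sawtooth(x, depth))
    units = 2 * depth  # total hidden units in the deep network
    print(depth, len(pieces), max(pieces), shallow_max_pieces(units))
```

With the same total number of hidden units, the deep network in this toy setting has $2^{\text{depth}}$ linear pieces, while a single hidden layer has at most `units + 1`. The largest piece of the deep network also shrinks as $2^{-\text{depth}}$, which is the kind of quantity a criterion based on the size of the maximal linear region looks at.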

Paper Structure

This paper contains 14 sections, 6 theorems, 49 equations, 7 figures, 1 table.

Key Result

Lemma 1

Let $\mathcal{D}$ be a bounded subset of $\mathbb{R}^{n_0}$. Let $\bm{F}_{\bm{\theta}} : \mathcal{D}\to\mathbb{R}^{n_L}$ be a ReLU neural network with $L$ layers, $M$ parameters and $S$ units in total. Then there exists a neural network $\tilde{\bm{F}}_{\bm{\theta}} : \mathcal{D}\to\mathbb{R}^{n_L}$

Figures (7)

  • Figure 1: The relation between the size of the maximal linear region of a deep neural network and its approximation of the target function. If the maximal linear region is large, a gap inevitably arises between the network and the target function, as in the left figure. If the maximal linear region is small, the network may be able to approximate the target function, as in the right figure.
  • Figure 2: Examples of linear regions of piecewise linear functions.
  • Figure 3: The problem of evaluating the flexibility of deep neural networks by the number of linear regions. We cannot distinguish between $F_1$ and $F_2$ when we use the number of linear regions. Therefore, we propose a new criterion in this paper.
  • Figure 4: The graph of $h_j\ (j=1,\cdots,n_0)$, where $p$ is an even number.
  • Figure 5: The Weierstrass function we used in the computer simulations.
  • ...and 2 more figures

Theorems & Definitions (15)

  • Definition 1
  • Lemma 1: Yarotsky
  • Definition 2: Montufar
  • Example 1
  • Definition 3
  • Example 2
  • Corollary 1
  • Definition 4
  • Example 3
  • Lemma 2
  • ...and 5 more