
On the number of response regions of deep feed forward networks with piece-wise linear activations

Razvan Pascanu, Guido Montufar, Yoshua Bengio

TL;DR

This work studies the expressiveness of deep networks with piecewise linear activations by counting regions of linearity in the input space. It develops a geometric framework based on hyperplane arrangements to compare deep and shallow architectures, providing exact bounds for a single hidden layer and a main theorem that lower-bounds the region count for k-layer networks. A key contribution is a constructive argument showing deep models can yield exponentially more response regions than shallow ones with the same budget of units, and a special class of deep models demonstrates rapid region growth even at modest widths. The findings offer a principled explanation for the empirical success of deep, piecewise linear networks and point to extensions to other piecewise linear architectures like maxout and convolutional nets.
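The depth-versus-width gap can be illustrated in one input dimension. The sketch below is a standard "folding" illustration and is not claimed to be the paper's exact construction: each layer uses two rectifier units to compute a tent map on $[0,1]$, and composing `depth` such layers gives $2^{\text{depth}}$ linear pieces. The names `tent_layer`, `deep_tent`, and `count_linear_pieces` are hypothetical.

```python
def relu(z: float) -> float:
    """Rectified linear activation."""
    return max(z, 0.0)


def tent_layer(x: float) -> float:
    """Two rectifier units plus a linear read-out: 2x on [0, 0.5], 2 - 2x on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_tent(x: float, depth: int) -> float:
    """Compose the two-unit tent layer `depth` times."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(depth: int) -> int:
    """Count linear pieces on [0, 1] by detecting slope changes on a dyadic grid.

    The grid spacing is 2**-(depth + 2), so every breakpoint j / 2**depth lies
    on the grid and all values are exact in binary floating point.
    """
    m = 2 ** (depth + 2)
    ys = [deep_tent(j / m, depth) for j in range(m + 1)]
    slopes = [ys[j + 1] - ys[j] for j in range(m)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1


if __name__ == "__main__":
    for depth in (1, 2, 3, 4):
        deep = count_linear_pieces(depth)
        # A single hidden layer with the same 2 * depth units has at most
        # 2 * depth + 1 pieces on a one-dimensional input.
        shallow = 2 * depth + 1
        print(depth, deep, shallow)
```

With the same budget of $2 \cdot \text{depth}$ units, the composed model has $2^{\text{depth}}$ pieces while a single hidden layer on a one-dimensional input has at most $2 \cdot \text{depth} + 1$.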

Abstract

This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piecewise linear functions based on computational geometry. We look at a deep rectifier multi-layer perceptron (MLP) with linear output units and compare it with a single-layer version of the model. In the asymptotic regime, when the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$-layer model with $n$ hidden units on each layer it is $\Omega(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor{n}/{n_0}\right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions than a shallow one. We consider this as a first step towards understanding the complexity of these models and specifically towards providing suitable mathematical tools for future analysis.
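To make the two growth rates concrete, the following sketch evaluates non-asymptotic counterparts of the expressions in the abstract. It rests on two assumptions not spelled out in this excerpt: that a single hidden layer of $m$ units on $n_0$ inputs has at most $\sum_{j=0}^{n_0}\binom{m}{j}$ regions (the classical count for $m$ hyperplanes in general position), and that the deep lower bound takes the form $\left\lfloor n/n_0\right\rfloor^{k-1}\sum_{j=0}^{n_0}\binom{n}{j}$, which agrees with the $\Omega(\cdot)$ expression above. The function names are hypothetical.

```python
from math import comb


def shallow_max_regions(num_units: int, n0: int) -> int:
    """Assumed maximal region count for one hidden layer of `num_units`
    rectifier units on `n0` inputs: sum_{j=0}^{n0} C(num_units, j)."""
    return sum(comb(num_units, j) for j in range(n0 + 1))


def deep_lower_bound(n: int, k: int, n0: int) -> int:
    """Assumed lower bound for k hidden layers of width n on n0 inputs:
    floor(n / n0)**(k - 1) * sum_{j=0}^{n0} C(n, j)."""
    return (n // n0) ** (k - 1) * shallow_max_regions(n, n0)


if __name__ == "__main__":
    n0, n = 2, 4  # the n = 2 * n0 setting mentioned in the abstract
    for k in (2, 6, 10):
        shallow = shallow_max_regions(k * n, n0)  # same budget of k * n units
        deep = deep_lower_bound(n, k, n0)
        print(f"k={k}: shallow <= {shallow}, deep >= {deep}")
```

Under these assumptions, with $n_0 = 2$ and $n = 4$ the shallow count is larger at $k = 2$ (37 versus 22), and the deep bound overtakes it from $k = 6$ onward (301 versus 352), since the shallow count grows polynomially in $k$ while the deep bound grows exponentially.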

Paper Structure

This paper contains 7 sections, 13 theorems, 25 equations, 9 figures.

Key Result

Lemma 1

Consider a width $k$ layer of rectifier units. Let $R^i=\{R^i_1,\ldots, R^i_{N_i} \}$ be the regions of linearity of the function $f_i\colon \mathbb{R}^{n_0}\to\mathbb{R}$ computed by the $i$-th unit, for all $i\in [k]$. Then the regions of linearity of the function $f = (f_i)_{i\in[k]}\colon \mathbb{R}^{n_0}\to\mathbb{R}^k$ …
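The statement of Lemma 1 is cut off in this excerpt. As an illustration only, the sketch below assumes the natural reading: each rectifier unit splits the input space into an inactive and an active half-space, and the regions of the layer are the nonempty intersections of these per-unit regions, which correspond to distinct activation patterns. The three units, the sampling window, and the names `activation_pattern` and `count_patterns` are hypothetical choices for a two-dimensional input.

```python
def activation_pattern(point, weights, biases):
    """Return a tuple of 0/1 flags, one per unit: 1 where w . x + b > 0."""
    return tuple(
        int(sum(w_j * x_j for w_j, x_j in zip(w, point)) + b > 0)
        for w, b in zip(weights, biases)
    )


def count_patterns(weights, biases, lo=-3.0, hi=3.0, steps=120):
    """Count distinct activation patterns over cell midpoints of a square grid.

    This is a sampling estimate: a region smaller than one grid cell, or one
    lying outside [lo, hi]^2, could be missed.
    """
    h = (hi - lo) / steps
    seen = set()
    for i in range(steps):
        for j in range(steps):
            p = (lo + (i + 0.5) * h, lo + (j + 0.5) * h)
            seen.add(activation_pattern(p, weights, biases))
    return len(seen)


# Three units whose boundaries are the lines x = 0, y = 0 and x + y = 1.
weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
biases = [0.0, 0.0, -1.0]

if __name__ == "__main__":
    print(count_patterns(weights, biases))  # 7 of the 2**3 = 8 patterns occur
```

The only pattern that never occurs is the one requiring $x<0$, $y<0$ and $x+y>1$ simultaneously, so the count is $7 = 1 + 3 + \binom{3}{2}$, the number of regions of three lines in general position in the plane.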

Figures (9)

  • Figure 1: Illustration of a rectifier feedforward network with two hidden layers.
  • Figure 2: Induction step of the hyperplane sweep method for counting the regions of line arrangements in the plane.
  • Figure 3: An arrangement $\mathcal{A}$ and a scaled-shifted version $\mathcal{A}'$ whose regions intersect the ball $\mathcal{S}$.
  • Figure 4: Illustration of the hyperplane arrangement discussed in Proposition \ref{proposition:special_arrangement}, in the $2$-dimensional case. On the left we have arrangements of two and three lines, and on the right an arrangement of four lines.
  • Figure 5: Illustration of Example \ref{example:klayermodel}. The units represented by squares build an intermediary layer of linear units between the first and the second hidden layers. The computation of such an intermediary linear layer can be absorbed in the second hidden layer of rectifier units (Lemma \ref{lemma:decomposition}). The connectivity map depicts the maps $g_1$ by dashed arrows and $g_2$ by dashed-dotted arrows.
  • ...and 4 more figures

Theorems & Definitions (28)

  • Definition 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Proposition 1
  • Proposition 2
  • proof
  • ...and 18 more