
Deep ReLU Networks Have Surprisingly Simple Polytopes

Feng-Lei Fan, Wei Huang, Xiangru Zhong, Lecheng Ruan, Tieyong Zeng, Huan Xiong, Fei Wang

TL;DR

This work studies the shapes of polytopes through their number of faces and finds that a ReLU network has relatively simple polytopes both at initialization and after gradient descent, even though a specifically designed network can produce rather diverse and complicated polytopes.

Abstract

A ReLU network is a piecewise linear function over polytopes. Understanding the properties of these polytopes is of fundamental importance for the research and development of neural networks. So far, both theoretical and empirical studies of polytopes have stopped at counting them, which is far from a complete characterization. Here, we propose to study the shapes of polytopes via their number of faces. By computing and analyzing the histogram of face counts across polytopes, we find that a ReLU network has relatively simple polytopes both at initialization and after gradient descent, even though a specifically designed network can produce rather diverse and complicated polytopes. This finding can be viewed as a kind of generalized implicit bias, subject to the intrinsic geometric constraints on the space partition of a ReLU network. Next, we perform a combinatorial analysis that explains why adding depth does not generate more complicated polytopes, by bounding the average number of faces of polytopes in terms of the dimensionality. Our results concretely reveal what kind of simple functions a network learns and what happens when a network goes deep. Moreover, because it characterizes the shape of polytopes, the number of faces can serve as a new tool for other problems, e.g., explaining the power of popular shortcut networks such as ResNet and analyzing the impact of different regularization strategies on a network's space partition.
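To make the opening claim concrete, the following is a minimal sketch (not taken from the paper) of why a ReLU network is affine on each polytope: once the activation pattern at a point is fixed, the layers compose into a single affine map. The helper names `random_layers`, `relu_forward`, and `local_affine_map`, the layer sizes, and the Gaussian weights are all assumptions made for illustration.

```python
import random

def random_layers(sizes, rng):
    """Hypothetical helper: Gaussian weights and biases for an MLP with the given layer sizes."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.gauss(0, 1) for _ in range(n_out)]
        layers.append((W, b))
    return layers

def relu_forward(x, layers):
    """Evaluate the network; also return the on/off pattern of every hidden neuron."""
    pattern = []
    h = list(x)
    for k, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:              # hidden layer: apply ReLU
            signs = [1 if zi > 0 else 0 for zi in z]
            pattern.append(signs)
            h = [zi * s for zi, s in zip(z, signs)]
        else:                                # output layer stays linear
            h = z
    return h, pattern

def local_affine_map(pattern, layers, d):
    """Compose the layers with ReLUs frozen to `pattern`.

    Returns (A, c) such that f(x) = A x + c for every x sharing this pattern,
    i.e. for every x in the same polytope.
    """
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    c = [0.0] * d
    for k, (W, b) in enumerate(layers):
        new_A = [[sum(W[i][m] * A[m][j] for m in range(len(A))) for j in range(d)]
                 for i in range(len(W))]
        new_c = [sum(W[i][m] * c[m] for m in range(len(c))) + b[i]
                 for i in range(len(W))]
        if k < len(layers) - 1:              # zero out the rows of inactive neurons
            s = pattern[k]
            new_A = [[a * s[i] for a in row] for i, row in enumerate(new_A)]
            new_c = [ci * s[i] for i, ci in enumerate(new_c)]
        A, c = new_A, new_c
    return A, c

rng = random.Random(0)
layers = random_layers([2, 4, 4, 1], rng)    # 2 inputs, two hidden layers of 4, 1 output
x = [0.3, -0.7]
y, pattern = relu_forward(x, layers)
A, c = local_affine_map(pattern, layers, d=2)
affine_value = A[0][0] * x[0] + A[0][1] * x[1] + c[0]
print(y[0], affine_value)                    # the two values agree up to rounding
```

The polytope containing `x` is the set of inputs with the same `pattern`; each hidden neuron contributes one linear inequality, and the faces studied in the paper are the inequalities that are not redundant.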

Paper Structure

This paper contains 13 sections, 18 theorems, 42 equations, 10 figures, and 1 table.

Key Result

Theorem 1

Let $\mathcal{N}$ be a feedforward ReLU NN with $d$ input features and $L$ hidden layers with $n$ hidden neurons in each layer (with or without skip connections between different layers). Then the number of $d$-simplices in triangulations of all polytopes generated by $\mathcal{N}$ is at most In particular, if $L=1$, we derive the following upper bound for the maximum number of $d$-simplices

Figures (10)

  • Figure 1: The Hit-and-Run algorithm that detects the faces of polytopes and counts them.
  • Figure 2: An explanatory graph of how to construct a network that partitions the space into complicated polytopes in the average sense. A cone is generated by the first hidden layer, and neurons in the second hidden layer keep cutting the cone without cutting the regions outside the cone.
  • Figure 3: An explanatory graph of how to construct a network that partitions the space into complicated polytopes in the maximal sense. There exists a parameter configuration that results in a polytope whose number of faces equals the number of neurons in the network.
  • Figure 4: Deep ReLU networks have simple linear regions under different initialization methods.
  • Figure 5: The simplicity holds true for deep networks.
  • ...and 5 more figures
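Figure 1 refers to a Hit-and-Run procedure for detecting and counting faces. The paper's exact procedure is not reproduced on this page, so the following is only a simplified sketch of the general idea, under stated assumptions: the polytope is given as inequalities `A x <= b`, a strictly interior starting point is known, and a face counts as detected when a random chord through the current point ends on it. The function name `hit_and_run_faces` and its parameters are hypothetical.

```python
import math
import random

def hit_and_run_faces(A, b, x0, steps=2000, seed=0, t_max=1e6):
    """Return indices of constraints of {x : A x <= b} that random chords end on.

    Assumes x0 is strictly inside the polytope. Redundant constraints are never
    the first one hit, so they are not reported. With finitely many steps the
    result is a lower estimate of the true set of faces.
    """
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    detected = set()
    for _ in range(steps):
        # Draw a direction uniformly on the unit sphere.
        u = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        # Find the chord {x + t u : t_lo <= t <= t_hi} and the constraints that bound it.
        t_lo, t_hi = -t_max, t_max           # cap for unbounded polytopes
        i_lo = i_hi = None
        for i, (a, bi) in enumerate(zip(A, b)):
            slope = sum(ai * ui for ai, ui in zip(a, u))
            slack = bi - sum(ai * xi for ai, xi in zip(a, x))   # positive inside
            if abs(slope) < 1e-12:
                continue                     # chord is parallel to this hyperplane
            t = slack / slope
            if slope > 0 and t < t_hi:
                t_hi, i_hi = t, i
            if slope < 0 and t > t_lo:
                t_lo, i_lo = t, i
        for i in (i_lo, i_hi):
            if i is not None:
                detected.add(i)
        # Move to a random point on the chord; the margin keeps x strictly interior
        # (a simplification of sampling uniformly on the whole chord).
        t = t_lo + (t_hi - t_lo) * rng.uniform(0.05, 0.95)
        x = [xi + t * ui for xi, ui in zip(x, u)]
    return detected

# Unit square 0 <= x, y <= 1 plus one redundant constraint x + y <= 3.
A_square = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b_square = [1, 0, 1, 0, 3]
faces = hit_and_run_faces(A_square, b_square, [0.5, 0.5])
print(sorted(faces), len(faces))
```

For a one-hidden-layer ReLU network, the rows of `A` and entries of `b` would come from the neurons' weights and biases with signs set by the activation pattern of the polytope; the size of the returned set is then an estimate of that polytope's number of faces.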

Theorems & Definitions (31)

  • Definition 1: Linear regions (polytopes) [hanin2019complexity]
  • Theorem 1: Upper Bound
  • Theorem 2: Lower Bound
  • proof: Proof of Theorem 1 (Upper Bound)
  • proof: Proof of Theorem 2 (Lower Bound)
  • Lemma 1: Zaslavsky's Theorem [zaslavsky1975facing, stanley2004introduction]
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • ...and 21 more
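As a small worked illustration of how Zaslavsky's theorem (Lemma 1 in the list above) connects to an average face count, consider the simplest setting: $n$ hyperplanes in general position in $\mathbb{R}^d$, which is the partition produced by a single hidden layer with generic weights. This is a standard counting argument and not the paper's theorem for deep networks. The number of regions is $r(n,d)=\sum_{i=0}^{d}\binom{n}{i}$; each hyperplane is cut by the other $n-1$ into $r(n-1,d-1)$ pieces, and each piece is a face of exactly two regions, so the average number of faces per region is $2n\,r(n-1,d-1)/r(n,d)$, which stays below $2d$. The function names below are assumptions.

```python
from fractions import Fraction
from math import comb

def num_regions(n, d):
    """Regions cut out by n hyperplanes in general position in R^d."""
    return sum(comb(n, i) for i in range(d + 1))

def average_faces(n, d):
    """Average number of (d-1)-dimensional faces per region, as an exact fraction."""
    # Each of the n hyperplanes is split into num_regions(n - 1, d - 1) pieces,
    # and every piece is shared by exactly two regions.
    total_faces = 2 * n * num_regions(n - 1, d - 1)
    return Fraction(total_faces, num_regions(n, d))

# Three lines in the plane: 7 regions with 18 region-face incidences in total.
print(num_regions(3, 2), average_faces(3, 2))

# The average approaches 2d from below as the number of hyperplanes grows.
for n in (5, 50, 500):
    print(n, float(average_faces(n, 2)))
```

The bound by $2d$ holds regardless of $n$, which gives some intuition for why adding more neurons does not by itself make the typical polytope more complicated.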