Stably unactivated neurons in ReLU neural networks

Natalie Brownlowe, Christopher R. Cornwell, Ethan Montes, Gabriel Quijano, Grace Stulman, Na Zhang

TL;DR

This work analyzes how architecture and random symmetric initialization affect the expressiveness of ReLU networks by studying stably unactivated neurons, i.e., hidden neurons whose activations remain zero under small parameter perturbations. Using the geometry of hyperplane arrangements induced by the first layer, the authors derive exact probabilities for a second-layer neuron to be stably unactivated: $\frac{1}{2^{n_1+1}}$ when $n_1\le n_0$, and $\frac{2^{n_0}+1}{4^{n_0+1}}$ when $n_1=n_0+1$. They further propose a conjecture for the case $n_1>n_0+1$, supported by computational evidence, and outline a broader conjectural picture for large $n_1$ with a lower bound on the probability in terms of $n_0$. These results connect stochastic initialization with the deterministic geometry of hyperplane partitions to quantify how dead or stably inactive neurons constrain the functional dimension and expressiveness of neural networks.

Abstract

The choice of architecture of a neural network influences which functions will be realizable by that neural network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if the first hidden layer has $n_1$ neurons, $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case when the first hidden layer has more neurons than $n_0+1$, a conjecture is proposed along with the rationale. Computational evidence is presented to support the conjecture.
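The two proven closed forms are easy to evaluate exactly. The following is a minimal sketch, not code from the paper; the function name `stably_unactivated_probability` is hypothetical, and the function deliberately refuses the case $n_1 > n_0+1$, for which the abstract only describes a conjecture.

```python
from fractions import Fraction


def stably_unactivated_probability(n0: int, n1: int) -> Fraction:
    """Proven probability that a second-layer neuron is stably unactivated.

    Covers only the two cases stated in the abstract; anything else raises.
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("n0 and n1 must be positive integers")
    if n1 <= n0:
        # Case n1 <= n0: probability 1 / 2^(n1 + 1)
        return Fraction(1, 2 ** (n1 + 1))
    if n1 == n0 + 1:
        # Case n1 = n0 + 1: probability (2^n0 + 1) / 4^(n0 + 1)
        return Fraction(2 ** n0 + 1, 4 ** (n0 + 1))
    # The abstract offers only a conjecture here, so no value is returned.
    raise ValueError("n1 > n0 + 1 is only conjectural; no proven formula")


if __name__ == "__main__":
    for n0 in (2, 3, 4):
        for n1 in range(1, n0 + 2):
            p = stably_unactivated_probability(n0, n1)
            print(f"n0={n0} n1={n1}: {p} ~ {float(p):.5f}")
```

For $n_0 = 2$, this gives $\frac{1}{4}$, $\frac{1}{8}$, and $\frac{5}{64}$ for $n_1 = 1, 2, 3$ respectively.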

Paper Structure

This paper contains 16 sections, 17 theorems, 42 equations, 6 figures.

Key Result

Theorem 1.1

Let $(n_0,n_1,\ldots,n_L)$ be an architecture of a ReLU neural network. Suppose that the weights and biases in each single layer are selected i.i.d. from a probability distribution on $\mathbb{R}$ that is symmetric about $0$. If $n_1 \leq n_0$, then for any one of the $n_2$ neurons in the second layer ...
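One heuristic way to see why the value $\frac{1}{2^{n_1+1}}$ from the abstract is plausible when $n_1 \le n_0$ is sketched below. This reading is an assumption and is not taken from the paper's proof: if the first-layer weight matrix has full rank, the image of the first layer is the whole closed nonnegative orthant of $\mathbb{R}^{n_1}$, so a second-layer neuron stays at zero under small perturbations exactly when its $n_1$ incoming weights and its bias are all strictly negative. The simulation uses a standard normal as one example of a symmetric distribution; the function name `estimate_all_negative` is hypothetical.

```python
import random


def estimate_all_negative(n1: int, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(all n1 incoming weights and the bias are < 0).

    Assumption: parameters are i.i.d. standard normal, one example of a
    distribution on the reals that is symmetric about 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # n1 incoming weights plus one bias for a single second-layer neuron
        params = [rng.gauss(0.0, 1.0) for _ in range(n1 + 1)]
        if all(p < 0.0 for p in params):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n1 in (1, 2, 3):
        exact = 1 / 2 ** (n1 + 1)
        print(f"n1={n1}: estimate={estimate_all_negative(n1):.4f}, formula={exact:.4f}")
```

The estimate depends only on $n_1$, not on $n_0$, which is consistent with the stated formula for this case. It does not cover $n_1 > n_0$, where the image of the first layer is no longer the full orthant.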

Figures (6)

  • Figure 1: An example generic hyperplane arrangement $\textbf{A}$ in $\mathbb{R}^2$ equipped with a coorientation. Codewords of regions are indicated, with $\operatorname{code}(\textbf{A})$ consisting of eleven elements in $\{+,-\}^4$.
  • Figure 2: Example hyperplane arrangement $\textbf{A}_1$ (left) and corresponding image of first layer map $F_1:\mathbb{R}^2\to\mathbb{R}^3$ (right), where $\theta\in C_0$.
  • Figure 3: Hyperplane arrangement $\textbf{A}_1$ (left) and image of first layer map $F_1:\mathbb{R}^2\to\mathbb{R}^3$ (right) for example $\theta\in C_3$; $n_0=2$, $n_1=3$.
  • Figure 4: Hyperplane arrangement $\textbf{A}_1$ (left) and image of first layer map $F_1:\mathbb{R}^2\to\mathbb{R}^3$ (right) for example $\theta\in C_6$; $n_0=2$, $n_1=3$.
  • Figure 5: Empirical probabilities of a neuron in the second layer being stably unactivated for architectures with $n_0 = 2, 3,$ and $4$. Along the horizontal axis we show the amount by which $n_1$ is larger than $n_0$; the dashed horizontal line is at height $1/4^{n_0+1}$.
  • ...and 1 more figure

Theorems & Definitions (50)

  • Theorem 1.1
  • Theorem 1.2
  • Remark 1
  • Conjecture 1.3
  • Definition 2.1
  • Definition 2.2
  • Remark 2
  • Definition 2.3
  • Definition 2.4
  • Remark 3
  • ...and 40 more