Stably unactivated neurons in ReLU neural networks
Natalie Brownlowe, Christopher R. Cornwell, Ethan Montes, Gabriel Quijano, Grace Stulman, Na Zhang
TL;DR
This work analyzes how architecture and random symmetric initialization affect the expressiveness of ReLU networks by studying stably unactivated neurons, i.e., hidden neurons whose activations remain zero under small parameter perturbations. Writing $n_0$ for the input dimension and $n_1$ for the width of the first hidden layer, the authors use the geometry of the hyperplane arrangement induced by the first layer to derive exact probabilities for a neuron in the second hidden layer to be stably unactivated: $\frac{1}{2^{n_1+1}}$ when $n_1\le n_0$, and $\frac{2^{n_0}+1}{4^{n_0+1}}$ when $n_1=n_0+1$. They further propose a conjecture for the case $n_1>n_0+1$, supported by computational evidence, and outline a broader conjectural picture for large $n_1$ with a lower bound on the probability in terms of $n_0$. These results connect stochastic initialization with the deterministic geometry of hyperplane partitions to quantify how stably unactivated neurons constrain the functional dimension and expressiveness of neural networks.
Abstract
The choice of architecture of a neural network influences which functions are realizable by that network and, as a result, the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability that a neuron in the second hidden layer of such a network is stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons then this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if the first hidden layer has $n_1$ neurons with $n_1 \le n_0$, then the probability is $\frac{1}{2^{n_1+1}}$. Finally, for the case where the first hidden layer has more than $n_0+1$ neurons, we propose a conjecture, explain its rationale, and present computational evidence in support of it.
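To make the two closed-form probabilities concrete, the following minimal sketch evaluates them with exact rational arithmetic for a few small architectures. The function name `stably_unactivated_probability` is a hypothetical helper introduced here for illustration and is not taken from the paper. It covers only the two proven cases and deliberately refuses the case $n_1 > n_0+1$, where the result is conjectural.

```python
from fractions import Fraction


def stably_unactivated_probability(n0: int, n1: int) -> Fraction:
    """Probability that a second-hidden-layer neuron is stably unactivated.

    Hypothetical helper: it only evaluates the two closed forms stated in
    the abstract, for input dimension n0 and first hidden layer width n1.
    """
    if n0 < 1 or n1 < 1:
        raise ValueError("n0 and n1 must be positive integers")
    if n1 <= n0:
        # Proven case n1 <= n0: probability 1 / 2^(n1 + 1).
        return Fraction(1, 2 ** (n1 + 1))
    if n1 == n0 + 1:
        # Proven case n1 = n0 + 1: probability (2^n0 + 1) / 4^(n0 + 1).
        return Fraction(2 ** n0 + 1, 4 ** (n0 + 1))
    # For n1 > n0 + 1 the abstract only states a conjecture, so no value is returned.
    raise ValueError("no proven closed form for n1 > n0 + 1")


if __name__ == "__main__":
    for n0 in (1, 2, 3):
        for n1 in range(1, n0 + 2):
            p = stably_unactivated_probability(n0, n1)
            print(f"n0={n0}, n1={n1}: {p} (approx. {float(p):.4f})")
```

For example, with $n_0 = 2$ the sketch gives $\frac{1}{4}$ for $n_1 = 1$, $\frac{1}{8}$ for $n_1 = 2$, and $\frac{5}{64}$ for $n_1 = 3$. Using `Fraction` rather than floating point keeps the values exact, so they can be compared directly against the stated formulas.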
