N-ReLU: Zero-Mean Stochastic Extension of ReLU
Md Motaleb Hossen Manik, Md Zabirul Islam, Ge Wang
TL;DR
This work addresses the dying ReLU problem by introducing N-ReLU, a zero-mean Gaussian-noise extension of the ReLU activation that injects noise only in the negative region while preserving the same expected output. The method acts as a lightweight, parameter-free, annealing-like regularizer that maintains gradient flow and stabilizes training. Experiments on MNIST with both MLP and CNN architectures show that moderate noise levels ($\sigma \approx 0.05$–$0.10$) yield accuracy comparable to or slightly better than ReLU and other smooth activations, with zero dead neurons observed. Overall, N-ReLU provides a simple, theoretically grounded mechanism to improve optimization robustness without modifying network structure or introducing learnable parameters.
Abstract
Activation functions are fundamental for enabling nonlinear representations in deep neural networks. However, the standard rectified linear unit (ReLU) often suffers from inactive or "dead" neurons caused by its hard zero cutoff. To address this issue, we introduce N-ReLU (Noise-ReLU), a zero-mean stochastic extension of ReLU that replaces negative activations with Gaussian noise while preserving the same expected output. This expectation-aligned formulation maintains gradient flow in inactive regions and acts as an annealing-style regularizer during training. Experiments on the MNIST dataset using both multilayer perceptron (MLP) and convolutional neural network (CNN) architectures show that N-ReLU achieves accuracy comparable to or slightly exceeding that of ReLU, LeakyReLU, PReLU, GELU, and RReLU at moderate noise levels ($\sigma = 0.05$–$0.10$), with stable convergence and no dead neurons observed. These results demonstrate that lightweight Gaussian noise injection offers a simple yet effective mechanism to enhance optimization robustness without modifying network structures or introducing additional parameters.
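
The forward rule described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function name `n_relu`, the `training` switch (falling back to plain ReLU at evaluation time), and the treatment of an input of exactly zero as part of the noisy region are all assumptions made for illustration. The sketch only demonstrates the expectation-preserving property, namely that the noisy output for a negative input averages to the ReLU output of 0; it does not model gradients.

```python
import random


def n_relu(xs, sigma=0.05, training=True, rng=None):
    """Illustrative N-ReLU forward pass on a list of pre-activations.

    Positive inputs pass through unchanged. Non-positive inputs are
    replaced by zero-mean Gaussian noise with standard deviation `sigma`
    during training. At evaluation time the function falls back to plain
    ReLU (an assumption, not stated in the abstract).
    """
    rng = rng or random.Random()
    out = []
    for x in xs:
        if x > 0:
            out.append(x)                      # identical to ReLU
        elif training and sigma > 0:
            out.append(rng.gauss(0.0, sigma))  # zero-mean noise, E[.] = 0
        else:
            out.append(0.0)                    # plain ReLU cutoff
    return out


# Empirical check of the zero-mean property for a negative input.
rng = random.Random(0)
samples = [n_relu([-1.0], sigma=0.1, rng=rng)[0] for _ in range(20000)]
mean_negative = sum(samples) / len(samples)  # close to ReLU(-1.0) = 0
```

Because the noise has zero mean, averaging many stochastic outputs for a negative input recovers the deterministic ReLU value, which is the sense in which the expected output is preserved.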
