
N-ReLU: Zero-Mean Stochastic Extension of ReLU

Md Motaleb Hossen Manik, Md Zabirul Islam, Ge Wang

TL;DR

This work addresses the dying ReLU problem by introducing N-ReLU, a zero-mean Gaussian-noise extension of the ReLU activation that injects noise only in the negative region while preserving ReLU's expected output. The method acts as a lightweight, parameter-free, annealing-like regularizer that maintains gradient flow and stabilizes training. Experiments on MNIST with both MLP and CNN architectures show that moderate noise levels ($\sigma \approx 0.05$–$0.10$) yield accuracy comparable to or slightly better than ReLU and the other baseline activations (LeakyReLU, PReLU, GELU, RReLU), with zero dead neurons observed. Overall, N-ReLU provides a simple, theoretically grounded mechanism to improve optimization robustness without modifying network structure or introducing learnable parameters.
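One way to write the mechanism described above is sketched below. It assumes that non-positive inputs are replaced by a fresh Gaussian draw $\varepsilon$. The paper's own notation is not reproduced here. Under that assumption, the expectation and variance follow directly:

$$
\text{N-ReLU}(x) = \begin{cases} x, & x > 0, \\ \varepsilon, & x \le 0, \end{cases} \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)
$$

$$
\mathbb{E}[\text{N-ReLU}(x)] = \begin{cases} x, & x > 0, \\ \mathbb{E}[\varepsilon] = 0, & x \le 0, \end{cases} = \max(x, 0) = \text{ReLU}(x),
\qquad
\operatorname{Var}[\text{N-ReLU}(x)] = \begin{cases} 0, & x > 0, \\ \sigma^2, & x \le 0. \end{cases}
$$

So the noise changes only the spread of the negative branch, never its mean.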

Abstract

Activation functions are fundamental for enabling nonlinear representations in deep neural networks. However, the standard rectified linear unit (ReLU) often suffers from inactive or "dead" neurons caused by its hard zero cutoff. To address this issue, we introduce N-ReLU (Noise-ReLU), a zero-mean stochastic extension of ReLU that replaces negative activations with Gaussian noise while preserving ReLU's expected output. This expectation-aligned formulation maintains gradient flow in inactive regions and acts as an annealing-style regularizer during training. Experiments on the MNIST dataset using both multilayer perceptron (MLP) and convolutional neural network (CNN) architectures show that N-ReLU achieves accuracy comparable to or slightly exceeding that of ReLU, LeakyReLU, PReLU, GELU, and RReLU at moderate noise levels ($\sigma = 0.05$–$0.10$), with stable convergence and no dead neurons observed. These results demonstrate that lightweight Gaussian noise injection offers a simple yet effective mechanism to enhance optimization robustness without modifying network structures or introducing additional parameters.
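A minimal forward-pass sketch in plain Python follows. It is built on three assumptions, none taken from the authors' code:

  • positive inputs pass through unchanged;
  • non-positive inputs are replaced by a fresh draw from $\mathcal{N}(0, \sigma^2)$ during training;
  • evaluation falls back to deterministic ReLU, which equals the expected training-time output.

The names `n_relu` and `dead_fraction`, the `training` flag, and the "dead = exactly zero on every sample of a batch" criterion are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def n_relu(xs, sigma=0.05, training=True, rng=None):
    """Hypothetical N-ReLU forward pass on a list of pre-activations.

    Assumption: x > 0 passes through; x <= 0 becomes eps ~ N(0, sigma^2)
    while training. Assumption: at evaluation it reduces to plain ReLU.
    """
    if not training or sigma == 0.0:
        # Deterministic ReLU, i.e. the expected value of the noisy version.
        return [max(x, 0.0) for x in xs]
    rng = rng or random.Random()
    # Zero-mean noise only in the non-positive region.
    return [x if x > 0 else rng.gauss(0.0, sigma) for x in xs]


def dead_fraction(batch_outputs):
    """Hypothetical helper: fraction of units that output exactly 0.0
    for every sample in the batch (one list of unit outputs per sample)."""
    n_units = len(batch_outputs[0])
    dead = sum(
        all(sample[j] == 0.0 for sample in batch_outputs) for j in range(n_units)
    )
    return dead / n_units


# Usage: Monte Carlo check that the mean output tracks ReLU(x).
x = [-2.0, -0.5, 0.0, 1.5]
rng = random.Random(0)
samples = [n_relu(x, sigma=0.10, rng=rng) for _ in range(20000)]
means = [fmean(s[j] for s in samples) for j in range(len(x))]
print([round(m, 3) for m in means])  # each close to ReLU(x) = [0, 0, 0, 1.5]

# Plain ReLU leaves the three non-positive units at exactly zero.
print(dead_fraction([n_relu(x, training=False) for _ in range(10)]))  # 0.75
```

Because the negative branch is redrawn on every call, no unit in this toy check emits a constant zero. Only the forward pass is modeled here; backpropagation is left to whatever framework hosts the activation.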


Paper Structure

This paper contains 37 sections, 10 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Validation accuracy of the CNN model using six activations: ReLU, LeakyReLU, PReLU, GELU, RReLU, and N-ReLU with $\sigma \in \{0.05, 0.10, 0.20\}$. N-ReLU achieves convergence comparable to smooth deterministic activations.
  • Figure 2: Validation loss of the CNN model across all activations. All functions exhibit stable convergence, with N-ReLU maintaining loss profiles comparable to those of GELU and PReLU.
  • Figure 3: Validation accuracy of the MLP model across all activations. Moderate noise in N-ReLU ($\sigma = 0.05$–$0.10$) slightly improves generalization, while larger noise ($\sigma = 0.20$) slows convergence.
  • Figure 4: Validation loss of the MLP model across all activations. N-ReLU exhibits smooth loss decay consistent with stable optimization dynamics.
  • Figure 5: Sensitivity of validation accuracy to Gaussian noise standard deviation $\sigma$. A moderate noise level ($\sigma \approx 0.05$) yields the best balance between gradient exploration and stability.