Improved weight initialization for deep and narrow feedforward neural network
Hyunwoo Lee, Yunho Kim, Seung Yeop Yang, Hayoung Choi
TL;DR
This paper tackles the dying ReLU problem in extremely deep and narrow feedforward neural networks (FFNNs) by introducing a fully deterministic weight initialization. The method constructs the weight matrix by QR-based orthogonalization of a perturbed all-ones matrix, which yields provable properties such as orthogonality and balanced row and column sums. The authors prove these properties and demonstrate that the initialization is independent of depth, width, and activation function, with an algorithmic construction that scales to large networks. Empirical results on MNIST, Fashion-MNIST, and selected tabular datasets show improved convergence and higher validation accuracy than multiple baselines in deep and narrow architectures, indicating that the method is robust in practice and supports training without batch normalization.
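To make the QR-based idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `qr_ones_init`, the choice of perturbation (`eps` times the identity), the default value of `eps`, and the sign normalization are all hypothetical choices made for this example.

```python
import numpy as np


def qr_ones_init(n, eps=0.1):
    """Deterministic orthogonal matrix from a perturbed all-ones matrix.

    Assumption: the perturbation is eps * I, which makes the rank-one
    all-ones matrix invertible for any eps > 0. The paper's exact
    perturbation and any later rescaling are not reproduced here.
    """
    perturbed = np.ones((n, n)) + eps * np.eye(n)
    # QR factorization: perturbed = q @ r, with q orthogonal.
    q, r = np.linalg.qr(perturbed)
    # QR is unique only up to column signs; force diag(r) > 0 so the
    # result does not depend on the sign convention of the backend.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # flips the sign of column j when signs[j] == -1


W = qr_ones_init(8)
# Orthogonality check: the largest deviation of W^T W from the identity.
print(np.abs(W.T @ W - np.eye(8)).max())
```

No random number generator is involved, so repeated calls with the same `n` and `eps` return the same matrix, which is the sense in which such a construction is deterministic.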
Abstract
Appropriate weight initialization settings, along with the ReLU activation function, have become cornerstones of modern deep learning, enabling the training and deployment of highly effective and efficient neural network models across diverse areas of artificial intelligence. The "dying ReLU" problem, in which ReLU neurons become inactive and output zero, presents a significant challenge in training deep neural networks with the ReLU activation function. Theoretical analyses and various methods have been proposed to address the problem. Even so, training remains challenging for extremely deep and narrow feedforward networks with the ReLU activation function. In this paper, we propose a novel weight initialization method to address this issue. We establish several properties of our initial weight matrix and demonstrate how these properties enable the effective propagation of signal vectors. Through a series of experiments and comparisons with existing methods, we demonstrate the effectiveness of the novel initialization method.
