Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Hyunwoo Lee, Hayoung Choi, Hyunju Kim
TL;DR
The paper addresses activation saturation and degraded signal propagation in deep tanh neural networks with a weight initialization informed by fixed-point analysis. Each weight matrix is decomposed into a structured deterministic part and Gaussian noise, which yields a per-neuron effective slope near 1 and an approximately normal activation distribution across depth, guided by the fixed points of $\tanh(a x)$. The authors derive the explicit initialization $\mathbf{W}^{\ell}=\mathbf{D}^{\ell}+\mathbf{Z}^{\ell}$ with $\mathbf{Z}^{\ell}\sim \mathcal{N}(0,\sigma_z^2)$, $\sigma_z=\alpha/\sqrt{N^{\ell-1}}$ ($\alpha=0.085$), and show that this choice stabilizes forward propagation without normalization. Empirical results on image classification benchmarks and Physics-Informed Neural Networks show improved robustness to network size and better data efficiency than Xavier initialization, which makes the method practical for deep tanh models in both standard and PDE-solving settings.
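The decomposition $\mathbf{W}^{\ell}=\mathbf{D}^{\ell}+\mathbf{Z}^{\ell}$ can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the summary above does not spell out the structure of $\mathbf{D}^{\ell}$, so the sketch assumes an identity-like matrix with a single 1 per row, placed at column $i \bmod N^{\ell-1}$. The function name `tanh_fixed_point_init` and its parameters are hypothetical; only $\sigma_z=\alpha/\sqrt{N^{\ell-1}}$ and $\alpha=0.085$ are taken from the text.

```python
import numpy as np


def tanh_fixed_point_init(n_out, n_in, alpha=0.085, rng=None):
    """Return an (n_out, n_in) weight matrix W = D + Z.

    n_in plays the role of N^{l-1} (width of the previous layer).
    """
    rng = np.random.default_rng() if rng is None else rng

    # ASSUMPTION: D is identity-like, with D[i, j] = 1 when j == i mod n_in
    # and 0 elsewhere. The summary only says "structured deterministic part".
    rows = np.arange(n_out)
    D = np.zeros((n_out, n_in))
    D[rows, rows % n_in] = 1.0

    # Gaussian noise with standard deviation sigma_z = alpha / sqrt(N^{l-1}).
    sigma_z = alpha / np.sqrt(n_in)
    Z = rng.normal(loc=0.0, scale=sigma_z, size=(n_out, n_in))

    return D + Z


# Usage: initialize one square hidden layer and apply it to a random input.
rng = np.random.default_rng(0)
W = tanh_fixed_point_init(64, 64, rng=rng)
x = rng.normal(size=64)
h = np.tanh(W @ x)  # activations after one tanh layer
```

Under this assumed form of $\mathbf{D}^{\ell}$, each neuron mostly passes one coordinate of its input through with slope close to 1, while the small noise term mixes in the remaining coordinates.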
Abstract
Increasing a neural network's depth can improve its generalization performance. However, training deep networks is challenging because of gradient and signal propagation issues. To address these challenges, extensive theoretical research has been conducted and various methods have been introduced. Despite these advances, effective weight initialization methods for tanh neural networks remain insufficiently investigated. This paper presents a novel weight initialization method for neural networks with the tanh activation function. Based on an analysis of the fixed points of the function $\tanh(ax)$, the proposed method aims to determine values of $a$ that mitigate activation saturation. A series of experiments on various classification datasets and physics-informed neural networks demonstrates that the proposed method outperforms Xavier initialization (with or without normalization) in terms of robustness across different network sizes, data efficiency, and convergence speed. Code is available at https://github.com/1HyunwooLee/Tanh-Init
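To make the role of the fixed points of $\tanh(ax)$ concrete, the short sketch below repeatedly applies $x \mapsto \tanh(ax)$, which mimics a signal passing through many identical scalar tanh layers. It relies only on a standard fact: for $0 < a \le 1$ the only fixed point is $x=0$, while for $a > 1$ there are two additional nonzero fixed points $\pm\xi_a$. The helper name `iterate_tanh` is hypothetical and the example is an illustration, not the paper's analysis.

```python
import math


def iterate_tanh(a, x0, steps=200):
    """Apply x -> tanh(a * x) `steps` times, starting from x0."""
    x = x0
    for _ in range(steps):
        x = math.tanh(a * x)
    return x


# a < 1: the iterates collapse to the single fixed point 0 (vanishing signal).
x_small = iterate_tanh(0.5, 1.0)

# a > 1: the iterates settle at a nonzero fixed point xi with tanh(a * xi) = xi.
# The larger a is, the closer xi sits to 1, i.e. the saturated regime.
x_large = iterate_tanh(2.0, 0.1)

print(x_small, x_large)
```

Read this way, a slope well below 1 drives activations toward zero with depth and a slope well above 1 drives them toward saturation, which is consistent with the summary's emphasis on keeping the effective slope near 1.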
