Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis

Hyunwoo Lee; Hayoung Choi; Hyunju Kim

TL;DR

The paper tackles saturating activations and degraded signal propagation in deep tanh neural networks with a weight initialization informed by fixed-point analysis. Each weight matrix is decomposed into a structured deterministic part and Gaussian noise, which gives a per-neuron effective slope near 1 and an approximately normal activation distribution across depth, guided by the fixed points of $\tanh(a x)$. The authors derive the explicit initialization $\mathbf{W}^{\ell}=\mathbf{D}^{\ell}+\mathbf{Z}^{\ell}$, where the entries of $\mathbf{Z}^{\ell}$ are drawn from $\mathcal{N}(0,\sigma_z^2)$ with $\sigma_z=\alpha/\sqrt{N^{\ell-1}}$ ($\alpha=0.085$), and show that this choice stabilizes forward propagation without normalization. Experiments on image classification benchmarks and Physics-Informed Neural Networks show improved robustness to network size and better data efficiency than Xavier initialization, in both standard and PDE-solving settings.
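The decomposition $\mathbf{W}^{\ell}=\mathbf{D}^{\ell}+\mathbf{Z}^{\ell}$ can be illustrated with a minimal sketch in plain Python. The summary only describes $\mathbf{D}^{\ell}$ as structured and deterministic, so the identity-like pattern used here (ones where the row index equals the column index modulo $N^{\ell-1}$) is an assumption, as are the function name `tanh_fixed_point_init` and its parameters. Only $\sigma_z=\alpha/\sqrt{N^{\ell-1}}$ and $\alpha=0.085$ come from the text.

```python
import math
import random


def tanh_fixed_point_init(n_out, n_in, alpha=0.085, rng=None):
    """Sketch of W = D + Z for one layer with n_in inputs and n_out outputs.

    ASSUMPTION: D has ones where the row index equals the column index
    modulo n_in and zeros elsewhere. The summary does not spell out D.
    """
    rng = rng or random.Random()
    sigma_z = alpha / math.sqrt(n_in)  # sigma_z = alpha / sqrt(N^{l-1})
    weights = []
    for i in range(n_out):
        row = []
        for j in range(n_in):
            d_ij = 1.0 if i % n_in == j else 0.0  # assumed structure of D
            z_ij = rng.gauss(0.0, sigma_z)        # Gaussian noise entry of Z
            row.append(d_ij + z_ij)
        weights.append(row)
    return weights


# Example: a 64 x 32 layer; each row is close to a one-hot vector plus small noise.
W = tanh_fixed_point_init(n_out=64, n_in=32, rng=random.Random(0))
print(len(W), len(W[0]))
```

With 32 input nodes this gives $\sigma_z \approx 0.015$. In practice the same construction would normally be written with NumPy or a deep learning framework; plain lists are used here only to keep the sketch dependency-free.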

Abstract

Increasing a neural network's depth can improve generalization performance. However, training deep networks is challenging due to gradient and signal propagation issues. Extensive theoretical research and a variety of methods have been introduced to address these challenges. Despite these advances, effective weight initialization methods for tanh neural networks remain insufficiently investigated. This paper presents a novel weight initialization method for neural networks with the tanh activation function. Based on an analysis of the fixed points of the function $\tanh(ax)$, the proposed method determines values of $a$ that mitigate activation saturation. A series of experiments on various classification datasets and physics-informed neural networks demonstrates that the proposed method outperforms Xavier initialization (with or without normalization) in robustness across network sizes, data efficiency, and convergence speed. Code is available at https://github.com/1HyunwooLee/Tanh-Init

Paper Structure (25 sections, 4 theorems, 16 equations, 18 figures, 4 tables)

Introduction
Related works
Proposed Weight Initialization method
Theoretical motivation
The derivation of the proposed weight initialization method
Preventing activation saturation via appropriate $\sigma_z$ tuning
Experiments
Classification Task
Physics-Informed Neural Networks
Conclusion
Analysis of Signal Propagation
Proof of Lemma 1
Proof of Lemma 2
Proof of Proposition 3
Proof of Corollary 4
...and 10 more sections

Key Result

Lemma 1

For a fixed $a>0$, define the function $\phi_a: \mathbb{R} \to \mathbb{R}$ by $\phi_a(x) = \tanh(ax)$. Then there exists a fixed point $x^\ast$ of $\phi_a$. Furthermore, ...
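The fixed points of $\tanh(ax)$ can be explored numerically with a short sketch. It takes $\phi_a(x)=\tanh(ax)$ as in the abstract and relies on a standard calculus fact rather than on the truncated statement above: $\tanh(ax)=x$ has only the solution $x=0$ when $0<a\le 1$, and gains a symmetric pair $\pm\xi$ with $\xi>0$ when $a>1$. The helper names `iterate_phi` and `positive_fixed_point` are hypothetical.

```python
import math


def phi(a, x):
    """phi_a(x) = tanh(a * x)."""
    return math.tanh(a * x)


def iterate_phi(a, x0, steps=200):
    """Apply phi_a repeatedly, as a stack of identical scalar tanh layers would."""
    x = x0
    for _ in range(steps):
        x = phi(a, x)
    return x


def positive_fixed_point(a, tol=1e-12):
    """Bisection for the positive solution of tanh(a * x) = x.

    Returns 0.0 when a <= 1, where x = 0 is the only fixed point.
    Assumes a is not within rounding error of 1.
    """
    if a <= 1.0:
        return 0.0
    lo, hi = 1e-9, 1.0  # tanh(a*x) - x is positive just above 0 and negative at x = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(a, mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for a in (0.9, 1.5, 2.0):
    print(a, positive_fixed_point(a), iterate_phi(a, x0=0.1))
```

For $a<1$ the iterates shrink toward 0, while for $a>1$ they settle at the nonzero fixed point, which moves toward 1 (the saturated regime) as $a$ grows. This is one way to see why a slope near 1 is attractive for deep tanh stacks.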

Figures (18)

Figure 1: The difference between the maximum and minimum activation values at each layer when propagating $3,000$ input samples through a $10,000$-layer tanh FFNN, using Xavier initialization (left) and the proposed initialization (right). Experiments were conducted on distinct networks with $10,000$ hidden layers, each having the same number of nodes: $16$, $32$, $64$, or $128$.
Figure 2: The difference between the maximum and minimum activation values at each layer when propagating $3,000$ input samples through a $10,000$-layer tanh FFNN, using the proposed initialization with $\alpha$ set to $0.04, 0.085, 0.15,$ and $0.5$. Left: a network with $10,000$ hidden layers of $32$ nodes each. Right: a network with alternating hidden layers of $64$ and $32$ nodes.
Figure 3: Activation values in the 1000th layer of a network with $32$ nodes per hidden layer, using the proposed weight initialization method with $\sigma_z$ values of $0.0003$, $0.015$, $0.3$, and $3$. The inputs are $3,000$ samples uniformly distributed within the range $[-1, 1]$.
Figure 4: Validation accuracy for a tanh FFNN with 50 hidden layers (32 nodes each). Xavier + BN and Xavier + LN represent Xavier initialization with Batch Normalization or Layer Normalization applied every 5 layers, respectively.
Figure 5: Validation accuracy and loss for a tanh FFNN with 60 hidden layers, where the number of nodes alternates between 32 and 16 across layers, repeated 30 times. The model was trained for 20 epochs on the MNIST and CIFAR-10 datasets.
...and 13 more figures
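The measurement described in the captions of Figures 1 and 2 (the gap between the largest and smallest activation at each layer) can be approximated with a small, dependency-free sketch. Several choices here are assumptions made for illustration: the identity-like form of $\mathbf{D}^{\ell}$, the normal variant of Xavier initialization, zero biases, an input dimension equal to the layer width, and a much smaller depth and sample count than the $10,000$ layers and $3,000$ samples in the figures. It is not the authors' experiment code.

```python
import math
import random


def proposed_init(n_out, n_in, rng, alpha=0.085):
    # ASSUMPTION: D is identity-like (ones where i = j mod n_in); Z is Gaussian noise.
    sigma_z = alpha / math.sqrt(n_in)
    return [[(1.0 if i % n_in == j else 0.0) + rng.gauss(0.0, sigma_z)
             for j in range(n_in)] for i in range(n_out)]


def xavier_init(n_out, n_in, rng):
    # Xavier (Glorot) normal: std = sqrt(2 / (fan_in + fan_out)).
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def activation_spread(init_fn, width=16, depth=100, n_samples=20, seed=0):
    """Return max - min of the activations at every layer (zero biases assumed)."""
    rng = random.Random(seed)
    # Inputs drawn uniformly from [-1, 1], one vector of length `width` per sample.
    batch = [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _ in range(n_samples)]
    spreads = []
    for _ in range(depth):
        W = init_fn(width, width, rng)
        batch = [[math.tanh(sum(w * x for w, x in zip(row, sample))) for row in W]
                 for sample in batch]
        values = [v for sample in batch for v in sample]
        spreads.append(max(values) - min(values))
    return spreads


xavier_spread = activation_spread(xavier_init)
proposed_spread = activation_spread(proposed_init)
print(f"layer {len(xavier_spread)}: "
      f"Xavier {xavier_spread[-1]:.4f}, proposed {proposed_spread[-1]:.4f}")
```

Because tanh outputs lie in $(-1, 1)$, each spread is bounded by 2. Increasing `depth` and varying `width` or `alpha` gives a rough, small-scale analogue of the comparisons in the figures.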

Theorems & Definitions (12)

Lemma 1
proof
Lemma 2
proof
Proposition 3
proof
Corollary 4
proof
proof
proof
...and 2 more
