A new initialisation to Control Gradients in Sinusoidal Neural network

Andrea Combette, Antoine Venaille, Nelly Pustelnik

TL;DR

The paper addresses spectral instability and gradient problems in deep sinusoidal networks used for implicit neural representations. It derives a closed-form initialization by enforcing a fixed-point pre-activation variance and unit gradient flow ($\sigma_g=1$ and $\sigma_a=0$), linking these choices to NTK dynamics and Fourier spectrum behavior. The proposed scheme stabilizes training with depth, reduces spurious high-frequency content, and improves generalization in function fitting, image/video reconstruction, and physics-informed tasks. Overall, the work connects initialization, training dynamics, and spectral properties in sine-activated networks, with broad implications beyond INR contexts.

Abstract

A proper initialisation strategy is of primary importance to mitigate gradient explosion or vanishing when training neural networks. Yet, the impact of initialisation parameters still lacks a precise theoretical understanding for several well-established architectures. Here, we propose a new initialisation for networks with sinusoidal activation functions such as \texttt{SIREN}, focusing on gradient control, the scaling of gradients with network depth, and their impact on training and generalization. To achieve this, we identify a closed-form expression for the initialisation of the parameters, differing from the original \texttt{SIREN} scheme. This expression is derived from fixed points obtained through the convergence of the pre-activation distribution and the variance of Jacobian sequences. Controlling gradients while targeting vanishing pre-activations helps prevent the emergence of inappropriate frequencies during estimation, thereby improving generalization. We further show, through the Neural Tangent Kernel (NTK) framework, that this initialisation strongly influences training dynamics. Finally, we benchmark \texttt{SIREN} with the proposed initialisation against the original scheme and other baselines on function fitting and image reconstruction. The new initialisation consistently outperforms state-of-the-art methods across a wide range of reconstruction tasks, including those involving physics-informed neural networks.

Paper Structure

This paper contains 38 sections, 8 theorems, 85 equations, 21 figures.

Key Result

Theorem 3.1

Considering SIREN network described in equation eq:siren where, for some $c_w, c_b \in \mathbb{R}^+$, and for every layer $\ell\in \{2,\ldots,L\}$, the weight matrix ${\mathbf{W}}_\ell$ is initialized as a random matrix sampled from $\mathcal{U}(-c_w/\sqrt{N},c_w/\sqrt{N})$, ${\mathbf{W}}_1$ is samp… where $\mathcal{W}_{0}$ is the principal real branch of the Lambert function. The sequence associat…
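The hidden-layer sampling rule stated in the theorem, $\mathcal{U}(-c_w/\sqrt{N},c_w/\sqrt{N})$, can be illustrated with a minimal NumPy sketch. Everything beyond that rule is an assumption made for illustration: the value `c_w = 2.0` is arbitrary (it is not the paper's closed-form constant), biases and the first layer are omitted because their definitions are truncated above, and the starting activations are simply the sine of standard normal draws. The helper names `sample_hidden_weights` and `preactivation_std_by_layer` are hypothetical.

```python
import numpy as np

def sample_hidden_weights(n_out, n_in, c_w, rng):
    """Draw i.i.d. entries from U(-c_w/sqrt(N), c_w/sqrt(N)) with N = n_in."""
    bound = c_w / np.sqrt(n_in)
    return rng.uniform(-bound, bound, size=(n_out, n_in))

def preactivation_std_by_layer(width, depth, c_w, rng, n_samples=256):
    """Empirical std of pre-activations z = W h through a bias-free sine stack."""
    # Assumption: hypothetical starting activations, not the paper's first layer.
    h = np.sin(rng.standard_normal((width, n_samples)))
    stds = []
    for _ in range(depth):
        W = sample_hidden_weights(width, width, c_w, rng)
        z = W @ h                  # pre-activations of this layer
        stds.append(float(z.std()))
        h = np.sin(z)              # sinusoidal activation
    return stds

rng = np.random.default_rng(0)

# A uniform law on (-a, a) has standard deviation a / sqrt(3),
# so each entry should have std close to c_w / sqrt(3 N).
W = sample_hidden_weights(512, 512, c_w=2.0, rng=rng)
empirical_std = W.std()
expected_std = 2.0 / np.sqrt(3 * 512)

# Layer-by-layer pre-activation std for the illustrative c_w.
stds = preactivation_std_by_layer(width=256, depth=8, c_w=2.0, rng=rng)
```

Sweeping `c_w` in such a sketch and watching whether `stds` settles to a constant value is one informal way to see the kind of fixed-point behaviour the theorem is about; it does not reproduce the paper's analysis.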

Figures (21)

  • Figure 1: Generalization error over different problems, averaged over different architecture depths, for 1d, 2d and 3d multi-scaled function approximation. The results are displayed for different state-of-the-art architectures, including the one proposed in this work (SIREN Proposed). See Appendix \ref{sec:exp} for details. The standard deviation of the error is shown in light gray.
  • Figure 2: Comparison of several INR architectures and initializations on an image-fitting problem using an $L = 10$ hidden-layer neural network of width $N = 256$. We train the model on a set $({\bm{x}}_i, y_i)_{i\in \mathbb{I}}$ where ${\bm{x}}_i$ is a location taken on a $\vert \mathbb{I} \vert= 128 \times128$ uniformly spaced grid on $\Omega = [-1,1]^2$ and $y_i$ is the associated image value at this location. The top row shows the fitted $128\times128$ image. The middle row shows the estimation at an augmented resolution ($512\times512$) to assess the model's generalization, and the last row provides a zoom on part of the image. In all cases, we use the ADAM optimizer with a learning rate of $10^{-4}$ for 10000 epochs. The state-of-the-art architectures considered in this experiment are: SIREN (see sitzmann2020), FINER (see finer), WIRE (see wire), Tanh(FX) with Fourier features and Xavier initialization (see tancik2020), and the traditional ReLU with Positional Encoding (see Nair2010). For the SIREN-based architectures, we used the previously discussed schemes. We observe that the proposed strategies (SIREN with $\sigma_a=0$ and $\sigma_a=1$) lead to a significant improvement in the model estimation with respect to other methods. For instance, they preserve sharp features, unlike other SOTA methods such as WIRE and FINER, which yield extremely poor results for deep neural networks.
  • Figure 3: Experimental standard deviation of the pre-activation distribution (left) and of the layer-wise Jacobian entries distribution (right), as a function of the parameters $(c_w, c_b)$. The plain and dashed black lines indicate the theoretical predictions for $\sigma_a=1$ and $\sigma_g = 1$, following Theorems \ref{thm-activation-distribution} and \ref{thm-gradient-distribution}, respectively. The black and red dots indicate the initializations provided in Proposition \ref{prop:init}, the PyTorch dots correspond to the default weight and bias initialization, and the green dots to the Sitzmann initialization.
  • Figure 4: One-dimensional Fourier spectra of $\Psi_{\theta}$ for multiple depths $L \in \{4,8,16,32\}$, driving frequencies $w_0 \in \{100,1000\}$ (rows), and initialization schemes (columns). Each curve shows the magnitude of the discrete Fourier transform of $\Psi_{\theta}$ evaluated on an equispaced grid; colors encode the depth $L$. The red vertical line marks $w_0 / 2\pi$ which corresponds to the input frequency encoded by the first layers and the black vertical line marks $w_0$. The colored backgrounds group the different initializations (from left to right: proposed SIREN with $\sigma_a = 0$, SIREN with $\sigma_a = 1$, the initialization of sitzmann2020, and the default PyTorch initialization).
  • Figure 5: The first six eigenvectors ${\bm{v}}_0,\dots,{\bm{v}}_5$ of the NTK matrix ${\mathbf{K}}_{\theta_0}$, ordered by decreasing eigenvalue $\lambda_0 > \lambda_1 > \cdots > \lambda_5$. The NTK matrix was computed numerically on a uniform grid of $\vert \mathbb{I}\vert = 500$ points over the interval $\Omega = [-1,1]$ using a SIREN network of width $N=512$ and of depth $L=8$ and using $\omega_0 = 1$. The eigenvectors exhibit increasingly oscillatory behavior as the mode index grows, consistent with their interpretation as Fourier-like modes. This observation confirms the spectral structure predicted by our analysis and highlights the tendency of the NTK to prioritize low-frequency components associated with larger eigenvalues.
  • ...and 16 more figures
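
The spectra described in the caption of Figure 4 are magnitudes of a discrete Fourier transform taken on an equispaced grid. A minimal NumPy sketch of that computation follows. It is applied to the plain signal $\sin(w_0 x)$ rather than to a trained or initialized network $\Psi_{\theta}$, purely as a sanity check that such a signal peaks near $w_0/2\pi$ cycles per unit length, the position the caption marks with a red line. The helper name `magnitude_spectrum`, the grid size, and the normalization by the number of samples are assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrum(values, spacing):
    """Return (frequencies, |DFT|) for real samples on an equispaced grid."""
    freqs = np.fft.rfftfreq(len(values), d=spacing)   # cycles per unit length
    mags = np.abs(np.fft.rfft(values)) / len(values)  # assumed normalization
    return freqs, mags

# Equispaced grid on [-1, 1); the endpoint is dropped to keep spacing uniform.
n_points = 4096
x = np.linspace(-1.0, 1.0, n_points, endpoint=False)
spacing = x[1] - x[0]

# Stand-in signal (assumption): a single sine with angular frequency w0.
w0 = 100.0
signal = np.sin(w0 * x)

freqs, mags = magnitude_spectrum(signal, spacing)
peak_freq = freqs[np.argmax(mags)]  # expected to lie close to w0 / (2 * pi)
```

Replacing `signal` with the output of a sine network evaluated on `x` would give a curve of the kind the figure compares across depths and initializations.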

Theorems & Definitions (15)

  • Theorem 3.1: Pre-activation distribution of SIREN
  • Remark 3.1
  • Remark 3.2
  • Theorem 3.2: Jacobian distribution of SIREN
  • Proposition 3.1
  • Theorem: Restatement of Theorem \ref{thm-activation-distribution}
  • proof
  • Lemma A.1
  • proof
  • Lemma A.2
  • ...and 5 more