STAF: Sinusoidal Trainable Activation Functions for Implicit Neural Representation

Alireza Morsali, MohammadJavad Vaez, Mohammadhossein Soltani, Amirhossein Kazerouni, Babak Taati, Morteza Mohammad-Noori

TL;DR

STAF introduces a trainable sinusoidal activation for implicit neural representations to overcome spectral bias and enable high-frequency detail capture. By parameterizing a Fourier-series activation $\rho^*(x)=\sum_{i=1}^{\tau} C_i \sin(\Omega_i x+\Phi_i)$, trained per layer, STAF expands the network's frequency support and achieves Kronecker-equivalent representations that surpass fixed activations like SIREN. Neural Tangent Kernel analysis reveals STAF yields richer eigenfunctions and larger eigenvalues, aligning with faster convergence and improved learning of high-frequency content. Empirically, STAF achieves state-of-the-art PSNR/SSIM across signal representation, inverse problems, and NeRF tasks with favorable convergence and modest parameter overhead; code is publicly available.
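
The activation formula above can be made concrete with a minimal NumPy sketch. It only evaluates the forward pass $\sum_{i=1}^{\tau} C_i \sin(\Omega_i x+\Phi_i)$ elementwise. The function name `staf_activation` and the coefficient values are hypothetical choices for illustration; they are not taken from the authors' codebase, and the paper's initialization scheme is not reproduced here.

```python
import numpy as np

def staf_activation(x, C, Omega, Phi):
    """Evaluate sum_i C[i] * sin(Omega[i] * x + Phi[i]) elementwise on x."""
    x = np.asarray(x, dtype=float)
    # Add a trailing axis to x so it broadcasts against the tau terms.
    terms = C * np.sin(Omega * x[..., None] + Phi)
    # Sum over the tau sinusoidal terms.
    return terms.sum(axis=-1)

# Hypothetical parameters (tau = 3), chosen only for illustration.
tau = 3
C = np.array([1.0, 0.5, 0.25])      # amplitudes C_i
Omega = np.array([1.0, 2.0, 4.0])   # frequencies Omega_i
Phi = np.zeros(tau)                 # phases Phi_i

x = np.linspace(-1.0, 1.0, 5)
y = staf_activation(x, C, Omega, Phi)
```

In a network, `C`, `Omega`, and `Phi` would be trainable parameters held per layer. With $\tau = 1$ the expression reduces to a single sinusoid, which is the sense in which a fixed sine activation is a special case of this form.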

Abstract

Implicit Neural Representations (INRs) have emerged as a powerful framework for modeling continuous signals. The spectral bias of ReLU-based networks is a well-established limitation, restricting their ability to capture fine-grained details in target signals. While previous works have attempted to mitigate this issue through frequency-based encodings or architectural modifications, these approaches often introduce additional complexity and do not fully address the underlying challenge of learning high-frequency components efficiently. We introduce Sinusoidal Trainable Activation Functions (STAF), designed to directly tackle this limitation by enabling networks to adaptively learn and represent complex signals with higher precision and efficiency. STAF inherently modulates its frequency components, allowing for self-adaptive spectral learning. This capability significantly improves convergence speed and expressivity, making STAF highly effective for both signal representations and inverse problems. Through extensive evaluations across a range of tasks, including signal representation (shape, image, audio) and inverse problems (super-resolution, denoising), as well as neural radiance fields (NeRF), we demonstrate that STAF consistently outperforms state-of-the-art methods in accuracy and reconstruction fidelity. These results establish STAF as a robust solution to spectral bias and the capacity--convergence tradeoff, with broad applicability in computer vision and graphics. Our codebase is publicly accessible at https://github.com/AlirezaMorsali/STAF.

Paper Structure

This paper contains 34 sections, 13 theorems, 121 equations, 18 figures, 6 tables.

Key Result

Theorem 3.1

Consider a neural network as defined in Network with a sinusoidal trainable activation function (STAF) defined in STAF. Suppose for each $i$, $\Phi_i \sim U(-\pi, \pi)$. Furthermore, let $C_i$ be i.i.d. random variables with the following probability density function: and assume that $C_i$'s are independent of $\Omega_i$, $\boldsymbol{w}$, $\boldsymbol{x}$, and $\Phi_i$. Then, every post-activati

Figures (18)

  • Figure 1: Activation functions used in INRs plotted over the range [-1, 1]. STAF utilizes a parameterized Fourier series activation, offering flexible frequency-domain adaptation. SIREN employs a sinusoidal function, providing a periodic activation landscape. WIRE employs a complex Gabor wavelet activation, balancing spatial and frequency localization.
  • Figure 2: Activation maps of STAF, SIREN, and WIRE learned during image reconstruction.
  • Figure 3: (a) Reconstruction results by STAF, FINER, KAN, SIREN, and WIRE on the Cameraman image. (b) Corresponding PSNR curves across 300 training iterations.
  • Figure 4: Comparative visualization of image representation using STAF and other activation functions. The second row highlights representation errors, with brighter areas indicating higher errors. The Celtic image size is $128 \times 128$, and the second image from the DIV2K dataset is downsampled by a factor of $1/4$ to $510 \times 339$. (Zoom in to view details.)
  • Figure 5: (a) The first five eigenfunctions of the empirical NTK of STAF ($\tau=2,5$), FINER, SIREN, and FFN. (b) The eigenvalue spectrum of the empirical NTK of the same models. Interestingly, the eigenvalue spectra and the first five eigenfunctions for different values of $\tau$ look very similar, which shows that on-par results can be achieved even with a smaller $\tau$ (i.e., fewer parameters), consistent with the result in Table \ref{tab:tau_ablation}. Also note that SIREN is similar to STAF with $\tau = 1$, except for the initialization scheme and the use of trainable frequencies and phases. However, these differences result in a significant performance gap.
  • ...and 13 more figures
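
The empirical NTK quantities named in the Figure 5 caption (eigenvalue spectrum and leading eigenfunctions) can be illustrated with a toy sketch. It assumes a hypothetical one-hidden-layer model $f(x)=\sum_j a_j \sin(w_j x + b_j)$ with scalar input, builds the Jacobian with respect to all parameters analytically, forms the kernel $K = J J^\top$ on a grid, and eigendecomposes it. The width, initialization scales, and grid are assumptions made for illustration; this is not the paper's architecture or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 64  # hypothetical hidden width

# Hypothetical random parameters of f(x) = sum_j a[j] * sin(w[j] * x + b[j]).
a = rng.normal(size=width) / np.sqrt(width)
w = rng.normal(scale=10.0, size=width)
b = rng.uniform(-np.pi, np.pi, size=width)

def jacobian(x):
    """Return df/dtheta at each input in x, shape (len(x), 3 * width)."""
    z = np.outer(x, w) + b              # pre-activations, shape (n, width)
    d_a = np.sin(z)                     # derivative w.r.t. a[j]
    d_w = a * x[:, None] * np.cos(z)    # derivative w.r.t. w[j]
    d_b = a * np.cos(z)                 # derivative w.r.t. b[j]
    return np.concatenate([d_a, d_w, d_b], axis=1)

x = np.linspace(-1.0, 1.0, 128)
J = jacobian(x)
K = J @ J.T                             # empirical NTK Gram matrix on the grid

# eigh returns eigenvalues in ascending order; flip to descending.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Columns approximate the leading eigenfunctions sampled on the grid x.
top5 = eigvecs[:, :5]
```

Plotting `eigvals` on a log scale and the columns of `top5` against `x` gives the kind of spectrum and eigenfunction panels the caption describes, here for the toy model only.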

Theorems & Definitions (23)

  • Theorem 3.1
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Lemma 5.4
  • Theorem 7.1
  • Theorem 7.2
  • proof
  • Lemma 7.3
  • proof
  • ...and 13 more