FreSh: Frequency Shifting for Accelerated Neural Representation Learning

Adam Kania, Marko Mihajlovic, Sergey Prokudin, Jacek Tabor, Przemysław Spurek

Abstract

Implicit Neural Representations (INRs) have recently gained attention as a powerful approach for continuously representing signals such as images, videos, and 3D shapes using multilayer perceptrons (MLPs). However, MLPs are known to exhibit a low-frequency bias, limiting their ability to capture high-frequency details accurately. This limitation is typically addressed by incorporating high-frequency input embeddings or specialized activation layers. In this work, we demonstrate that these embeddings and activations are often configured with hyperparameters that perform well on average but are suboptimal for specific input signals under consideration, necessitating a costly grid search to identify optimal settings. Our key observation is that the initial frequency spectrum of an untrained model's output correlates strongly with the model's eventual performance on a given target signal. Leveraging this insight, we propose frequency shifting (or FreSh), a method that selects embedding hyperparameters to align the frequency spectrum of the model's initial output with that of the target signal. We show that this simple initialization technique improves performance across various neural representation methods and tasks, achieving results comparable to extensive hyperparameter sweeps but with only marginal computational overhead compared to training a single model with default hyperparameters.

Paper Structure

This paper contains 21 sections, 11 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: The configuration of embeddings is crucial for convergence speed. We train Siren with various embedding configurations ($\omega_0 \in [10, 200]$) for 5k steps on a Kodak image (top-left). The best model found by grid search ($\omega_0=90$), the FreSh configuration ($\omega_0=110$), and the baseline ($\omega_0=30$) are marked with diamonds (bottom-left). The optimal and FreSh configurations (top and middle rows) produce sharper details, such as the number on the sail, than the baseline (bottom row). Even though Siren uses a frequency embedding, the baseline is blurry due to low-frequency bias. Note how the sizes of the uniformly colored areas in the output at step 0 indicate the size of the image features the network can easily learn; this observation is pivotal for FreSh.
  • Figure 2: Example workflow of FreSh when applied to Siren and a high-frequency Kodak image. First, the image and the outputs of various model configurations undergo a Discrete Fourier Transform (DFT). The Fourier coefficients of the same degree are then summed to produce the image spectrum (bottom-left). The model spectra are compared with the dataset spectrum using the Wasserstein distance $\mathcal{W}$ (bottom-middle), and only the configuration at the global minimum (highlighted by a diamond) is used for training. Note that the Wasserstein distance follows a smooth trend with a distinct global minimum, indicating stable and predictable behavior.
  • Figure 3: Wasserstein distance for selected image and NeRF datasets across different Siren configurations. It follows a smooth trend with a distinct global minimum, indicating stable and predictable behavior. Shaded area represents the 95% confidence interval.
  • Figure 4: Example model outputs for image modeling (top) and NeRF (bottom). FreSh representations are better at modeling high-frequency details such as text or ropes. For additional examples, see \ref{app:experiments}.
  • Figure 5: Mean PSNR and SSIM values during training on 50 images (averaged over 3 seeds). FreSh improves the final performance and speeds up convergence. Dotted lines indicate the final results of the baseline model. Shaded area is the 99% confidence interval.
  • ...and 9 more figures
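
The workflow described in the Figure 2 caption (DFT, summing coefficients into a spectrum, comparing spectra with the Wasserstein distance, keeping the configuration at the minimum) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Several details are assumptions: the "degree" of a coefficient is taken to be its rounded radial frequency, the DC term is dropped, and spectra are normalized to sum to 1. The function names `radial_spectrum`, `wasserstein_1d`, and `select_config` are hypothetical, and synthetic sinusoids stand in for the outputs of untrained models.

```python
import numpy as np


def radial_spectrum(img, n_bins=None):
    """Sum DFT magnitudes per integer radial frequency (assumed binning)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    mag = np.abs(np.fft.fft2(img))
    # Integer frequency indices along each axis (negative for the upper half).
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.rint(np.hypot(fy[:, None], fx[None, :])).astype(int)
    if n_bins is None:
        n_bins = min(h, w) // 2
    sums = np.bincount(radius.ravel(), weights=mag.ravel(), minlength=n_bins + 1)
    spec = sums[1 : n_bins + 1]  # drop the DC bin, keep radii 1..n_bins
    total = spec.sum()
    if total == 0.0:
        # Constant image: fall back to a uniform spectrum (assumption).
        return np.full(n_bins, 1.0 / n_bins)
    return spec / total


def wasserstein_1d(p, q):
    """W1 distance between two histograms on the same unit-spaced support."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())


def select_config(target_img, candidate_outputs):
    """Return the key whose output spectrum is closest to the target spectrum."""
    target_spec = radial_spectrum(target_img)
    distances = {
        key: wasserstein_1d(target_spec, radial_spectrum(out))
        for key, out in candidate_outputs.items()
    }
    return min(distances, key=distances.get), distances


# Toy stand-ins: horizontal sinusoids with k cycles across a 64x64 grid.
def sinusoid(k, size=64):
    x = np.arange(size)
    return np.tile(np.sin(2.0 * np.pi * k * x / size), (size, 1))


target = sinusoid(8)
candidates = {k: sinusoid(k) for k in (2, 8, 20)}
best, distances = select_config(target, candidates)
```

In an actual setting, each entry of `candidates` would hold the output of an untrained model evaluated on the image grid for one embedding configuration (for example, one value of $\omega_0$), and `best` would be the configuration passed on to training.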