How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings

Samuel Audia, Soheil Feizi, Matthias Zwicker, Dinesh Manocha

TL;DR

The paper tackles spectral bias in coordinate-based neural networks by comparing Fourier feature encodings (FFE) and multigrid parametric encodings (MPE) through the neural tangent kernel (NTK) lens. It derives a finite-width NTK for MPEs and proves a lower bound showing that the eigenvalue spectrum increases because of the structure of the learnable grid rather than the embedding space, in contrast to FFEs, whose gains stem solely from the embedding. Empirically, the authors evaluate high-frequency detail learning on 2D image regression (ImageNet synonym sets) and 3D implicit surface regression (Stanford meshes): the MPE raises the minimum NTK eigenvalue by 8 orders of magnitude over the baseline and 2 orders of magnitude over the FFE, with correspondingly better PSNR and MS-SSIM scores. The findings provide theoretical and practical justification for using grid-based encodings to mitigate spectral bias, with implications for graphics and scientific ML tasks. The work also outlines limitations and directions for future research, including the effect of activation functions and interpolation kernels optimized for domain-specific performance.

Abstract

Neural networks that map between low-dimensional spaces are ubiquitous in computer graphics and scientific computing; however, in their naive implementation, they are unable to learn high-frequency information. We present a comprehensive analysis comparing the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE). FFEs are seen as the standard for low-dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail. The FFE's roots in the Fourier transform make it susceptible to aliasing if pushed too far, while MPEs, which use a learned grid structure, have no such limitation. To understand the difference in performance, we use the neural tangent kernel (NTK) to evaluate these encodings through the lens of an analogous kernel regression. By finding a lower bound on the smallest eigenvalue of the NTK, we prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding. This mechanism is fundamentally different from FFEs, which rely solely on their embedding space to improve performance. Results are empirically validated on a 2D image regression task using images taken from 100 synonym sets of ImageNet and 3D implicit surface regression on objects from the Stanford graphics dataset. Using peak signal-to-noise ratio (PSNR) and multiscale structural similarity (MS-SSIM) to evaluate how well fine details are learned, we show that the MPE increases the minimum eigenvalue by 8 orders of magnitude over the baseline and 2 orders of magnitude over the FFE. The increase in spectrum corresponds to a 15 dB (PSNR) / 0.65 (MS-SSIM) increase over baseline and a 12 dB (PSNR) / 0.33 (MS-SSIM) increase over the FFE.
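
To put the reported PSNR gains in perspective, the short sketch below converts a gain in decibels into the implied reduction in mean squared error. It assumes the standard definition PSNR = 10 log10(peak^2 / MSE) with a fixed peak value; the function names are illustrative and do not come from the paper.

```python
import math


def psnr(mse: float, peak: float = 1.0) -> float:
    """PSNR in dB under the standard definition (assumed here, peak fixed)."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_from_psnr_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # Gains quoted in the abstract: 15 dB over baseline, 12 dB over the FFE.
    for gain in (15.0, 12.0):
        print(f"{gain:.0f} dB gain -> MSE lower by a factor of "
              f"{mse_ratio_from_psnr_gain(gain):.1f}")
```

Under this definition, a 15 dB gain corresponds to roughly a 32-fold reduction in MSE and a 12 dB gain to roughly a 16-fold reduction, because PSNR is logarithmic in the error.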

Paper Structure

This paper contains 14 sections, 1 theorem, 15 equations, 14 figures, 4 tables.

Key Result

Theorem 1

Consider a dataset $\mathbf{X}$ with $n$ samples, the corresponding neural tangent kernel for an MLP network, and the neural tangent kernel for the same dataset and the same MLP composed with an MPE. The $i^{th}$ eigenvalue of each kernel, sorted in descending order, satisfies $\lambda_i^{MLP} \leq \lambda_i^{MPE}$.
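
The following is a hedged sketch of one standard argument for why adding learnable grid parameters cannot lower a kernel's spectrum; it is not the paper's proof. The notation is assumed for illustration: $\theta$ denotes the MLP weights, $g$ the grid parameters, $J_\theta$ and $J_g$ the Jacobians of the network outputs over the $n$ samples, and $K_{\text{NoGrid}}$ the kernel built from the MLP weights alone. Because the NTK sums gradient outer products over all parameters, it splits by parameter group, and Weyl's inequality bounds the eigenvalues of the sum.

```latex
% Sketch only: symbols other than K_{MPE} are assumptions, not the paper's notation.
\begin{align}
K &= J_\theta J_\theta^{\top} + J_g J_g^{\top}
   = K_{\text{NoGrid}} + K_{MPE}, \\
K_{MPE} \succeq 0
  &\;\Longrightarrow\;
  \lambda_i(K) \;\geq\; \lambda_i(K_{\text{NoGrid}}) + \lambda_n(K_{MPE})
  \;\geq\; \lambda_i(K_{\text{NoGrid}}), \qquad i = 1, \dots, n.
\end{align}
```

This step only compares the composed model with and without its grid term. Relating $K_{\text{NoGrid}}$ to the kernel of the plain MLP, whose inputs are the raw coordinates, requires a further argument that this sketch does not cover.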

Figures (14)

  • Figure 1: The above figure shows an example of the multigrid parametric encoding (MPE). The sample location (blue dot) is mapped to the surrounding grid cells (blue and green squares). The grid contains $k$ learnable scalars at each intersection point. Bilinear interpolation is performed on these learnable parameters independently. All learnable parameters are then concatenated with the original x and y coordinates before being passed to the network.
  • Figure 2: This plot isolates the improvements of the MPE to the learnable grid. The NTK is computed for a 2D image regression on random images from 100 synonym sets in ImageNet (see the Experiments and Results section for more details). The solid line and dashed line are the spectrum at the end and the middle of training, respectively. MPE (No Grid) is the NTK of the MPE without the contributions of $K_{MPE}$ and is purely for theoretical analysis. Without the grid, the spectrum is barely above the baseline. With the grid, the spectrum is 8 orders of magnitude higher.
  • Figure 3: We compare performance of different encodings on image regression. We show the ground truth image (leftmost) along with a network with no encoding (top left), 3 configurations of the Fourier feature encoding (FFE), and two configurations of the multigrid parametric encoding (MPE) (see Table \ref{tab:scaling-params}). No encoding produces a blurred image, but as we add encodings, the finer details start to be resolved. An increase in detail is seen in the FFEs, while the MPEs perform well with both the coarse and the fine grid.
  • Figure 4: The NTK eigenvalue spectrum is compared for the cases found in Figure \ref{fig:image-results} and Table \ref{tab:scaling-params}. All encodings perform better than the baseline. The left plot shows the comparison of the baseline and the FFEs. The middle plot shows the same for the baseline and the MPEs. The right plot then compares the high frequency FFE to both MPEs. The FFEs seem to saturate, while the fine grid MPE gives the best performance. These trends are backed by the PSNR values reported in the table. Interestingly, the coarse MPE crosses over the FFE, giving it strong performance early in training but allowing for higher PSNR values in the FFEs at the end of training.
  • Figure 5: This figure compares the mean of the eigenvalue spectra of different encodings across randomly sampled images from 100 synonym sets in ImageNet and three 3D meshes on OccupancyNet. The dashed line shows the mean spectra at the midpoint of training. We find that tuned encodings have fairly regular performance: the MPE outperforms the FFE, which outperforms the baseline. This result shows that the spectrum is stable across different images and domains.
  • ...and 9 more figures
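
The encoding described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`make_grid`, `bilinear_features`, `mpe_encode`), the random initialization range, the choice of resolutions, and the assumption that inputs lie in the unit square are all hypothetical. In a real model the grid values would be trained jointly with the network.

```python
import math
import random


def make_grid(resolution, k, rng, scale=1e-2):
    """Hypothetical initializer: (resolution + 1)^2 vertices with k scalars each."""
    n = resolution + 1
    return [[[rng.uniform(-scale, scale) for _ in range(k)]
             for _ in range(n)] for _ in range(n)]


def bilinear_features(grid, x, y):
    """Bilinearly interpolate each of the k scalars at (x, y), assumed in [0, 1]^2."""
    res = len(grid) - 1
    gx, gy = x * res, y * res
    # Clamp so that x == 1.0 or y == 1.0 still falls inside the last cell.
    i0 = min(int(math.floor(gx)), res - 1)
    j0 = min(int(math.floor(gy)), res - 1)
    tx, ty = gx - i0, gy - j0
    features = []
    for c in range(len(grid[0][0])):
        top = (1 - tx) * grid[j0][i0][c] + tx * grid[j0][i0 + 1][c]
        bottom = (1 - tx) * grid[j0 + 1][i0][c] + tx * grid[j0 + 1][i0 + 1][c]
        features.append((1 - ty) * top + ty * bottom)
    return features


def mpe_encode(grids, x, y):
    """Concatenate the original coordinates with interpolated features per level."""
    encoded = [x, y]
    for grid in grids:
        encoded.extend(bilinear_features(grid, x, y))
    return encoded


if __name__ == "__main__":
    rng = random.Random(0)
    # Two levels (coarse and fine) with k = 2 scalars per vertex, chosen arbitrarily.
    grids = [make_grid(4, 2, rng), make_grid(16, 2, rng)]
    print(len(mpe_encode(grids, 0.3, 0.7)))  # 2 coordinates + 2 levels * 2 scalars
```

The encoded vector would then be fed to the MLP in place of the raw coordinates. Because each sample touches only the four vertices of its cell at every level, the gradient with respect to the grid is sparse and local.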

Theorems & Definitions (2)

  • Theorem 1
  • proof