NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields
Plein Versace
TL;DR
This work addresses a limitation of implicit neural representations (INRs): their reliance on global, stationary spectral bases. It introduces Neural Spectral Transport Representation (NSTR), which models a space-varying local spectrum $S(x)$ governed by a learnable frequency transport equation with PDE supervision. A signal is reconstructed from a small set of $K$ global frequencies $\omega_i$ with phases $b_i$ via $f(x)=g_\phi\left(\sum_{i=1}^K S_i(x)\sin(\omega_i^\top x + b_i)\right)$, where $g_\phi$ is a learnable output function. This decouples long-range spectral structure (the shared frequencies) from local variation (the spatially varying amplitudes $S_i(x)$), which yields parameter efficiency, faster convergence, and interpretable spectral flows, as demonstrated on 2D image, audio, 3D geometry, and NeRF tasks. By enforcing structured spectral transport, NSTR achieves high fidelity with fewer global frequencies and offers a principled way to model non-stationary spectra in INRs.
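The reconstruction formula can be made concrete with a short NumPy sketch. It is a hypothetical illustration, not the authors' implementation: the function name `nstr_forward`, the toy amplitude field `S_toy`, and the use of `np.tanh` as a stand-in for $g_\phi$ are assumptions. It shows only how $K$ shared frequencies $\omega_i$ and phases $b_i$ are modulated point by point by the local spectrum $S(x)$.

```python
import numpy as np

def nstr_forward(x, omega, b, S, g):
    """Evaluate f(x) = g(sum_i S_i(x) * sin(omega_i^T x + b_i)).

    Hypothetical sketch of the NSTR reconstruction step.
    x:     (N, d) query coordinates
    omega: (K, d) global frequencies, shared by every point
    b:     (K,)   global phases
    S:     callable (N, d) -> (N, K), the space-varying spectrum S(x)
    g:     callable on the (N,) modulated sum, a stand-in for g_phi
    """
    phases = x @ omega.T + b       # (N, K): omega_i^T x + b_i
    basis = np.sin(phases)         # global sinusoidal bases
    amps = S(x)                    # (N, K): local amplitudes per point
    return g(np.sum(amps * basis, axis=1))

# Toy 1D signal: a 2-cycle component dominates the left half of [0, 1]
# and a 16-cycle component dominates the right half.
x = np.linspace(0.0, 1.0, 256)[:, None]
omega = np.array([[2 * np.pi * 2.0], [2 * np.pi * 16.0]])
b = np.zeros(2)

def S_toy(x):
    # Smooth switch at x = 0.5; the two amplitudes sum to 1 everywhere.
    w = 1.0 / (1.0 + np.exp(-50.0 * (x[:, 0] - 0.5)))
    return np.stack([1.0 - w, w], axis=1)

f = nstr_forward(x, omega, b, S_toy, g=np.tanh)
```

Both sinusoids remain global; only their per-point weights change, which is the decoupling the TL;DR describes. In NSTR itself, $S$ and $g_\phi$ would be learned networks rather than fixed functions.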
Abstract
Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, audio, and 3D scenes. However, existing INR frameworks (including MLPs with Fourier features, SIREN, and multiresolution hash grids) implicitly assume a \textit{global and stationary} spectral basis. This assumption is misaligned with real-world signals, whose frequency content varies substantially across space: they combine local high-frequency textures, smooth regions, and frequency drift. We propose \textbf{Neural Spectral Transport Representation (NSTR)}, to our knowledge the first INR framework that \textbf{explicitly models a spatially varying local frequency field}. NSTR introduces a learnable \emph{frequency transport equation}, a PDE that governs how local spectral composition evolves across space. Given a learnable local spectrum field $S(x)$ and a frequency transport network $F_\theta$ that enforces $\nabla S(x) \approx F_\theta(x, S(x))$, NSTR reconstructs signals by spatially modulating a compact set of global sinusoidal bases. This formulation provides strong local adaptivity and improves interpretability, since the learned frequency flows can be visualized directly. Experiments on 2D image regression, audio reconstruction, and implicit 3D geometry show that NSTR achieves significantly better accuracy-parameter trade-offs than SIREN, Fourier-feature MLPs, and Instant-NGP. NSTR requires fewer global frequencies, converges faster, and exposes signal structure through its spectral transport fields. We believe NSTR opens a new direction in INR research by introducing explicit modeling of space-varying spectra.
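The transport constraint $\nabla S(x) \approx F_\theta(x, S(x))$ can be read as a residual penalty on the spatial Jacobian of $S$. The sketch below assumes a mean squared residual and, to stay self-contained, estimates the Jacobian with central finite differences; a training setup would more likely use automatic differentiation. The loss form, the names `jacobian_fd` and `transport_residual_loss`, and the toy fields are all hypothetical.

```python
import numpy as np

def jacobian_fd(S, x, eps=1e-4):
    """Central finite-difference estimate of dS/dx at each query point.

    Returns shape (N, K, d): entry [n, k, j] approximates dS_k/dx_j at x[n].
    """
    _, d = x.shape
    cols = []
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        cols.append((S(x + e) - S(x - e)) / (2 * eps))  # (N, K)
    return np.stack(cols, axis=-1)

def transport_residual_loss(S, F, x):
    """Hypothetical PDE penalty: mean over points of ||grad S(x) - F(x, S(x))||^2."""
    grad_S = jacobian_fd(S, x)   # (N, K, d)
    target = F(x, S(x))          # (N, K, d), output of a stand-in for F_theta
    return np.mean(np.sum((grad_S - target) ** 2, axis=(1, 2)))

# Toy field S_k(x) = exp(a_k^T x), whose exact gradient is a_k * S_k(x).
a = np.array([[0.5, -0.3], [1.2, 0.8]])  # (K, d) assumed growth rates

def S_exp(x):
    return np.exp(x @ a.T)  # (N, K)

def F_exact(x, s):
    return s[:, :, None] * a[None]  # (N, K, d): a_k * S_k(x)

def F_zero(x, s):
    return np.zeros(s.shape + (x.shape[1],))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(64, 2))
loss_exact = transport_residual_loss(S_exp, F_exact, x)
loss_zero = transport_residual_loss(S_exp, F_zero, x)
```

A transport field that matches the true spatial derivative drives the residual to numerical zero, while a mismatched one (here, all zeros) is penalized. How such a term is weighted against the reconstruction loss is not specified here.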
