FANeRV: Frequency Separation and Augmentation based Neural Representation for Video
Li Yu, Zhihui Li, Chao Yao, Jimin Xiao, Moncef Gabbouj
TL;DR
FANeRV tackles the spectral bias that limits high-frequency recovery in implicit neural video representations. It explicitly separates frame features into low- and high-frequency subbands via a Haar wavelet transform, then applies dedicated enhancement and fusion modules to each. The Wavelet Frequency Upgrade Block (WFUB) combines a Frequency Separation Feature Boosting (FSFB) module for global, multi-scale context with a Time-Modulated Gated Feed-Forward Network (TGFN) for temporally aware refinement, while Convolutional Residual Enhancement Blocks (CREB) balance the parameter distribution and preserve fine details. A hybrid loss, combining L1 and MS-SSIM with a frequency-domain constraint, enforces both structural fidelity and high-frequency content. Across video regression, compression, interpolation, and inpainting, FANeRV outperforms existing NeRV methods at the same model capacity, demonstrating the practicality of frequency-separation strategies for high-fidelity video representations.
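To make the idea of a frequency-domain constraint concrete, here is a minimal NumPy sketch of a loss that adds a Fourier-domain penalty to a pixel-wise L1 term. It is an illustration under stated assumptions, not the paper's loss. The summary does not give the exact form of the frequency term or the weights, so the FFT-based L1 penalty, the names `frequency_l1` and `hybrid_loss`, and the values of `alpha` and `lam` are all hypothetical. MS-SSIM is not implemented; it is left as an optional callable (`structural_term`) so that any existing implementation can be plugged in.

```python
import numpy as np

def frequency_l1(pred, target):
    """Mean absolute difference between the 2D spectra of two frames.

    Assumption: the frequency-domain constraint is modelled here as an L1
    penalty on orthonormal FFT coefficients; the source does not specify it.
    """
    fp = np.fft.fft2(pred, norm="ortho")
    ft = np.fft.fft2(target, norm="ortho")
    return float(np.mean(np.abs(fp - ft)))

def hybrid_loss(pred, target, structural_term=None, alpha=0.7, lam=0.1):
    """Pixel L1 + optional structural term + frequency penalty.

    structural_term: optional callable (pred, target) -> float, e.g. a
    1 - MS-SSIM implementation supplied by the caller (not provided here).
    alpha, lam: hypothetical weights chosen only for illustration.
    """
    l1 = float(np.mean(np.abs(pred - target)))
    struct = structural_term(pred, target) if structural_term is not None else 0.0
    return alpha * l1 + (1.0 - alpha) * struct + lam * frequency_l1(pred, target)

# Usage: a perfect reconstruction has zero loss, a perturbed one does not.
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
loss_same = hybrid_loss(frame, frame)
loss_noisy = hybrid_loss(frame, frame + 0.05 * rng.standard_normal((8, 8)))
```

The frequency term penalizes errors in every Fourier coefficient equally, so high-frequency mistakes that barely move a pixel-wise average still contribute to the loss.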
Abstract
Neural representations for video (NeRV) have gained considerable attention for their strong performance across various video tasks. However, existing NeRV methods often struggle to capture fine spatial details, resulting in blurry reconstructions. In this paper, we present a Frequency Separation and Augmentation based Neural Representation for video (FANeRV), which addresses these limitations with its core Wavelet Frequency Upgrade Block. This block explicitly separates input frames into high- and low-frequency components using a discrete wavelet transform, then enhances each component with specialized modules. Finally, a specially designed gated network fuses these frequency components for reconstruction. Additionally, convolutional residual enhancement blocks are integrated into the later stages of the network to balance parameter distribution and improve the restoration of high-frequency details. Experimental results demonstrate that FANeRV significantly improves reconstruction performance and excels in multiple tasks, including video compression, inpainting, and interpolation, outperforming existing NeRV methods.
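The frequency separation step can be illustrated with a single-level 2D Haar wavelet transform, written here in plain NumPy. This is a minimal sketch of the general technique, not the authors' implementation. The function names `haar_dwt2` and `haar_idwt2` are hypothetical, the input is assumed to be a single-channel array with even height and width, and subband naming conventions (which detail band is called LH versus HL) differ between libraries.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array, H and W even.

    Returns the low-frequency subband and a tuple of three high-frequency
    (detail) subbands, each of shape (H/2, W/2).
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average: low-frequency content
    lh = (a + b - c - d) / 2.0  # difference between the two rows
    hl = (a - b + c - d) / 2.0  # difference between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuild the (H, W) array from its subbands."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

# Usage: split a frame, then verify the split is lossless.
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
low, high = haar_dwt2(frame)
restored = haar_idwt2(low, high)
```

Because the transform is invertible, splitting a frame into subbands discards no information. A network can therefore process the low- and high-frequency parts with separate modules and still recombine them into a full-resolution output.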
