FANeRV: Frequency Separation and Augmentation based Neural Representation for Video

Li Yu, Zhihui Li, Chao Yao, Jimin Xiao, Moncef Gabbouj

TL;DR

FANeRV tackles the spectral bias that limits high-frequency recovery in implicit neural video representations. It explicitly separates frame features into low- and high-frequency subbands with a Haar wavelet transform, then enhances and fuses the subbands with dedicated modules. The Wavelet Frequency Upgrade Block (WFUB) combines a Frequency Separation Feature Boosting (FSFB) module, which captures global, multi-scale context, with a Time-Modulated Gated Feed-Forward Network (TGFN), which performs temporally aware refinement. Convolutional Residual Enhancement Blocks (CREB) in the later stages of the network balance the parameter distribution and preserve fine details. A hybrid loss that combines L1 and MS-SSIM terms with a frequency-domain constraint enforces both structural fidelity and high-frequency content. Across video regression, compression, interpolation, and inpainting, FANeRV outperforms existing NeRV methods at the same model capacity, which supports frequency separation as a practical strategy for high-fidelity video representations.
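
As an illustration of the hybrid loss described above, the following NumPy sketch adds a frequency-domain penalty to an L1 term. It is not the authors' implementation: the weights `alpha` and `lam`, the FFT-based form of the frequency term, and the optional `ms_ssim_fn` callable are all assumptions made here, because this summary does not give the exact formulation. MS-SSIM itself is not implemented and has to be supplied by the caller.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between two frames."""
    return float(np.mean(np.abs(pred - target)))

def frequency_loss(pred, target):
    """Mean absolute difference between the orthonormal 2D FFTs of two frames.

    Assumption: the frequency-domain constraint is modeled as an FFT-space L1
    distance; the paper's exact term may differ.
    """
    fp = np.fft.fft2(pred, axes=(-2, -1), norm="ortho")
    ft = np.fft.fft2(target, axes=(-2, -1), norm="ortho")
    return float(np.mean(np.abs(fp - ft)))

def hybrid_loss(pred, target, ms_ssim_fn=None, alpha=0.7, lam=0.1):
    """L1 + (optional) MS-SSIM + frequency term.

    alpha and lam are placeholder weights, not values from the paper.
    ms_ssim_fn is a hypothetical callable returning a similarity in [0, 1];
    when it is None the structural term is skipped.
    """
    loss = alpha * l1_loss(pred, target)
    if ms_ssim_fn is not None:
        loss += (1.0 - alpha) * (1.0 - ms_ssim_fn(pred, target))
    loss += lam * frequency_loss(pred, target)
    return loss
```

Identical frames give a loss of zero. A single-pixel error raises both the spatial and the frequency term, because an impulse spreads evenly over all frequency bins.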

Abstract

Neural representations for video (NeRV) have gained considerable attention for their strong performance across various video tasks. However, existing NeRV methods often struggle to capture fine spatial details, resulting in vague reconstructions. In this paper, we present a Frequency Separation and Augmentation based Neural Representation for video (FANeRV), which addresses these limitations with its core Wavelet Frequency Upgrade Block. This block explicitly separates input frames into high and low-frequency components using discrete wavelet transform, followed by targeted enhancement using specialized modules. Finally, a specially designed gated network effectively fuses these frequency components for optimal reconstruction. Additionally, convolutional residual enhancement blocks are integrated into the later stages of the network to balance parameter distribution and improve the restoration of high-frequency details. Experimental results demonstrate that FANeRV significantly improves reconstruction performance and excels in multiple tasks, including video compression, inpainting, and interpolation, outperforming existing NeRV methods.
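
To make the frequency separation step concrete, here is a minimal NumPy sketch of a single-level 2D Haar transform, the wavelet named in the TL;DR. It splits one feature map into a low-frequency subband and three high-frequency subbands and shows that the split is invertible. The function names and the `lh`/`hl` labeling are assumptions made for illustration (labeling conventions vary between libraries), and the sketch works on a single 2D array rather than on the batched feature tensors a real model would use.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency subband and a tuple of three high-frequency
    subbands, each of shape (H/2, W/2).
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average: low-frequency content
    lh = (a - b + c - d) / 2.0  # differences across columns
    hl = (a + b - c - d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuild the (H, W) array from its four subbands."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Because the transform is orthonormal, no information is lost in the split: a network can process the low- and high-frequency subbands with separate modules and still recover the full-resolution signal when the branches are fused. A constant image, for example, puts all of its energy in `ll` and leaves the three high-frequency subbands at zero.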

Paper Structure

This paper contains 18 sections, 16 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Architecture of the proposed FANeRV. FANeRV integrates the Wavelet Frequency Upgrade Block (WFUB) and the Convolutional Residual Enhancement Block (CREB) to enhance detail reconstruction. The WFUB comprises a Frequency Separation Feature Boosting (FSFB) module and a Time-Modulated Gated Feed-Forward Network (TGFN).
  • Figure 2: Visual comparison, from top to bottom, on the video reconstruction, interpolation, central inpainting (Mask-C), and dispersed inpainting (Mask-S) tasks, using the DAVIS validation and UVG datasets. The first column shows the ground truth, followed by the results of the NeRV and HNeRV baselines and of our method. The red numbers are the corresponding PSNR values.
  • Figure 3: Rate-distortion curves on UVG in terms of PSNR and MS-SSIM.
  • Figure 4: Parameter distribution across decoder blocks in various models.