MNeRV: A Multilayer Neural Representation for Videos

Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

TL;DR

MNeRV introduces a multilayer neural representation for videos that deepens both the encoder (M-Encoder) and the decoder (M-Decoder) and uses MNeRV blocks to balance parameter allocation across layers. With a Global Response Normalization (GRN) layer in the encoder and a streamlined header design, the approach achieves higher reconstruction quality with fewer parameters and performs strongly on downstream tasks such as video compression, restoration, and interpolation. Across the UVG, DAVIS, REDS, and Bunny datasets, MNeRV consistently outperforms prior NeRV-based methods, with ablations validating the benefits of increased depth, the GRN layer, and the revised loss. The work provides a practical, efficient implicit video representation suitable for high-quality decoding and broad applicability in video processing pipelines.

Abstract

As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in video compression, video restoration, and video interpolation. When a video is represented with NeRV-style methods (E-NeRV, HNeRV, etc.), each frame corresponds to an embedding, which is reconstructed into a video frame after passing through a small number of decoding layers. However, with so few decoding layers, a single layer holds a large proportion of the parameters, which easily leads to redundant model parameters and greatly restricts the video regression ability of the network. In this paper, we propose a multilayer neural representation for videos (MNeRV) and design a new decoder, M-Decoder, and its matching encoder, M-Encoder. MNeRV has more encoding and decoding layers, which effectively alleviates the parameter redundancy caused by too few layers. In addition, we design MNeRV blocks to allocate parameters more uniformly and effectively between decoding layers. In video regression reconstruction, we achieve better reconstruction quality (+4.06 PSNR) with fewer parameters. Finally, we showcase MNeRV's performance in downstream tasks such as video restoration and video interpolation. The source code of MNeRV is available at https://github.com/Aaronbtb/MNeRV.
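
The abstract's point about a single decoding layer holding a large proportion of the parameters can be illustrated with a rough count. This is a minimal sketch, assuming each decoding layer is a 3x3 convolution followed by PixelShuffle (the upsampling scheme used by NeRV-style blocks); the channel widths and the three-layer stride schedule below are hypothetical and are not taken from the paper.

```python
def conv_pixelshuffle_params(c_in, c_out, stride, kernel=3):
    """Weights plus biases of a kernel x kernel conv that feeds PixelShuffle(stride).

    PixelShuffle(stride) needs c_out * stride**2 conv output channels, so the
    parameter count grows quadratically with the upsampling stride.
    """
    conv_out = c_out * stride * stride
    return c_in * conv_out * kernel * kernel + conv_out


def layer_shares(channels, strides):
    """Fraction of all decoder parameters held by each layer."""
    counts = [
        conv_pixelshuffle_params(channels[i], channels[i + 1], s)
        for i, s in enumerate(strides)
    ]
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical shallow decoder: three layers, large strides (assumed values).
shallow_channels = [64, 64, 32, 16]
shallow_strides = [5, 4, 4]
shares = layer_shares(shallow_channels, shallow_strides)

for idx, share in enumerate(shares, start=1):
    print(f"layer {idx}: {share:.1%} of decoder parameters")
```

Under these assumed widths, the first layer alone holds well over half of the decoder's parameters, which is the kind of imbalance the abstract describes. How MNeRV blocks redistribute parameters is specified in the paper, not in this sketch.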

Paper Structure

This paper contains 19 sections, 3 equations, 12 figures, and 8 tables.

Figures (12)

  • Figure 1: Architecture of multilayer neural representation for videos.
  • Figure 2: a) Architecture of MNeRV for 640$\times$1280 video. The M-Encoder consists of seven ConvNeXt blocks and the M-Decoder consists of seven MNeRV blocks. The encoding and decoding strides are both 5,2,2,2,2,2. b) The GRN layer introduced into the M-Encoder. c) The de-redundancy design in the header layer. d) The composition of the MNeRV block.
  • Figure 3: The parameter distribution of each layer in NeRV, HNeRV, and MNeRV. Note that for NeRV and HNeRV, “other” includes the down-sampling performed on the frame embedding; MNeRV has no such component.
  • Figure 4: Visualization of video neural representations. On the left, we show the original frames. On the right, we compare NeRV, HNeRV, and MNeRV for 3 patches.
  • Figure 5: Compression results on loading dataset
  • ...and 7 more figures
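
The strides (step sizes) listed in the Figure 2 caption imply a spatial size for the frame embedding. The following is a small worked sketch, assuming every stride divides the resolution exactly and that the decoder mirrors the encoder's total scale factor; the function names are illustrative and do not come from the MNeRV code base.

```python
from math import prod


def embedding_size(height, width, strides):
    """Spatial size left after downsampling by every stride in turn."""
    factor = prod(strides)
    if height % factor or width % factor:
        raise ValueError("resolution is not divisible by the total stride")
    return height // factor, width // factor


def decoder_resolutions(embed_h, embed_w, strides):
    """Feature-map size after each upsampling step, starting at the embedding."""
    sizes = [(embed_h, embed_w)]
    for s in strides:
        embed_h, embed_w = embed_h * s, embed_w * s
        sizes.append((embed_h, embed_w))
    return sizes


# Values from the Figure 2 caption: 640 x 1280 frames, strides 5,2,2,2,2,2.
strides = [5, 2, 2, 2, 2, 2]
embed = embedding_size(640, 1280, strides)   # (4, 8), total factor 160
path = decoder_resolutions(*embed, strides)  # ends at (640, 1280)
print(embed, path)
```

The caption lists six strides for seven blocks; the sketch only covers the six listed values and makes no assumption about the remaining block.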