Boosting Neural Video Representation via Online Structural Reparameterization

Ziyi Li, Qingyu Mao, Shuai Liu, Qilei Li, Fanyang Meng, Yongsheng Liang

TL;DR

The paper tackles the capacity bottleneck in Neural Video Representation (NVR) without increasing decoding cost. It introduces Online-RepNeRV, which employs an Enhanced Reparameterization Block (ERB) with multi-branch convolutions and an online parameter fusion strategy to boost training-time expressiveness; after training, branches are merged into a single kernel to maintain efficient inference. Empirical results show consistent PSNR/MS-SSIM gains over multiple baselines, especially in early training stages, and robust performance across datasets, with ablations highlighting the importance of branch design and online fusion. Overall, the work demonstrates that training-time architectural expansion can substantially improve NVR quality while preserving practical decoding efficiency and providing a flexible, plug-and-play approach for existing NVR pipelines.

Abstract

Neural Video Representation (NVR) is a promising paradigm for video compression, showing great potential in improving video storage and transmission efficiency. While recent work has refined network architectures to improve representational capability, these methods typically involve complex designs, which may incur increased computational overhead and lack the flexibility to integrate into other frameworks. Moreover, the inherent limitation in model capacity restricts the expressiveness of NVR networks, resulting in a performance bottleneck. To overcome these limitations, we propose Online-RepNeRV, an NVR framework based on online structural reparameterization. Specifically, we propose a universal reparameterization block named ERB, which incorporates multiple parallel convolutional paths to enhance the model capacity. To mitigate the overhead, an online reparameterization strategy is adopted to dynamically fuse the parameters during training, and the multi-branch structure is equivalently converted into a single-branch structure after training. As a result, the additional computational and parameter complexity is confined to the encoding stage, without affecting the decoding efficiency. Extensive experiments on mainstream video datasets demonstrate that our method achieves an average PSNR gain of 0.37-2.7 dB over baseline methods, while maintaining comparable training time and decoding speed.
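
The equivalence that lets a multi-branch block collapse into a single kernel follows from the linearity of convolution. The sketch below illustrates that idea only; it is not the authors' ERB. The branch layout (one 3x3 branch plus one 1x1 branch, single channel, no bias or normalization) and all function names are assumptions made for illustration, since the abstract does not specify the branches. Online fusion, as described above, would perform the kernel merge at every training step; the sketch shows just the merge and checks that it leaves the output unchanged.

```python
import random


def conv2d_same(image, kernel):
    """Naive single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # assumes a square, odd-sized kernel
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Zero-pad a smaller odd-sized kernel to size x size, keeping it centered."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[off + i][off + j] = kernel[i][j]
    return padded


def fuse_branches(kernels, size):
    """Sum parallel branch kernels into one size x size kernel (hypothetical helper)."""
    fused = [[0.0] * size for _ in range(size)]
    for kern in kernels:
        padded = pad_kernel(kern, size)
        for i in range(size):
            for j in range(size):
                fused[i][j] += padded[i][j]
    return fused


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
image = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
k3 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # assumed 3x3 branch
k1 = [[rng.uniform(-1, 1)]]                                      # assumed 1x1 branch

# Multi-branch form: run each branch separately, then add the outputs.
out3 = conv2d_same(image, k3)
out1 = conv2d_same(image, k1)
multi = [[p + q for p, q in zip(r3, r1)] for r3, r1 in zip(out3, out1)]

# Single-branch form: merge the kernels first, then run one convolution.
fused_kernel = fuse_branches([k3, k1], 3)
single = conv2d_same(image, fused_kernel)

diff = max_abs_diff(multi, single)
```

Here `diff` is expected to be zero up to floating-point rounding, which is why the merged single-kernel model can decode at the cost of the original single-branch network.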

Paper Structure

This paper contains 11 sections, 13 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Performance curves of four baseline methods and their Online-Rep counterparts under the same training time. Online-Rep achieves higher PSNR and MS-SSIM in the early training stages and maintains a performance advantage.
  • Figure 2: Overall architecture of Online-RepNeRV. Taking the frame index as input, the model utilizes an MLP and multiple rep blocks to reconstruct the frame. Each rep block incorporates a multi-branch ERB, whose parameters are merged into a single set during online training, and the structure is converted into a single-branch form during inference.
  • Figure 3: Performance of different reparameterized blocks on NVR.
  • Figure 4: Visual results for Bosphorus and Beauty.
  • Figure 5: Pruning. Sparsity is the ratio of parameters pruned.
  • ...and 2 more figures