EV-NVC: Efficient Variable bitrate Neural Video Compression
Yongcun Hu, Yingzhen Zhai, Jixiang Luo, Wenrui Dai, Dell Zhang, Hongkai Xiong, Xuelong Li
TL;DR
This paper addresses the difficulty of training variable-rate neural video codecs by introducing EV-NVC, a framework that combines a Piecewise Linear Sampler (PLS) for rate control with a Long-Short-Term Feature Fusion Module (LSTFFM) that integrates long- and short-term context. Motion estimation relies on a pre-trained SpyNet, and the model is trained with a multi-stage, mixed-precision strategy whose stages are also used to assess the contribution of each component. The key contributions are the PLS, which divides the index (idx) range into four segments with specific hyperparameters; the LSTFFM architecture, which fuses long-term references such as $\hat{x}_{t-4}$ with short-term features; and an 18-stage training regimen that progressively introduces motion, reconstruction, and multi-frame losses. Experiments show BD-rate reductions of up to 30.56% relative to HM-16.25 and performance competitive with VTM-17.0 across the HEVC test classes, with ablations confirming substantial gains from both PLS and LSTFFM. Overall, EV-NVC provides a scalable, open-source approach to variable-rate neural video compression that can operate efficiently across diverse devices and applications.
Abstract
Training a neural video codec (NVC) with variable rate is highly challenging because of its complex training strategies and model structure. In this paper, we train an efficient variable-bitrate neural video codec (EV-NVC) with a piecewise linear sampler (PLS) to improve rate-distortion performance in the high-bitrate range, and a long-short-term feature fusion module (LSTFFM) to enhance context modeling. In addition, we introduce mixed-precision training and discuss the training strategy of each stage in detail to fully evaluate the effectiveness of each. Experimental results show that our approach reduces the BD-rate by 30.56% compared to HM-16.25 in low-delay mode.
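
For readers unfamiliar with the BD-rate metric quoted above, the following is a minimal sketch of the commonly used Bjontegaard delta-rate computation with a cubic polynomial fit. It is not the evaluation script used for EV-NVC, which the text does not specify; the function name bd_rate and all rate and PSNR values below are hypothetical and chosen only for illustration. A negative result means the test codec needs less bitrate than the anchor at equal quality, so a "30.56% BD-rate reduction" corresponds to a value of about -30.56 under this sign convention.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (test relative to anchor).

    Sketch of the classic cubic-fit variant; some tools use piecewise
    cubic interpolation instead, which can give slightly different numbers.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Average log-rate difference converted back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical rate points (kbps) and PSNR values (dB), not from the paper.
anchor_rate = [400, 800, 1600, 3200]
anchor_psnr = [32, 35, 38, 41]
test_rate = [280, 560, 1120, 2240]  # 70% of the anchor rate at each point
test_psnr = [32, 35, 38, 41]

print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):.2f}%")
```

In this constructed example the test codec uses 70% of the anchor's bitrate at every quality level, so the metric evaluates to roughly -30%. Real rate-distortion curves rarely share PSNR points, which is why the fit and the restriction to the overlapping PSNR interval are needed.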
