NVRC: Neural Video Representation Compression

Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

TL;DR

NVRC is an INR-based video compression framework that is optimized fully end to end and codes all neural representation parameters hierarchically: feature grids, network layers, and the quantization and entropy-model parameters themselves. Feature grids are coded with per-block quantization scales and a context-based entropy model; network layer parameters are coded with per-axis quantization scales and a dual-axis Gaussian model. Training follows a staged regime with alternating rate-distortion (RD) updates. On the UVG dataset, NVRC achieves a 24% average coding gain (BD-rate, measured in PSNR) over VVC VTM (Random Access) and outperforms the INR baselines it is compared with; the authors report this as, to their knowledge, the first time an INR-based codec has reached this level of performance. Encoder complexity and latency are noted as areas for further improvement.
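The coding gain quoted above is a BD-rate figure: the average bitrate difference between two codecs at equal quality. The following is a minimal sketch of that calculation, not the evaluation code used by the authors. It assumes piecewise-linear interpolation of log-rate over PSNR, whereas the standard Bjøntegaard procedure fits a cubic polynomial (or a piecewise-cubic interpolant). The function names and all rate/PSNR points are hypothetical and synthetic.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    x = min(max(x, xs[0]), xs[-1])  # guard against floating-point overshoot
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside the interpolation range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average of log(rate) over the PSNR interval [lo, hi] (midpoint rule)."""
    pts = sorted(points, key=lambda p: p[1])
    psnrs = [p[1] for p in pts]
    log_rates = [math.log(p[0]) for p in pts]
    total = 0.0
    for k in range(steps):
        x = lo + (hi - lo) * (k + 0.5) / steps
        total += _interp(x, psnrs, log_rates)
    return total / steps


def bd_rate(anchor, test):
    """Approximate BD-rate in percent; negative means `test` needs fewer bits.

    `anchor` and `test` are lists of (bitrate, psnr) pairs with distinct PSNRs.
    """
    lo = max(min(p[1] for p in anchor), min(p[1] for p in test))
    hi = min(max(p[1] for p in anchor), max(p[1] for p in test))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Synthetic example: the test codec uses 80% of the anchor's rate at every PSNR.
anchor_points = [(1000.0, 34.0), (2000.0, 36.0), (4000.0, 38.0), (8000.0, 40.0)]
test_points = [(0.8 * rate, psnr) for rate, psnr in anchor_points]
savings = bd_rate(anchor_points, test_points)
```

With these synthetic points the result is a 20% rate saving (a BD-rate of about -20), because the log-rate gap between the two curves is constant across the shared PSNR range.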

Abstract

Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, and its parameters are compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still outperformed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC is, for the first time, able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we also propose a new model compression framework that codes all the network, quantization and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec has achieved such performance. The implementation of NVRC will be released.
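To make the link between quantization, entropy modelling and the rate term concrete, here is a generic sketch of how learned compression commonly estimates the bit cost of quantized parameters under a Gaussian entropy model. It is not the NVRC implementation: the function names, the single scalar scale, and the fixed mean and standard deviation are all assumptions for illustration. In an INR codec the distortion term would be measured between decoded and original frames, so it is passed in here as a plain number.

```python
import math


def _phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantize(values, scale):
    """Map real-valued parameters to integer symbols with a step size `scale`."""
    return [round(v / scale) for v in values]


def dequantize(symbols, scale):
    """Recover approximate parameter values from integer symbols."""
    return [q * scale for q in symbols]


def symbol_probability(q, mean, std):
    """Probability mass of integer symbol q under a discretized Gaussian."""
    return _phi((q + 0.5 - mean) / std) - _phi((q - 0.5 - mean) / std)


def gaussian_bits(symbols, mean, std):
    """Estimated code length in bits: sum of -log2 p(q) over all symbols."""
    return sum(
        -math.log2(max(symbol_probability(q, mean, std), 1e-9)) for q in symbols
    )


def rd_loss(distortion, bits, lam):
    """Rate-distortion objective D + lambda * R (hypothetical weighting)."""
    return distortion + lam * bits


# Illustrative values only: a finer scale keeps more precision but costs more bits.
params = [0.26, -0.49, 1.02]
fine = quantize(params, 0.25)
coarse = quantize(params, 1.0)
fine_bits = gaussian_bits(fine, mean=0.0, std=2.0)
coarse_bits = gaussian_bits(coarse, mean=0.0, std=2.0)
```

In an end-to-end setting the quantization scales and the entropy-model parameters (here `mean` and `std`) would be learned jointly with the network, which is why their own bit cost matters and why the abstract describes coding them hierarchically.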

Paper Structure (22 sections, 11 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Comparison between the output from HiNeRV [kwan2023hinerv] and the proposed NVRC. The image is from the UVG dataset (Jockey/ReadySetGo sequence) [mercat2020uvg].
  • Figure 2: In NVRC, the parameters are encoded in a hierarchical structure, where (middle-left) per-block quantization scales and (bottom-left) a context-based model are used for encoding feature grids, and (middle-right and bottom-right) per-axis quantization scales and a dual-axis Gaussian model are applied for encoding network layer parameters.
  • Figure 3: Average rate-quality curves of various tested codecs on the UVG dataset [mercat2020uvg].
  • Figure 4: Average rate-quality curves of various tested codecs on the MCL-JCV dataset [wang2016mcl].
  • Figure 5: Average rate-quality curves of various tested codecs on the JVET-CTC Class B dataset [boyce2018jvet].
  • ...and 1 more figure