Ultra-lightweight Neural Video Representation Compression

Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Mike Nilsson, Andrew Gower, David Bull

TL;DR

The paper introduces NVRC-Lite, an ultra-lightweight neural video codec that combines multi-scale feature grids with an octree-based entropy model to cut computational cost (sub-10 kMACs/pixel) while maintaining strong rate-distortion performance. It extends the NVRC framework with a HiNeRV-based multi-grid representation and a fast, block-wise entropy coder, so that the representation and the entropy parameters are optimized end to end. On UVG and HEVC-B, NVRC-Lite achieves BD-rate savings of up to 21.03% (PSNR) and 23.06% (MS-SSIM) over C3, one of the best lightweight INR-based video codecs, together with 8.4x faster encoding and 2.5x faster decoding. The work points to a practical path toward real-time, low-complexity neural video compression and outlines directions for broader applicability and further efficiency gains.

Abstract

Recent works have demonstrated the viability of utilizing over-fitted implicit neural representations (INRs) as alternatives to autoencoder-based models for neural video compression. Among these INR-based video codecs, Neural Video Representation Compression (NVRC) was the first to adopt a fully end-to-end compression framework that compresses INRs, achieving state-of-the-art performance. Moreover, some recently proposed lightweight INRs have shown comparable performance to their baseline codecs with computational complexity lower than 10 kMACs/pixel. In this work, we extend NVRC toward lightweight representations and propose NVRC-Lite, which incorporates two key changes. Firstly, we integrate multi-scale feature grids into our lightweight neural representation; the use of higher-resolution grids significantly improves the performance of INRs at low complexity. Secondly, we address the issue that existing INRs typically leverage autoregressive models for entropy coding, which are effective but impractical due to their slow coding speed. To this end, we propose an octree-based context model for entropy coding high-dimensional feature grids, which accelerates the entropy coding module of the model. Our experimental results demonstrate that NVRC-Lite outperforms C3, one of the best lightweight INR-based video codecs, with up to 21.03% and 23.06% BD-rate savings when measured in PSNR and MS-SSIM, respectively, while achieving 8.4x encoding and 2.5x decoding speedup. The implementation of NVRC-Lite will be made available.
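
For readers unfamiliar with the metric, BD-rate savings such as those quoted in the abstract are conventionally obtained with the Bjøntegaard delta method: log-bitrate is fitted as a function of quality for each codec, and the average gap between the two fits is taken over the shared quality range. The sketch below shows the common cubic-polynomial variant. The function name `bd_rate` and the sample rate/PSNR points are illustrative assumptions, not values or code from the paper, and the authors' evaluation script may differ (for example, by using piecewise cubic interpolation).

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate in percent; negative means the test codec saves bitrate."""
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Fit log-rate as a cubic polynomial of quality (e.g. PSNR in dB).
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Compare only over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Integrate both fits over [lo, hi] and average the difference.
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)

    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
anchor_rate = [100.0, 200.0, 400.0, 800.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
test_rate = [0.8 * r for r in anchor_rate]  # same quality at 80% of the bitrate

# Close to -20: the test curve needs about 20% less bitrate at equal quality.
print(bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr))
```

The same routine applies to MS-SSIM by passing that metric as the quality axis, typically after converting it to a dB-like scale.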

Paper Structure

This paper contains 9 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The proposed NVRC-Lite framework. It contains an INR (HiNeRV [kwan2023hinerv] in this case) with feature inputs to multiple blocks at different resolutions, and performs parameter coding by utilizing the proposed octree model for efficient entropy coding. Note that NVRC-Lite also follows NVRC [kwan2024nvrc] to perform parameter coding in a hierarchical manner (details omitted here).
  • Figure 2: The proposed octree-based entropy coding structure. Here, $t$ represents the feature index in the temporal dimension. We code both the odd and even indexed features in every coding step to reduce coding complexity.
  • Figure 3: Rate-distortion curves on the UVG and HEVC-B datasets.
  • Figure 4: Examples of visual comparison between C3 (left) and our method (right). Examples are from the UVG (top) and HEVC-B (bottom) datasets.
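
The Figure 1 caption mentions feature inputs at different resolutions. As a rough illustration of what a multi-resolution feature grid lookup can look like in general, the sketch below bilinearly samples several 2D grids of different sizes at one normalized coordinate and concatenates the results. This is a generic technique under stated assumptions, not the NVRC-Lite or HiNeRV architecture: the function names, grid shapes, and the choice to concatenate features are all hypothetical.

```python
import numpy as np

def bilinear_sample(grid, u, v):
    """Sample an (H, W, C) grid at normalized coordinates u, v in [0, 1]."""
    h, w, _ = grid.shape
    x = u * (w - 1)
    y = v * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp the upper neighbours so u = 1 or v = 1 stays inside the grid.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[y0, x0] + fx * grid[y0, x1]
    bottom = (1 - fx) * grid[y1, x0] + fx * grid[y1, x1]
    return (1 - fy) * top + fy * bottom

def multiscale_features(grids, u, v):
    """Concatenate the features sampled from each grid (hypothetical design choice)."""
    return np.concatenate([bilinear_sample(g, u, v) for g in grids])

# Hypothetical grids: coarse grids carry more channels, fine grids fewer.
rng = np.random.default_rng(0)
grids = [
    rng.standard_normal((4, 4, 8)),
    rng.standard_normal((8, 8, 4)),
    rng.standard_normal((16, 16, 2)),
]
features = multiscale_features(grids, u=0.3, v=0.7)
print(features.shape)  # (14,) = 8 + 4 + 2 channels
```

In a sketch like this, the higher-resolution grids add spatial detail at the cost of more stored parameters rather than more per-pixel computation, which is one reason grid-based inputs are attractive at low complexity.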