Ultra-lightweight Neural Video Representation Compression
Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Mike Nilsson, Andrew Gower, David Bull
TL;DR
The paper tackles ultra-lightweight neural video compression by introducing NVRC-Lite, which combines multi-scale feature grids with an octree-based entropy model to keep computational cost below 10kMACs/pixel while maintaining strong rate-distortion performance. It extends the NVRC framework with a HiNeRV-based multi-grid representation and a fast, block-wise entropy coder, enabling end-to-end optimization over representation and entropy parameters. Empirical results on UVG and HEVC-B show BD-rate savings over C3, a state-of-the-art lightweight INR codec, of up to about 21% measured in PSNR and 23% measured in MS-SSIM, along with faster encoding (8.4x) and decoding (2.5x). The work demonstrates a practical path toward real-time, low-complexity neural video compression and outlines directions for broader applicability and further efficiency gains.
Abstract
Recent works have demonstrated the viability of utilizing over-fitted implicit neural representations (INRs) as alternatives to autoencoder-based models for neural video compression. Among these INR-based video codecs, Neural Video Representation Compression (NVRC) was the first to adopt a fully end-to-end compression framework that compresses INRs, achieving state-of-the-art performance. Moreover, some recently proposed lightweight INRs have shown comparable performance to their baseline codecs with computational complexity lower than 10kMACs/pixel. In this work, we extend NVRC toward lightweight representations and propose NVRC-Lite, which incorporates two key changes. First, we integrate multi-scale feature grids into our lightweight neural representation; the use of higher-resolution grids significantly improves the performance of INRs at low complexity. Second, we address the issue that existing INRs typically rely on autoregressive models for entropy coding, which are effective but impractical due to their slow coding speed. We propose an octree-based context model for entropy coding high-dimensional feature grids, which accelerates the entropy coding module of the model. Our experimental results demonstrate that NVRC-Lite outperforms C3, one of the best lightweight INR-based video codecs, with up to 21.03% and 23.06% BD-rate savings when measured in PSNR and MS-SSIM, respectively, while achieving 8.4x encoding and 2.5x decoding speedup. The implementation of NVRC-Lite will be made available.
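For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of the standard Bjontegaard delta-rate calculation: fit log-bitrate as a cubic polynomial of quality for each codec, average both fits over the shared quality range, and convert the difference to a percentage. It is not the authors' evaluation code; the function name `bd_rate` and the rate/PSNR points are hypothetical, chosen so that the test codec uses exactly 20% fewer bits than the anchor at every quality level.

```python
import numpy as np


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Bjontegaard delta rate (%) of a test codec relative to an anchor.

    Negative values mean the test codec needs fewer bits for the same quality.
    Quality can be PSNR (dB) or any other monotonic metric.
    """
    log_ra = np.log(np.asarray(anchor_rates, dtype=float))
    log_rt = np.log(np.asarray(test_rates, dtype=float))
    qa = np.asarray(anchor_quality, dtype=float)
    qt = np.asarray(test_quality, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Only the quality interval covered by both curves is compared.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Average log-rate of each fit over [lo, hi] via the antiderivative.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate difference to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical rate points (bits per pixel) and PSNR values (dB).
anchor_bpp = [0.05, 0.10, 0.20, 0.40]
psnr_db = [32.0, 34.0, 36.0, 38.0]
test_bpp = [0.8 * r for r in anchor_bpp]  # 20% fewer bits at equal PSNR

print(round(bd_rate(anchor_bpp, psnr_db, test_bpp, psnr_db), 2))  # approximately -20.0
```

A BD-rate saving of 21.03% therefore corresponds to a return value of about -21.03 from a function like this, averaged over the quality range that both codecs cover.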
