NERV++: An Enhanced Implicit Neural Video Representation

Ahmed Ghorbel, Wassim Hamidouche, Luce Morin

TL;DR

This work introduces NeRV++, an enhanced implicit neural video representation that strengthens the NeRV decoder with separable conv2d residual blocks (SCRBs) and a bilinear interpolation skip layer to improve rate-distortion (RD) performance in INR-based video compression. The framework represents a video as a continuous function of time: a time-conditioned MLP feeds an SCRB-based decoder, and the trained model is then pruned and quantized to 8 bits to obtain a compact encoding. Empirical results on UVG, MCL-JVC, and Bunny show competitive RD performance, with notable gains over prior INR methods, while highlighting a tradeoff in decoding latency. The study indicates that INR-based video codecs can approach autoencoder-based performance, and it outlines avenues for further efficiency improvements in entropy modeling and hardware-aware deployment.
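The pruning and 8-bit quantization step mentioned above can be illustrated with a minimal sketch. It assumes global magnitude pruning and a per-tensor min/max affine quantizer, which are common choices but are not confirmed here as the exact scheme used by NeRV++. The function names and the example weights are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, dropped = [], 0
    for w in weights:
        if abs(w) <= threshold and dropped < k:
            pruned.append(0.0)
            dropped += 1
        else:
            pruned.append(w)
    return pruned


def quantize_uniform(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] (assumed min/max scaling)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weight vector, pruned to 50% sparsity and stored as 8-bit codes.
weights = [0.5, -0.01, 0.3, 0.02, -0.8, 0.0]
pruned = prune_by_magnitude(weights, sparsity=0.5)
codes, lo, scale = quantize_uniform(pruned, bits=8)
restored = dequantize(codes, lo, scale)
```

Only the integer codes plus the two scalars `lo` and `scale` need to be stored, and the reconstruction error per weight is bounded by half a quantization step.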

Abstract

Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability for representing, generating, and manipulating various data types, allowing for continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and they require a huge number of parameters and long training iterations to capture high-frequency details, which limits their wider applicability. Resolving this problem remains a challenging task, and doing so would make INRs more accessible in compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation, as a more straightforward yet effective enhancement over the original NeRV decoder architecture. It features separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and it significantly enhances the representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JVC, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.
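To make the two decoder ingredients concrete, the sketch below compares the weight count of a standard convolution with a separable one and implements a plain bilinear upsampler of the kind a skip layer could use. It assumes "separable" means depthwise-separable (a depthwise k×k convolution followed by a 1×1 pointwise convolution) and uses align-corners sampling. Both are assumptions, and the channel width of 96 is a hypothetical value rather than a setting taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k per input channel, then a 1 x 1 pointwise mix."""
    return c_in * k * k + c_in * c_out


def bilinear_upsample(img, factor):
    """Upsample a 2D list of floats by an integer factor (align-corners sampling)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend the four neighbouring samples.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


standard = conv_params(96, 96, 3)             # 82944 weights
separable = separable_conv_params(96, 96, 3)  # 10080 weights
skip = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], factor=2)
```

Under these assumptions the separable layer needs roughly an eighth of the weights of the standard one, which suggests how residual blocks can be added around the upsampling block without inflating the model size. The bilinear path itself has no learnable parameters.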

Paper Structure (9 sections, 1 equation, 3 figures, 3 tables)

This paper contains 9 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: High-level diagram of implicit neural representation for video compression.
  • Figure 2: Overall NeRV++ framework, illustrating the video compression pipeline. PE stands for positional encoding, MLP for multilayer perceptron, SCRB for separable conv2d residual block, and UB for upsampling block.
  • Figure 3: Visualization of the reconstructed frame number 116 from the Bunny dataset.
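
The positional encoding (PE) stage named in the Figure 2 caption can be sketched as follows. This assumes the sinusoidal encoding of the normalized frame index used in the original NeRV formulation, with alternating sine and cosine terms at geometrically growing frequencies. The values of `base` and `num_freqs` here are illustrative assumptions, not settings confirmed for NeRV++.

```python
import math


def positional_encoding(t, base=1.25, num_freqs=8):
    """Embed a normalized time t in [0, 1] as [sin, cos] pairs.

    Frequencies are base**i * pi for i = 0 .. num_freqs - 1 (assumed form).
    """
    features = []
    for i in range(num_freqs):
        angle = (base ** i) * math.pi * t
        features.extend([math.sin(angle), math.cos(angle)])
    return features


# Hypothetical usage: embed frame 116 of a 132-frame clip.
embedding = positional_encoding(116 / 132)
```

The resulting vector of length `2 * num_freqs` is what the MLP would consume before the convolutional decoder stages.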