VQ-NeRV: A Vector Quantized Neural Representation for Videos

Yunjie Xu; Xiang Feng; Feiwei Qin; Ruiquan Ge; Yong Peng; Changmiao Wang

TL;DR

This work introduces Vector Quantized-NeRV (VQ-NeRV), an advanced U-shaped architecture built around a novel component, the VQ-NeRV Block, which uses a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information.

Abstract

Implicit neural representations (INR) excel at encoding videos within neural networks and show promise in computer vision tasks such as video compression and denoising. However, INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencies, Hybrid Neural Representation for Videos (HNeRV) was introduced with content-adaptive embeddings. Nevertheless, HNeRV's compression ratios remain relatively low because it does not leverage the network's shallow features or inter-frame residual information. In this work, we introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component, the VQ-NeRV Block. This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively. This approach proves particularly advantageous in video compression, as the discretized features occupy less space than quantized features. Furthermore, we introduce an original codebook optimization technique, termed shallow codebook optimization, designed to improve the utility and efficiency of the codebook. Experimental evaluations indicate that VQ-NeRV outperforms HNeRV on video regression tasks, delivering superior reconstruction quality (an increase of 1-2 dB in Peak Signal-to-Noise Ratio (PSNR)), better bits-per-pixel (bpp) efficiency, and improved video inpainting outcomes.
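To make the codebook idea in the abstract concrete, the following is a minimal sketch of generic nearest-neighbour vector quantization. It is not the authors' VQ-NeRV Block: the function names (`quantize`, `dequantize`, `index_bits`), the toy codebook, and the fixed-length index coding are all assumptions chosen for illustration.

```python
import math


def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest code vector."""
    indices = []
    for f in features:
        # Squared Euclidean distance from this feature to every code vector.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Look the stored indices back up in the codebook."""
    return [codebook[i] for i in indices]


def index_bits(num_vectors, codebook_size):
    """Bits needed to store the indices, assuming a fixed-length code."""
    return num_vectors * math.ceil(math.log2(codebook_size))


# Hypothetical toy data: four 2-D code vectors and four feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]]
features = [[0.9, 1.2], [-0.1, 0.1], [-0.8, 0.7], [0.2, -0.9]]

indices = quantize(features, codebook)
reconstructed = dequantize(indices, codebook)
print(indices)                                   # [1, 0, 2, 3]
print(index_bits(len(features), len(codebook)))  # 8
```

In this toy setting only the integer indices and the shared codebook need to be stored, which is the general reason a codebook-based representation can be compact; how VQ-NeRV applies this to shallow residual features and inter-frame residuals is described in the paper itself.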

Paper Structure (19 sections, 5 equations, 6 figures, 5 tables)

Introduction
Related Work
Neural Representation
Video Compression
Invertible Neural Network
Pipeline
Overview of VQ-NeRV Network
VQ-NeRV Block Architecture
Shallow Codebook Optimization
Loss Function
Down Task
Experiments
Datasets and Implementation Details
Video Regression
Video Compression
...and 4 more sections

Figures (6)

Figure 1: (a) and (b) Video interpolation qualitative results on the DAVIS dataset. (c) Video inpainting qualitative results on the DAVIS dataset. (d) Video regression for hybrid and implicit neural representations with 1.5M parameters on the Bunny dataset.
Figure 2: Overview of the proposed method VQ-NeRV. The upper figure shows the video encoding process, and the lower figure shows the video decoding process. In video compression, we train a VAE model for video encoding and use the decoder's inference process for video decoding.
Figure 3: Overview of VQ-NeRV Block Architecture.
Figure 4: (a) The selection criteria for Exponential Moving Average (EMA) updates within a batch of VQ-VAE. (b) The proposed strategy for optimizing the shallow codebook.
Figure 5: Visualization comparing VQ-NeRV with other state-of-the-art methods for several patches on the 131st frame of the Bunny Dataset at the same extremely low size of the decoder (0.35M), corresponding to 0.0182 bpp of our VQ-NeRV.
...and 1 more figure
