CANeRV: Content Adaptive Neural Representation for Video Compression

Lv Tang, Jun Zhu, Xinfeng Zhang, Li Zhang, Siwei Ma, Qingming Huang

TL;DR

This work proposes Content Adaptive Neural Representation for Video Compression (CANeRV), a video compression network based on implicit neural representation (INR) that adaptively optimises its structure for the specific content of each video sequence. It also introduces a structure-level hierarchical structural adaptation (HSA) mechanism.

Abstract

Recent advances in video compression introduce implicit neural representation (INR) based methods, which effectively capture global dependencies and characteristics of entire video sequences. Unlike traditional and deep-learning-based approaches, INR-based methods optimize network parameters from a global perspective, resulting in superior compression potential. However, most current INR methods utilize a fixed and uniform network architecture across all frames, limiting their adaptability to dynamic variations within and between video sequences. This often leads to suboptimal compression outcomes, as these methods struggle to capture the distinct nuances and transitions in video content. To overcome these challenges, we propose Content Adaptive Neural Representation for Video Compression (CANeRV), an innovative INR-based video compression network that adaptively conducts structure optimisation based on the specific content of each video sequence. To better capture dynamic information across video sequences, we propose a dynamic sequence-level adjustment (DSA). Furthermore, to enhance the capture of dynamics between frames within a sequence, we implement a dynamic frame-level adjustment (DFA). Finally, to effectively capture spatial structural information within video frames, thereby enhancing the detail restoration capabilities of CANeRV, we devise a structure-level hierarchical structural adaptation (HSA). Experimental results demonstrate that CANeRV can outperform both H.266/VVC and state-of-the-art INR-based video compression techniques across diverse video datasets.

Paper Structure

This paper contains 26 sections, 8 equations, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: The architecture of existing frame-by-frame and INR-based methods. Frame-by-frame methods typically include hybrid video coding methods and deep-learning-based video coding methods.
  • Figure 2: (a) shows that existing INR-based video compression methods use a uniform and fixed architecture configuration to process different videos. (b) shows our proposed CANeRV, which adaptively optimises the structure of the INR network. (c) shows the compression performance of CANeRV.
  • Figure 3: (a) shows the typical architecture of an existing video INR network. (b) shows the architecture of our proposed CANeRV. For DSA, this figure assumes four architecture adjustment configurations for illustration, each of which yields the rate-distortion (RD) performance of the resulting network architecture. Finally, we select the network architecture that offers the best RD performance.
  • Figure 4: Visual comparison between CANeRV with DFA (w/) and without DFA (w/o). For sequences with complex motion, such as the Basketball sequence, DFA helps CANeRV capture the unique characteristics of different frames and thereby reconstruct higher-quality video frames.
  • Figure 5: The architecture of our proposed HSA mechanism. In the HSA mechanism, the parameters of the $3\times3$ convolution operations need to be compressed, while the parameters of the two $1\times1$ convolution operations do not need to be compressed.
  • ...and 6 more figures
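
The Figure 3 caption describes DSA as evaluating several candidate architecture configurations and keeping the one with the best RD performance. The following is a minimal sketch of that selection step only, under stated assumptions: the excerpt does not say how RD performance is scored, so the sketch uses the standard Lagrangian cost J = D + λR as a stand-in. The names `Candidate`, `rd_cost`, `select_best` and the weight `lam` are hypothetical, and the numbers in the usage lines are made-up placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """One hypothetical architecture configuration and its measured cost."""
    name: str              # illustrative label, not a name from the paper
    bits_per_pixel: float  # rate R of the compressed network
    mse: float             # distortion D of the reconstructed frames


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lam * R (an assumed criterion)."""
    return c.mse + lam * c.bits_per_pixel


def select_best(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the lowest RD cost."""
    if not candidates:
        raise ValueError("at least one candidate configuration is required")
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Placeholder measurements for four candidate configurations.
demo = [
    Candidate("config_a", bits_per_pixel=0.10, mse=0.0040),
    Candidate("config_b", bits_per_pixel=0.08, mse=0.0050),
    Candidate("config_c", bits_per_pixel=0.12, mse=0.0038),
    Candidate("config_d", bits_per_pixel=0.09, mse=0.0041),
]
best = select_best(demo, lam=0.02)
print(best.name)
```

In this sketch, a larger `lam` favours configurations with a lower rate, while `lam = 0` reduces the choice to the lowest-distortion configuration. In practice each candidate would have to be trained and compressed before its rate and distortion can be measured.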