
Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression

Gai Zhang, Xinfeng Zhang, Lv Tang, Yue Li, Kai Zhang, Li Zhang

TL;DR

The paper tackles INR-based video compression and questions whether INR network parameters are fully leveraging their information storage potential. It introduces a parameter reuse framework to deepen and widen the network without increasing encoded parameters. Experiments on HEVC Class B and MCL-JCV show significant rate-distortion gains over prior INR methods and competitive performance with traditional codecs, especially on HEVC Class B. The results demonstrate a practical path to unlock more information storage in INR representations and guide future network-design choices for high-efficiency video coding.

Abstract

For decades, video compression technology has been a prominent research area. Traditional hybrid video compression frameworks and end-to-end frameworks continue to explore various intra- and inter-frame reference and prediction strategies based on discrete transforms and deep learning techniques. In contrast, the emerging implicit neural representation (INR) technique models an entire video as the basic unit, automatically capturing intra-frame and inter-frame correlations and obtaining promising performance. INR uses a compact neural network to store video information in network parameters, effectively eliminating spatial and temporal redundancy in the original video. However, our exploration and verification in this paper reveal that current INR video compression methods do not fully exploit their potential to preserve information. We investigate the potential of enhancing network parameter storage through parameter reuse. By deepening the network, we design a feasible INR parameter reuse scheme to further improve compression performance. Extensive experimental results show that our method significantly enhances the rate-distortion performance of INR video compression.

Paper Structure (26 sections, 4 equations, 5 figures, 2 tables)


Figures (5)

  • Figure 1: The encoding and decoding pipeline of our method.
  • Figure 2: (a) The network structure of HiNeRV. The input is the target video patch coordinate (i, j, t). From this coordinate, the input grid, which is an autoregressive embedding, is computed. After a linear layer, the input grid is processed by n HiNeRV blocks. Finally, a convolutional layer generates the output video patch. (b) Details of the HiNeRV block. The block computes the local grid from the input coordinate (i, j, t) and maps it to the target channels with a linear layer. The local grid is then added to the upsampled input feature X_{n-1} and processed by D_n ConvNeXt blocks to produce the output feature X_n. (c) The deepened HiNeRV block with parameter reuse. Within the HiNeRV block, each ConvNeXt block is stacked m times, reusing its parameters to enhance the network's expressive ability. (d) The widened HiNeRV block with a widened ConvNeXt block. In the widened ConvNeXt block, the first linear layer concatenates its weight to expand the output channels, and the second linear layer concatenates its weight to expand the input channels.
  • Figure 3: The rate-distortion performance of our method and other video compression methods on HEVC Class B and MCL-JCV datasets.
  • Figure 4: The visual frames from HEVC Class B.
  • Figure 5: The ablation study on HEVC Class B of our method.
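
The two reuse schemes described in the Figure 2 caption can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: `mlp_block` is a hypothetical two-layer residual block standing in for a ConvNeXt block, the sizes `channels` and `hidden` are arbitrary, and the widening step assumes plain duplication of the weights, since the caption does not say how the concatenated copies are made to differ. It only shows that depth and width can grow while the number of stored parameters stays fixed.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp_block(x, W1, W2):
    """Hypothetical stand-in for a ConvNeXt block: x + W2 * relu(W1 * x)."""
    h = relu(matvec(W1, x))
    return [a + b for a, b in zip(x, matvec(W2, h))]

def deepened_block(x, W1, W2, m):
    """Deepening by reuse: apply the same block m times with the same weights."""
    for _ in range(m):
        x = mlp_block(x, W1, W2)
    return x

def widen(W1, W2):
    """Widening by reuse (assumed plain duplication of the stored weights)."""
    W1_wide = W1 + W1                    # concatenate along output channels
    W2_wide = [row + row for row in W2]  # concatenate along input channels
    return W1_wide, W2_wide

def count_params(*mats):
    """Number of scalar weights in the given matrices."""
    return sum(len(row) for W in mats for row in W)

random.seed(0)
channels, hidden = 4, 8  # illustrative sizes only
W1 = [[random.uniform(-0.5, 0.5) for _ in range(channels)] for _ in range(hidden)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(channels)]
x = [1.0, -2.0, 0.5, 3.0]

y_deep = deepened_block(x, W1, W2, m=3)  # three applications, one set of weights
W1_wide, W2_wide = widen(W1, W2)         # hidden width doubles at run time
y_wide = mlp_block(x, W1_wide, W2_wide)
stored = count_params(W1, W2)            # only W1 and W2 would need to be encoded
```

With plain duplication, the widened block's residual branch is exactly twice the original one, so duplication alone adds no expressive power in this toy setting; a practical scheme would need the reused copies to act differently, which this section does not detail.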