Motion Free B-frame Coding for Neural Video Compression

Van Thang Nguyen

TL;DR

Experimental results show that the proposed framework outperforms state-of-the-art deep neural video compression networks on the HEVC-class B dataset and is competitive on the UVG and MCL-JCV datasets.

Abstract

Typical deep neural video compression networks follow the hybrid approach of classical video coding, which consists of two separate modules: motion coding and residual coding. In addition, a symmetric auto-encoder is commonly used as the architecture for both motion and residual coding. In this paper, we propose a novel approach, which we call kernel-based motion-free video coding, that addresses the drawbacks of both of these typical architectures. The motion-free approach has two advantages: it improves the coding efficiency of the network, and it significantly reduces computational complexity by eliminating motion estimation, motion compensation, and motion coding, which are the most time-consuming components. In addition, the kernel-based auto-encoder alleviates the blur artifacts that commonly occur with the conventional symmetric auto-encoder and consequently improves the visual quality of the reconstructed frames. Experimental results show that the proposed framework outperforms state-of-the-art deep neural video compression networks on the HEVC-class B dataset and is competitive on the UVG and MCL-JCV datasets. In addition, it generates higher-quality reconstructed frames than the conventional motion-coding-based symmetric auto-encoder, while its model size is roughly three to four times smaller than that of motion-based networks.
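
The kernel-based synthesis described in the abstract and in the Figure 2 and Figure 3 captions predicts each output pixel by convolving local patches of the reference frames with kernels produced by the decoder, instead of warping a reference with an explicitly coded motion field. The pure-Python sketch below illustrates that idea under stated assumptions: the six 1D kernels are treated as one vertical and one horizontal kernel per reference frame (a separable, per-pixel scheme in the style of adaptive separable convolution, SepConv, for frame interpolation), out-of-frame samples are clamped to the border, and the function names are hypothetical. The paper's actual kernel pairing, normalization, and padding may differ, and a kernel size of 3 is used instead of 31 to keep the example readable.

```python
from typing import List

# Hypothetical sketch. Assumptions (not confirmed by the paper): each reference
# frame gets one vertical and one horizontal 1D kernel per output pixel, and
# out-of-frame samples are clamped to the nearest border pixel.
Frame = List[List[float]]    # grayscale frame, indexed as frame[y][x]
Kernels = List[List[float]]  # one 1D kernel per reference frame


def synthesize_pixel(refs: List[Frame], kv: Kernels, kh: Kernels,
                     y: int, x: int) -> float:
    """Sum over references of the KS x KS patch at (y, x), weighted by the
    outer product of that reference's vertical and horizontal kernels."""
    ks = len(kv[0])
    half = ks // 2
    out = 0.0
    for ref, v, h in zip(refs, kv, kh):
        height, width = len(ref), len(ref[0])
        for i in range(ks):
            row = ref[min(max(y + i - half, 0), height - 1)]
            # Horizontal 1D pass over one row of the patch.
            acc = sum(h[j] * row[min(max(x + j - half, 0), width - 1)]
                      for j in range(ks))
            # Vertical 1D pass combines the per-row results.
            out += v[i] * acc
    return out


def synthesize_frame(refs: List[Frame], kernels_v: List[List[Kernels]],
                     kernels_h: List[List[Kernels]]) -> Frame:
    """kernels_v[y][x] and kernels_h[y][x] hold the per-reference kernels
    for output pixel (y, x), so the kernels vary spatially."""
    height, width = len(refs[0]), len(refs[0][0])
    return [[synthesize_pixel(refs, kernels_v[y][x], kernels_h[y][x], y, x)
             for x in range(width)]
            for y in range(height)]


# Usage with a kernel size of 3 (the paper uses 31) on 4x4 frames.
H, W = 4, 4
i0 = [[0.0] * W for _ in range(H)]
i2 = [[3.0] * W for _ in range(H)]
ii = [[6.0] * W for _ in range(H)]  # stands in for the interpolated frame
centre = [0.0, 1.0, 0.0]
third = [0.0, 1.0 / 3.0, 0.0]
kv = [[[centre] * 3 for _ in range(W)] for _ in range(H)]
kh = [[[third] * 3 for _ in range(W)] for _ in range(H)]
blend = synthesize_frame([i0, i2, ii], kv, kh)  # every pixel is 3.0, the mean

# A kernel whose weight sits off-centre reproduces a one-pixel shift.
ramp = [[float(x) for x in range(W)] for _ in range(H)]
left_tap = [1.0, 0.0, 0.0]  # all weight on the left neighbour
shifted = synthesize_frame([ramp],
                           [[[centre] for _ in range(W)] for _ in range(H)],
                           [[[left_tap] for _ in range(W)] for _ in range(H)])
# each row of `shifted` is [0.0, 0.0, 1.0, 2.0]
```

The second example shows why explicit motion vectors are not strictly needed: an off-centre kernel acts as a displacement, so motion can be absorbed into the predicted kernels, although in this sketch only displacements of up to KS // 2 pixels (15 pixels for KS = 31) are reachable.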

Paper Structure

This paper contains 16 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 2: Kernel-based pixel synthesis from reference pictures
  • Figure 3: (a) Convolutional kernel-based motion-free auto-encoder architecture for B-frame coding. In the last layer at the decoder side, six 1D convolutional kernels with a kernel size of 31 are convolved with three reference frames, marked as I0, I2, and Ii (i stands for interpolated): the first two are previously reconstructed frames, and the third is the interpolated frame. (b) The detailed layers of the proposed network with M = 128, N = 96, K = 64, and KS (kernel size) = 31; /2 denotes down-sampling with a stride of 2, and x2 denotes up-sampling with a stride of 2.
  • Figure 4: A hierarchical B-frame coding structure
  • Figure 5: Rate-distortion curve comparisons on the HEVC-class B, UVG, and MCL-JCV datasets
  • Figure 6: Visual comparisons with the motion-based SSF model on the HEVC-class B and the UVG datasets
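
The hierarchical B-frame structure of Figure 4 determines the order in which frames are coded and which already-decoded frames each B-frame may reference. As an illustration only, the sketch below derives a dyadic coding order for one group of pictures (GOP), assuming the GOP size is a power of two, the two end frames are coded first, and each B-frame is predicted from the two decoded frames that bracket it. With a GOP of 2 this yields frame 1 predicted from frames 0 and 2, which is consistent with the I0/I2 naming in the Figure 3 caption, but the GOP size and reference rule the paper actually uses are not stated in this section, and the function name is hypothetical.

```python
from typing import List, Tuple


def hierarchical_b_order(gop: int) -> List[Tuple[int, int, int, int]]:
    """Return (frame, left_ref, right_ref, level) tuples in coding order.

    Assumptions (not taken from the paper): frames 0 and `gop` are anchor
    frames that are already coded; each B-frame is the midpoint of an
    interval whose two end frames are decoded and serve as its references;
    intervals are split one hierarchy level at a time.
    """
    if gop < 2 or gop & (gop - 1):
        raise ValueError("gop must be a power of two and at least 2")
    order: List[Tuple[int, int, int, int]] = []
    intervals = [(0, gop)]
    level = 1
    while intervals:
        next_intervals = []
        for left, right in intervals:
            mid = (left + right) // 2
            order.append((mid, left, right, level))
            # Keep splitting until no uncoded frame lies between references.
            if mid - left > 1:
                next_intervals += [(left, mid), (mid, right)]
        intervals = next_intervals
        level += 1
    return order


# Coding order for a GOP of 8: frames 4, 2, 6, 1, 3, 5, 7.
for frame, left, right, level in hierarchical_b_order(8):
    print(f"level {level}: code frame {frame} from frames {left} and {right}")
```

Coding the midpoint first means every B-frame has both a past and a future reference available, at the cost of a decoding order that differs from display order.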