Residual Learning and Filtering Networks for End-to-End Lossless Video Compression

Md Baharul Islam, Afsana Ahsan Jeny

TL;DR

The paper targets end-to-end video compression and addresses two weaknesses of existing learning-based pipelines: inaccurate motion estimation and inadequate motion compensation. It introduces an integrated architecture that combines motion estimation, motion-vector (MV) and residual compression, filtering networks, a deeper motion compensation network built on nonlinear transforms such as PReLU, and an online decoded frame buffer (DFB) that refines reference frames. Key contributions include an autoencoder-type MV compression network with a residual skip connection, motion vector filtering (MVF) and residual filtering (RF) modules that suppress compression artifacts, and a unified rate–distortion loss with a hyperprior entropy model for bitrate estimation. On HEVC (Classes B, C, and D), UVG, MCL-JCV, and VTL, the method reports competitive MS-SSIM and PSNR together with bitrate savings measured by BDBR. The work points to a dedicated entropy model as a future enhancement to further improve coding efficiency.

Abstract

Existing learning-based video compression methods still face challenges related to inaccurate motion estimates and inadequate motion compensation structures. These issues result in compression errors and a suboptimal rate-distortion trade-off. To address these challenges, this work presents an end-to-end video compression method that incorporates several key operations. Specifically, we propose an autoencoder-type network with a residual skip connection to efficiently compress motion information. Additionally, we design motion vector and residual frame filtering networks to mitigate compression errors in the video compression system. To improve the effectiveness of the motion compensation network, we utilize powerful nonlinear transforms, such as the Parametric Rectified Linear Unit (PReLU), to delve deeper into the motion compensation architecture. Furthermore, a buffer is introduced to fine-tune the previous reference frames, thereby enhancing the reconstructed frame quality. These modules are combined with a carefully designed loss function that assesses the trade-off and enhances the overall video quality of the decoded output. Experimental results showcase the competitive performance of our method on various datasets, including HEVC (sequences B, C, and D), UVG, VTL, and MCL-JCV. The proposed approach tackles the challenges of accurate motion estimation and motion compensation in video compression, and the results highlight its competitive performance compared to existing methods.
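Two ingredients named in the abstract, the PReLU nonlinearity and a loss that trades rate against distortion, can be illustrated with a minimal NumPy sketch. The PReLU definition is the standard one. The loss form (`lam * distortion + rate`, with MSE distortion and rate in bits per pixel) is a generic assumption, since this summary does not give the paper's exact loss terms, and the names `prelu`, `rd_loss`, `lam`, and `bits` are hypothetical.

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for x >= 0, slope `a` for x < 0 (`a` is learned in practice)."""
    return np.where(x >= 0, x, a * x)

def rd_loss(frame, recon, bits, lam):
    """Generic rate-distortion objective: lam * distortion + rate.

    Assumptions (not taken from the paper): distortion is the mean squared
    error, and rate is the total bit count divided by the number of pixels
    in an H x W (or H x W x C) frame.
    """
    distortion = np.mean((frame - recon) ** 2)
    num_pixels = frame.shape[0] * frame.shape[1]
    rate_bpp = bits / num_pixels
    return lam * distortion + rate_bpp
```

A larger `lam` weights reconstruction quality more heavily and therefore tolerates a higher bitrate; a smaller `lam` does the opposite.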

Paper Structure

This paper contains 19 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Proposed architecture. In the ME module, the current frame $f_{t}$ and the previous reference frame $\hat{f}_{t-1}$ are passed to the optical flow network to obtain $o_{t}$. The MVC network then compresses the raw optical flow values, producing $\bar{o}_{t}$, which is fed into the proposed MVF network to remove compression artifacts, yielding $\hat{o}_{t}$. Next, $\hat{o}_{t}$ is passed through the proposed MCDR network to obtain $\bar{f}_{t}$, which is sent to the RCF module together with $r_{t}$. The residual compression (RC) network produces $\bar{r}_{t}$, which contains artifacts caused by quantization in the residual encoder-decoder network and is therefore passed through the proposed RF network to obtain $\hat{r}_{t}$. Finally, $\hat{r}_{t}$ and $\bar{f}_{t}$ are added to form the final reconstructed frame $\hat{f}_{t}$. The DFB serves as an online buffer that fine-tunes the reference frames to produce a cleaner reconstructed frame.
  • Figure 2: MS-SSIM comparison of our method with state-of-the-art learning-based methods, e.g., DVC [lu2019dvc], Djelouah et al. [djelouah2019neural], Agustsson et al. [agustsson2020scale], Hu et al. [hu2020improving], Lu et al. [lu2020content], and FVC [hu2021fvc], and with the conventional methods H.264/H.265 [wiegand2003overview, sullivan2012overview], on the HEVC Test Sequences (Class B, C, and D) [sullivan2012overview], UVG [mercat2020uvg], MCL-JCV [wang2016mcl], and VTL [hu2021fvc] datasets.
  • Figure 3: PSNR comparison of our method with state-of-the-art learning-based methods, e.g., DVC [lu2019dvc], Djelouah et al. [djelouah2019neural], Agustsson et al. [agustsson2020scale], Hu et al. [hu2020improving], Lu et al. [lu2020content], and FVC [hu2021fvc], and with the conventional methods H.264/H.265 [wiegand2003overview, sullivan2012overview], on the HEVC Test Sequences (Class B, C, and D) [sullivan2012overview], UVG [mercat2020uvg], MCL-JCV [wang2016mcl], and VTL [hu2021fvc] datasets.
  • Figure 4: Left: Qualitative comparison between the original frames and our reconstructed frames on the UVG (a), HEVC (b), MCL-JCV (c), and VTL (d) datasets. Right: Predicted video frames without and with our MVF module: original frames from the HEVC and UVG datasets (a, b), predicted frames without the MVF module (c, d), and predicted frames with the MVF module (e, f).
  • Figure 5: Ablation study of different modules on the HEVC Class B dataset [sullivan2012overview]. The blue dashed line represents our result. The dotted lines show the results obtained by removing the motion vector compression network (MVC(N)), the motion information network, the motion compensation network (MCDR(N)), the residual filtering network (RF(N)), the motion vector filtering network (MVF(N)), the motion estimation (ME) module, and the decoded frame buffer (DFB(N)).
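
The data flow described in the Figure 1 caption can be sketched roughly as follows. This is only an illustration of the order of operations, not the authors' implementation: every learned network is replaced by a caller-supplied function, compression is reduced to uniform quantization, the residual is assumed to be $r_{t} = f_{t} - \bar{f}_{t}$ (the caption does not define it), and the DFB is not modeled. The names `quantize` and `code_frame` and all parameters are hypothetical.

```python
import numpy as np

def quantize(x, step=0.5):
    """Stand-in for a learned encode/decode pair: uniform quantization only."""
    return np.round(x / step) * step

def code_frame(f_t, f_prev_hat, estimate_flow, mv_filter, compensate, res_filter):
    """One pass through the Figure 1 data flow, using caller-supplied stages."""
    o_t = estimate_flow(f_t, f_prev_hat)    # ME: raw optical flow o_t
    o_bar = quantize(o_t)                   # MVC: decoded flow, with artifacts
    o_hat = mv_filter(o_bar)                # MVF: filtered flow
    f_bar = compensate(f_prev_hat, o_hat)   # MCDR: motion-compensated frame
    r_t = f_t - f_bar                       # residual (assumed definition)
    r_bar = quantize(r_t)                   # RC: decoded residual, with artifacts
    r_hat = res_filter(r_bar)               # RF: filtered residual
    return f_bar + r_hat                    # reconstructed frame

# Toy usage with trivial stand-ins: zero flow, identity filters, no warping.
rng = np.random.default_rng(0)
f_prev_hat = rng.random((8, 8))
f_t = f_prev_hat + 0.1 * rng.standard_normal((8, 8))
f_hat = code_frame(
    f_t,
    f_prev_hat,
    estimate_flow=lambda cur, ref: np.zeros(cur.shape + (2,)),
    mv_filter=lambda o: o,
    compensate=lambda ref, o: ref,
    res_filter=lambda r: r,
)
```

With these trivial stand-ins the only error source is the residual quantizer, so the per-pixel reconstruction error is bounded by half the quantization step. In the architecture described above, the MVF and RF stages are the places where such quantization artifacts would be reduced.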