Residual Learning and Filtering Networks for End-to-End Lossless Video Compression
Md baharul Islam, Afsana Ahsan Jeny
TL;DR
The paper tackles the challenge of high-quality end-to-end video compression by addressing inaccurate motion estimation and compensation in existing learned video codecs. It introduces an integrated architecture that combines motion estimation, motion vector (MV) and residual compression, filtering networks, deep nonlinear transforms (including PReLU), and an online frame buffer that refines the reference frames. Key contributions include a residual skip-connected MV compression network, motion vector filtering (MVF) and residual frame filtering (RF) modules that suppress compression artifacts, and a unified rate–distortion loss with a hyperprior entropy model for accurate bitrate estimation. Across multiple datasets (HEVC Class B, C, and D sequences, UVG, MCL-JCV, and VTL), the method delivers competitive MS-SSIM and PSNR together with bitrate savings measured by Bjøntegaard delta bitrate (BDBR), i.e. better visual quality at reduced bitrates. The work highlights the potential of end-to-end optimization for video coding and identifies a dedicated entropy model as a direction for further efficiency gains.
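The unified rate–distortion objective can be read as L = λ·D + R, where D is the reconstruction distortion and R is the estimated bitrate of the quantized MV and residual latents. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes MSE distortion, a zero-mean discretized Gaussian entropy model whose per-element scales (`scales`) are supplied by a hyperprior (as in the scale-hyperprior design of Ballé et al.), and rate expressed in bits per pixel. All function names are hypothetical.

```python
import math


def gaussian_cdf(x):
    """Standard normal CDF computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def estimate_bits(latents, scales, eps=1e-9):
    """Estimated bits for rounded latents under a zero-mean discretized Gaussian.

    The per-element scales are assumed to be predicted by a hyperprior.
    """
    total = 0.0
    for y, sigma in zip(latents, scales):
        sigma = max(sigma, 1e-6)  # guard against division by zero
        y_hat = round(y)  # nearest-integer quantization
        # Probability mass of the quantization bin [y_hat - 0.5, y_hat + 0.5]
        p = gaussian_cdf((y_hat + 0.5) / sigma) - gaussian_cdf((y_hat - 0.5) / sigma)
        total += -math.log2(max(p, eps))  # ideal code length in bits
    return total


def mse(a, b):
    """Mean squared error between two equally long pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rd_loss(frame, recon, mv_latents, mv_scales, res_latents, res_scales,
            lam, num_pixels):
    """L = lam * D + R, with R in bits per pixel for MV plus residual latents."""
    bits = estimate_bits(mv_latents, mv_scales) + estimate_bits(res_latents, res_scales)
    return lam * mse(frame, recon) + bits / num_pixels
```

A larger λ buys lower distortion at the cost of more bits, so training one model per λ traces out a rate-distortion curve. During actual training, hard rounding is usually replaced by additive uniform noise so that the rate term stays differentiable.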
Abstract
Existing learning-based video compression methods still suffer from inaccurate motion estimation and inadequate motion compensation structures, which lead to compression errors and a suboptimal rate-distortion trade-off. To address these problems, this work presents an end-to-end video compression method built from several key components. Specifically, we propose an autoencoder-type network with a residual skip connection that compresses motion information efficiently. In addition, we design motion vector and residual frame filtering networks that reduce compression errors in the coding pipeline. To make motion compensation more effective, we build a deeper motion compensation network that uses powerful nonlinear transforms such as the Parametric Rectified Linear Unit (PReLU). Furthermore, we introduce a buffer that fine-tunes the previous reference frames, which improves the quality of the reconstructed frames. These modules are trained with a loss function that balances bitrate against distortion and improves the overall quality of the decoded video. Experiments on HEVC (Class B, C, and D sequences), UVG, VTL, and MCL-JCV show that the proposed approach is competitive with existing methods.
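The residual skip connection and the PReLU activation mentioned above can be illustrated on a toy 1-D signal. This is a minimal sketch of a generic residual block, not the authors' MV compression network: real networks apply learned 2-D (often strided) convolutions to two-channel motion fields, whereas the kernels `k1`, `k2` and the slope `a` here are hypothetical fixed values standing in for learned parameters, and the helper names are likewise hypothetical.

```python
def prelu(x, a):
    """PReLU: identity for x >= 0, learnable slope `a` for x < 0."""
    return x if x >= 0 else a * x


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def residual_block(x, k1, k2, a):
    """conv -> PReLU -> conv, then add the input back through a skip connection."""
    h = conv1d_same(x, k1)
    h = [prelu(v, a) for v in h]
    h = conv1d_same(h, k2)
    return [xi + hi for xi, hi in zip(x, h)]  # residual skip connection
```

If `k2` is all zeros, the block returns its input unchanged, so the convolutional layers only have to learn a correction to the identity mapping; this is the usual motivation for residual skip connections. Unlike ReLU, PReLU keeps a scaled response (slope `a`) for negative inputs, which matters for signed quantities such as motion vectors.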
