UCVC: A Unified Contextual Video Compression Framework with Joint P-frame and B-frame Coding
Jiayu Yang, Wei Jiang, Yongqi Zhai, Chunhui Yang, Ronggang Wang
TL;DR
The paper tackles the challenge of flexible video compression by enabling unified P-frame and B-frame coding within a single learned framework. It introduces UCVC, which uses two neighboring decoded frames as references and is jointly trained on both P-frame and B-frame scenarios, achieving efficiency comparable to frame-type-specific methods. The model employs a conditional coding pipeline with motion estimation, temporal context mining, and a mean-scale hyperprior. It is optimized with the rate–distortion objective $L = R + \lambda D$, where $R$ is the bitrate and $D$ the distortion, and is combined with a strategic frame-type allocation across GoPs. Empirical results on the CLIC validation set and benchmark datasets show that selecting the frame type per sequence yields BD-rate savings and performance competitive with traditional codecs and state-of-the-art learned methods, highlighting the practical value of adaptive frame-type strategies in learned video compression.
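The rate–distortion objective and the per-sequence frame-type selection can be illustrated with a minimal Python sketch. It assumes that the rate is estimated in bits per pixel from the likelihoods an entropy model (such as the hyperprior) assigns to the quantized latents, that distortion is MSE, and that a sequence's frame type is chosen by the lowest RD cost at a single $\lambda$. The function names `rd_loss` and `select_frame_type` are hypothetical. The paper's own selection, which is reported in BD-rate terms, may use a different criterion.

```python
import math


def rd_loss(likelihoods, num_pixels, mse, lam):
    """Rate-distortion loss L = R + lambda * D (illustrative sketch).

    Assumptions: `likelihoods` are the probabilities an entropy model assigns
    to each quantized latent symbol, the rate R is the estimated bits per
    pixel, and the distortion D is the mean squared error.
    """
    # Ideal code length of each symbol is -log2(p) bits.
    total_bits = -sum(math.log2(p) for p in likelihoods)
    rate_bpp = total_bits / num_pixels
    return rate_bpp + lam * mse


def select_frame_type(costs):
    """Return the frame-type configuration with the lowest RD cost.

    Hypothetical helper: `costs` maps a configuration name (e.g. "P" or "B")
    to the RD cost measured on one sequence.
    """
    return min(costs, key=costs.get)
```

For example, two symbols with probabilities 0.5 and 0.25 cost 1 + 2 = 3 bits. Over 4 pixels this gives 0.75 bpp, and with an MSE of 0.01 and $\lambda = 100$ the loss is 0.75 + 1.0 = 1.75.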
Abstract
This paper presents a learned video compression method in response to the video compression track of the 6th Challenge on Learned Image Compression (CLIC) at DCC 2024. Specifically, we propose a unified contextual video compression framework (UCVC) for joint P-frame and B-frame coding. Each non-intra frame refers to two neighboring decoded frames, which can be either both from the past for P-frame compression, or one from the past and one from the future for B-frame compression. In the training stage, the model parameters are jointly optimized with both P-frames and B-frames. Benefiting from this design, the framework supports both P-frame and B-frame coding and achieves compression efficiency comparable to that of models specifically designed for P-frame or B-frame coding. For the challenge submission, we report the best compression efficiency obtained by selecting appropriate frame types for each test sequence. Our team name is PKUSZ-LVC.
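The two reference configurations can be sketched as a small Python function. For one GoP, it lists each non-intra frame in coding order together with its two reference frames. The structure is an illustrative assumption, not the paper's exact GoP design:

- Frames 0 and `gop_size` are taken to be intra-coded.
- P mode codes frames in display order and uses the two most recent decoded frames as references. Frame 1 reuses frame 0 for both references, since only one past frame exists at that point.
- B mode uses hierarchical bisection, so each frame refers to the nearest decoded past frame and the nearest decoded future frame.

The name `reference_structure` is hypothetical.

```python
def reference_structure(gop_size, mode):
    """Return (frame, ref_a, ref_b) tuples in coding order for one GoP.

    Illustrative assumptions: frames 0 and gop_size are intra-coded key frames.
    "P": frames coded in display order, each referring to the two most recent
         decoded frames (frame 1 reuses frame 0 for both references).
    "B": hierarchical bisection; each frame refers to the nearest decoded past
         frame and the nearest decoded future frame.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    if mode == "P":
        # Both references come from the past.
        return [(t, max(t - 2, 0), t - 1) for t in range(1, gop_size)]
    if mode == "B":
        order = []

        def bisect(left, right):
            # Code the midpoint between two decoded frames, then recurse.
            if right - left < 2:
                return
            mid = (left + right) // 2
            order.append((mid, left, right))  # one past, one future reference
            bisect(left, mid)
            bisect(mid, right)

        bisect(0, gop_size)
        return order
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `reference_structure(4, "B")` codes frame 2 from frames 0 and 4 before it codes frames 1 and 3. Every reference has therefore already been decoded when it is needed, which is the constraint both modes must satisfy.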
