Benchmarking Conventional and Learned Video Codecs with a Low-Delay Configuration
Siyue Teng, Yuxuan Jiang, Ge Gao, Fan Zhang, Thomas Davis, Zoe Liu, David Bull
TL;DR
This study benchmarks six codecs (VVC VTM, JVET ECM, AV1 libaom, AVM, and the neural codecs DCVC-DC and DCVC-FM) under a strictly low-delay configuration, using the MPEG JVET and AOM common test conditions in YCbCr 4:2:0. Coding performance is compared using BD-rate computed on PSNR and VMAF, and encoding/decoding complexity is measured on a standard hardware platform. ECM-LDB delivers the largest PSNR BD-rate gains (around 18% over VTM-LDB, 18.8% over DCVC-DC, and 16% over AVM). The neural codecs come close to ECM on VMAF but require more computational resources. Gains vary across resolutions for some codecs, and the learned methods show content-dependent variability, particularly on sequences with large background motion. These results underscore the need for low-latency, low-complexity improvements for real-time streaming.
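BD-rate summarizes the average bitrate difference between two rate-quality curves at equal quality. The sketch below implements the classic Bjøntegaard cubic-fit variant. It is only an assumption that this matches the paper's procedure: the JVET and AOM common test conditions use piecewise cubic interpolation, so results can differ slightly. The function name `bd_rate` is hypothetical. The same code works with PSNR or VMAF as the quality axis.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate (%) of a test codec relative to an anchor.

    Classic cubic-fit variant (an assumption; CTC tools use piecewise cubic
    interpolation). Negative values mean the test codec needs fewer bits
    than the anchor for the same quality (PSNR or VMAF).
    """
    log_r_a = np.log10(np.asarray(rate_anchor, dtype=float))
    log_r_t = np.log10(np.asarray(rate_test, dtype=float))
    q_a = np.asarray(quality_anchor, dtype=float)
    q_t = np.asarray(quality_test, dtype=float)

    # Model log10(rate) as a cubic polynomial of quality for each codec.
    p_a = np.polyfit(q_a, log_r_a, 3)
    p_t = np.polyfit(q_t, log_r_t, 3)

    # Integrate only over the quality interval covered by both curves.
    q_min = max(q_a.min(), q_t.min())
    q_max = min(q_a.max(), q_t.max())
    int_a = np.polyint(p_a)
    int_t = np.polyint(p_t)
    area_a = np.polyval(int_a, q_max) - np.polyval(int_a, q_min)
    area_t = np.polyval(int_t, q_max) - np.polyval(int_t, q_min)

    # Average log-rate gap, converted back to a percentage bitrate change.
    avg_log_diff = (area_t - area_a) / (q_max - q_min)
    return (10 ** avg_log_diff - 1) * 100


if __name__ == "__main__":
    psnr = [32.0, 35.0, 38.0, 41.0]           # hypothetical four-QP operating points
    anchor_kbps = [1000, 2000, 4000, 8000]    # hypothetical anchor bitrates
    test_kbps = [0.9 * r for r in anchor_kbps]  # test codec uses 10% fewer bits
    print(round(bd_rate(anchor_kbps, psnr, test_kbps, psnr), 2))
```

With the hypothetical four-point curves above, the test codec uses 10% fewer bits at every quality level, so the function returns approximately -10.0. Working in the log-rate domain is what lets a constant relative saving show up as a constant vertical offset between the two fitted curves.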
Abstract
Recent advances in video compression have brought significant coding performance improvements, driven by new standards and learning-based video codecs. However, most of these works target application scenarios that tolerate a certain amount of system delay (e.g., the Random Access mode in MPEG codecs), which is not always acceptable for live delivery. This paper presents a comparative study of state-of-the-art conventional and learned video coding methods under a low-delay configuration. Specifically, the study includes two MPEG standard codecs (H.266/VVC VTM and JVET ECM), two AOM codecs (AV1 libaom and AVM), and two recent neural video coding models (DCVC-DC and DCVC-FM). To allow a fair and meaningful comparison, the evaluation was performed on test sequences defined in the AOM and MPEG common test conditions in the YCbCr 4:2:0 color space. The results show that the JVET ECM codec offers the best overall coding performance among all codecs tested, with average PSNR-based BD-rate savings of 16.1% over AOM AVM and 11.0% over DCVC-FM. We also observed inconsistent performance from the learned video codecs, DCVC-DC and DCVC-FM, on test content with large background motion.
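The delay penalty of Random Access coding comes largely from frame reordering. The toy model below compares a dyadic hierarchical-B GOP with display-order (low-delay) coding. Its assumptions are a hypothetical GOP size and a delay measure that counts only reordering; lookahead, rate control, and transmission are ignored. Real VTM/ECM configurations use their own GOP sizes and reference structures. All function names are illustrative.

```python
def random_access_order(gop_size):
    """Coding order (POCs) of one dyadic hierarchical-B GOP.

    Simplified model: the key frame at POC gop_size is coded first, then
    midpoints are coded recursively (depth-first), e.g. 8, 4, 2, 1, 3, 6, 5, 7.
    """
    def split(lo, hi):
        if hi - lo < 2:
            return []
        mid = (lo + hi) // 2
        return [mid] + split(lo, mid) + split(mid, hi)

    return [gop_size] + split(0, gop_size)


def low_delay_order(gop_size):
    """Low-delay coding order: frames are coded in display (capture) order."""
    return list(range(1, gop_size + 1))


def max_reordering_delay(order):
    """Largest number of frames any frame waits for later-captured frames.

    A frame can only be coded once every frame already coded has been
    captured, so its wait is (highest POC coded so far) - (its own POC).
    """
    latest_captured = 0
    worst = 0
    for poc in order:
        latest_captured = max(latest_captured, poc)
        worst = max(worst, latest_captured - poc)
    return worst


if __name__ == "__main__":
    for name, order in [("random access", random_access_order(8)),
                        ("low delay", low_delay_order(8))]:
        print(name, order, "max delay (frames):", max_reordering_delay(order))
```

In this model, an 8-frame hierarchical GOP makes POC 1 wait for POC 8, giving a 7-frame structural delay (about 233 ms at 30 fps). The low-delay order gives zero reordering delay, which is why live delivery favors low-delay configurations despite their usually lower compression efficiency.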
