A Neural-network Enhanced Video Coding Framework beyond ECM
Yanchen Zhao, Wenxuan He, Chuanmin Jia, Qizhe Wang, Junru Li, Yue Li, Chaoyi Lin, Kai Zhang, Li Zhang, Siwei Ma
TL;DR
The paper integrates three tools into ECM-10.0: Unsymmetric Quaternary Tree (UQT) partitioning for finer block representation, a CNN-based in-loop filter with adaptive inputs, and Block Importance Mapping (BIM) to adapt per-CTU QP deltas. On the JVET dataset under the Random Access configuration, the combined approach yields BD-rate changes of approximately $-6.26\%$, $-13.33\%$, and $-12.33\%$ on the Y, U, and V components, respectively, relative to ECM-10.0, with CNN-based filtering providing the majority of the gains and BIM supplying additional improvements. CLIC 2024 validation shows competitive PSNR performance under resource constraints when using UQT and BIM, achieving $25.889\,\mathrm{dB}$ at $0.05$ Mbps on the validation set; CNN-based filtering was omitted there due to computational limits. Overall, the study demonstrates that a hybrid framework combining deep learning with traditional coding tools can substantially improve compression efficiency, and it points toward future convergence of learning-based and conventional video coding methods.
Abstract
In this paper, a hybrid video compression framework is proposed that serves as a demonstrative showcase of deep learning-based approaches extending beyond the confines of traditional coding methodologies. The proposed hybrid framework is founded upon the Enhanced Compression Model (ECM), which is a further enhancement of the Versatile Video Coding (VVC) standard. We have augmented the latest ECM reference software with well-designed coding techniques, including block partitioning, a deep learning-based loop filter, and the activation of block importance mapping (BIM), which was integrated but previously inactive within ECM, further enhancing coding performance. Compared with ECM-10.0, our method achieves 6.26%, 13.33%, and 12.33% BD-rate savings for the Y, U, and V components, respectively, under the random access (RA) configuration.
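
The BD-rate figures quoted above summarize the average bitrate difference between two rate-distortion curves at equal quality. The following is a minimal sketch of the classic Bjøntegaard cubic-fit calculation, assuming exactly four (bitrate, PSNR) points per curve; it is not the evaluation script used by the authors, and the function names and sample numbers are hypothetical. With four points, the cubic fit of log-rate against PSNR is the interpolating polynomial, so it can be evaluated in Lagrange form and integrated exactly with Simpson's rule.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial interpolating the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mean_log_rate(psnr, rate, lo, hi):
    """Average of the fitted log-rate curve over the PSNR interval [lo, hi]."""
    logs = [math.log(r) for r in rate]

    def fit(x):
        return lagrange_eval(psnr, logs, x)

    # Simpson's rule is exact for cubics, which is what four points define.
    integral = (hi - lo) / 6.0 * (fit(lo) + 4.0 * fit((lo + hi) / 2.0) + fit(hi))
    return integral / (hi - lo)


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """BD-rate in percent; negative values mean the test codec saves bitrate."""
    # Compare the curves only over the PSNR range that both of them cover.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    diff = (mean_log_rate(psnr_test, rate_test, lo, hi)
            - mean_log_rate(psnr_ref, rate_ref, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical data: the test codec needs 10% less bitrate at every PSNR point.
ref_rate = [1000.0, 2000.0, 4000.0, 8000.0]   # kbps (illustrative only)
ref_psnr = [32.0, 35.0, 38.0, 40.0]           # dB (illustrative only)
test_rate = [0.9 * r for r in ref_rate]
print(round(bd_rate(ref_rate, ref_psnr, test_rate, ref_psnr), 2))  # -10.0
```

Under this convention, a reported Y-component value of $-6.26\%$ means that the proposed codec needs about 6.26% less bitrate than ECM-10.0, averaged over the overlapping PSNR range, to reach the same luma PSNR.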
