A Neural-network Enhanced Video Coding Framework beyond ECM

Yanchen Zhao, Wenxuan He, Chuanmin Jia, Qizhe Wang, Junru Li, Yue Li, Chaoyi Lin, Kai Zhang, Li Zhang, Siwei Ma

TL;DR

The paper integrates three learning-augmented tools into ECM-10.0: Unsymmetric Quaternary Tree (UQT) partitioning for finer block representation, a CNN-based in-loop filter with adaptive inputs, and Block Importance Mapping (BIM) to adapt per-CTU QP deltas. On the JVET dataset under the Random Access configuration, the combined approach yields BD-rate changes of approximately $-6.26\%$, $-13.33\%$, and $-12.33\%$ on the Y, U, and V components, respectively, relative to ECM-10.0 (negative values are bitrate savings). CNN-based filtering provides the majority of the gains, and BIM supplies additional improvements. CLIC 2024 validation shows competitive PSNR performance under resource constraints when using UQT and BIM, achieving $25.889\,\mathrm{dB}$ at $0.05$ Mbps on the validation set; CNN-based filtering was omitted there due to computational limits. Overall, the study demonstrates that a hybrid framework combining deep learning with traditional coding tools can substantially enhance compression efficiency and guide future convergence of learning-based and conventional video coding methods.

Abstract

In this paper, a hybrid video compression framework is proposed that serves as a demonstrative showcase of deep learning-based approaches extending beyond the confines of traditional coding methodologies. The proposed hybrid framework is founded upon the Enhanced Compression Model (ECM), which is a further enhancement of the Versatile Video Coding (VVC) standard. We have augmented the latest ECM reference software with well-designed coding techniques, including block partitioning, a deep learning-based loop filter, and the activation of block importance mapping (BIM), which was integrated but previously inactive within ECM, further enhancing coding performance. Compared with ECM-10.0, our method achieves 6.26%, 13.33%, and 12.33% BD-rate savings for the Y, U, and V components, respectively, under the random access (RA) configuration.
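
For readers unfamiliar with the metric, BD-rate (Bjøntegaard delta rate) is the average bitrate difference between two rate-distortion curves at equal quality. The usual recipe fits log-bitrate as a cubic in PSNR for each curve, integrates both fits over the shared PSNR range, and converts the mean log difference to a percentage. The sketch below is an illustrative pure-Python version under stated assumptions, not the JVET reference tool and not the authors' evaluation script: the function names and the sample rate/PSNR points are hypothetical, and it assumes exactly four rate points per curve with distinct PSNR values.

```python
import math

def _fit_cubic(xs, ys):
    """Coefficients c0..c3 of the cubic passing through four (x, y) points."""
    n = 4
    # Augmented Vandermonde system, solved by Gaussian elimination with pivoting.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = s / a[r][r]
    return coef

def _integral(coef, lo, hi):
    """Definite integral of sum(c_k * x**k) from lo to hi."""
    def prim(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return prim(hi) - prim(lo)

def bd_rate(anchor, test):
    """anchor, test: four (bitrate, psnr) pairs each. Returns BD-rate in percent.

    Negative values mean the test codec needs less bitrate than the anchor.
    """
    # Shared PSNR interval over which both curves are defined.
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    avg_log_rate = []
    for curve in (anchor, test):
        xs = [p - lo for _, p in curve]       # shift PSNR for better conditioning
        ys = [math.log(r) for r, _ in curve]  # work in the log-bitrate domain
        coef = _fit_cubic(xs, ys)
        avg_log_rate.append(_integral(coef, 0.0, hi - lo) / (hi - lo))
    return (math.exp(avg_log_rate[1] - avg_log_rate[0]) - 1.0) * 100.0

# Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
anchor_points = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test_points = [(0.9 * r, p) for r, p in anchor_points]
print(f"BD-rate: {bd_rate(anchor_points, test_points):.2f}%")
```

With these made-up points the test curve needs 10% less bitrate at every PSNR level, so the result is about -10%. Read the same way, the reported Y-component figure of -6.26% means roughly 6.26% less bitrate than ECM-10.0 at equal luma PSNR, averaged over the tested quality range.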

Paper Structure (5 sections, 2 figures, 4 tables)

This paper contains 5 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Illustration of the UQT partitioning.
  • Figure 2: Illustration of the network structure regarding the proposed CNN-based in-loop filter.