A Neural-network Enhanced Video Coding Framework beyond ECM

Yanchen Zhao; Wenxuan He; Chuanmin Jia; Qizhe Wang; Junru Li; Yue Li; Chaoyi Lin; Kai Zhang; Li Zhang; Siwei Ma

TL;DR

The paper integrates three coding tools into ECM-10.0: Unsymmetric Quaternary Tree (UQT) partitioning for finer block representation, a CNN-based in-loop filter with adaptive inputs, and Block Importance Mapping (BIM) to adapt per-CTU QP deltas. On the JVET data set under the Random Access configuration, the combined approach yields BD-rate changes of approximately $-6.26\%$, $-13.33\%$, and $-12.33\%$ on the Y, U, and V components, respectively, relative to ECM-10.0. The CNN-based filter provides the majority of the gains and BIM supplies additional improvements. CLIC 2024 validation shows competitive PSNR under resource constraints when using UQT and BIM, achieving $25.889\,\mathrm{dB}$ at $0.05$ Mbps on the validation set; CNN-based filtering was omitted there due to computational limits. Overall, the study demonstrates that a hybrid framework combining deep learning with traditional coding tools can substantially improve compression efficiency and can guide the future convergence of learning-based and conventional video coding methods.

Abstract

In this paper, a hybrid video compression framework is proposed that serves as a demonstrative showcase of deep learning-based approaches extending beyond the confines of traditional coding methodologies. The proposed hybrid framework is founded upon the Enhanced Compression Model (ECM), which is a further enhancement of the Versatile Video Coding (VVC) standard. We have augmented the latest ECM reference software with well-designed coding techniques, including block partitioning, a deep learning-based loop filter, and the activation of block importance mapping (BIM), which was integrated but previously inactive within ECM, further enhancing coding performance. Compared with ECM-10.0, our method achieves 6.26%, 13.33%, and 12.33% BD-rate savings for the Y, U, and V components, respectively, under the random access (RA) configuration.
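The BD-rate figures quoted above follow the usual Bjontegaard delta-rate convention: a negative value means the tested codec needs less bitrate than the anchor at equal quality. As a minimal sketch of how such a number is typically computed (this is the common cubic-fit formulation, not necessarily the exact tool used by the authors), the function below fits log-rate as a cubic polynomial of PSNR for each codec and averages the gap over the shared PSNR range. The function name `bd_rate` and the rate/PSNR points in the usage example are hypothetical and are not data from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = test saves bitrate).

    Assumes at least four rate/PSNR points per curve, as in common practice.
    """
    log_rate_a = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rate_t = np.log10(np.asarray(rate_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    poly_a = np.polyfit(psnr_a, log_rate_a, 3)
    poly_t = np.polyfit(psnr_t, log_rate_t, 3)

    # Only the PSNR interval covered by both curves is compared.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")

    # Integrate each fitted polynomial over the shared interval.
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical points: the test codec reaches the same PSNR at 10% less rate.
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps
    anchor_psnr = [32.0, 35.0, 38.0, 41.0]          # dB
    test_rate = [0.9 * r for r in anchor_rate]
    print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr):.2f}%")
```

Because the hypothetical test curve is a uniform 10% rate reduction at identical PSNR, the sketch reports a BD-rate close to -10%. In the paper's setting, one such value would be computed per colour component (Y, U, V) from the per-component PSNR.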

Paper Structure (5 sections, 2 figures, 4 tables)

Unsymmetric Quaternary Tree
CNN-based In-Loop Filter
Block Importance Mapping
Performance with JVET Data Set
Performance with CLIC Data Set

Figures (2)

Figure 1: Illustration of the UQT partitioning.
Figure 2: Illustration of the network structure regarding the proposed CNN-based in-loop filter.
