A Tri-Dynamic Preprocessing Framework for UGC Video Compression

Fei Zhao, Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang, Xiaodong Xie

TL;DR

UGC video content exhibits high spatio-temporal diversity, which makes it hard for conventional preprocessing to improve compression consistently. The proposed Tri-Dynamic Preprocessing (TDP) framework combines three pre-analysis-driven components, Dynamic Processing Intensity (DPI), Dynamic Quantization Level (DQL), and Dynamic Lambda Trade-off, to guide the training of a deep preprocessing network; at test time only DPI is used. On YouTube-UGC the approach yields substantial rate-distortion gains across perceptual metrics (e.g., BDBR savings of 7.14% for VMAF_NEG and 12.03% for VMAF) and reduces the number of bad cases, with consistent improvements when evaluated with standard codecs. Ablation studies indicate that the three components together outperform any single one, and further analysis ties the quantization adaptation to spatio-temporal complexity. The framework offers a scalable, preprocessing-based strategy for improving UGC video compression.

Abstract

In recent years, user-generated content (UGC) has become the dominant force in internet traffic. However, UGC videos are more variable and more diverse in their characteristics than traditional encoding test videos. This variability limits the effectiveness of data-driven machine learning algorithms for optimizing encoding across the broad range of UGC scenarios. To address this issue, we propose a Tri-Dynamic Preprocessing framework for UGC. First, we employ an adaptive factor to regulate preprocessing intensity. Second, we employ an adaptive quantization level to fine-tune the codec simulator. Third, we use an adaptive lambda trade-off to adjust the rate-distortion loss function. Experimental results on large-scale test sets demonstrate that our method attains exceptional performance.
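To make the roles of the quantization level and the lambda trade-off concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a uniform quantizer as a stand-in for the codec simulator, a crude rate proxy (mean absolute quantization level), mean squared error as the distortion, and a loss of the form rate + lambda × distortion; none of these choices are stated in this summary, and the names `simulate_codec` and `rd_loss` are hypothetical.

```python
def simulate_codec(samples, step):
    """Toy stand-in for a codec simulator (assumption): uniform quantization.

    Returns the integer quantization levels and the reconstructed samples.
    """
    levels = [round(s / step) for s in samples]
    recon = [q * step for q in levels]
    return levels, recon


def rd_loss(original, processed, step, lam):
    """Illustrative rate-distortion loss: rate + lam * distortion (assumed form).

    `step` plays the role of an adaptive quantization level and `lam` the role
    of an adaptive lambda trade-off; both would be chosen per video by a
    pre-analysis stage in a real system.
    """
    levels, recon = simulate_codec(processed, step)
    # Crude rate proxy (assumption): mean absolute quantization level.
    rate = sum(abs(q) for q in levels) / len(levels)
    # Distortion: mean squared error between the original and the reconstruction.
    distortion = sum((o - r) ** 2 for o, r in zip(original, recon)) / len(original)
    return rate + lam * distortion


if __name__ == "__main__":
    frame = [0.1, 0.4, 0.9]
    # Same samples and step, two different lambda values.
    print(rd_loss(frame, frame, step=0.5, lam=0.0))
    print(rd_loss(frame, frame, step=0.5, lam=10.0))
```

In this sketch a larger `lam` penalizes distortion more heavily relative to rate, while `step` controls how coarse the simulated codec is; varying both per video is the general idea the abstract describes.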

Paper Structure

This paper contains 11 sections, 5 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) Distribution of VMAF_NEG after encoding different datasets at 1500 kbps; (b) BDBR performance distribution of the baseline method compared with our method.
  • Figure 2: The overall pipeline of the proposed method. Note that we use the full TDP (DPI, DQL and DlamT) for training but only DPI for testing. We use a codec simulator for training and a real codec for testing.
  • Figure 3: The structure of the proposed DPN, which contains four basic residual blocks. The residual connection architecture lets us dynamically fine-tune the processing intensity through a multiplicative factor.
  • Figure 4: (a) RD curves on VMAF_NEG with VVC for YouTube-UGC; (b) bad-case rate for different methods with VVC on the YouTube-UGC dataset.
  • Figure 5: BD-Rate performance distribution heatmap for different spatio-temporal complexity and $f_q$.
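The Figure 3 caption says the residual architecture allows the processing intensity to be adjusted through a multiplicative factor. A minimal sketch of that idea, assuming the factor simply scales the residual branch before it is added back to the input (the exact placement of the factor in the DPN is not given in this summary), might look as follows. The function `apply_with_intensity` and the toy `residual_branch` are hypothetical names used only for illustration.

```python
def apply_with_intensity(frame, residual_branch, intensity):
    """Scale a residual branch by `intensity` and add it back to the input.

    intensity = 0 leaves the frame untouched; intensity = 1 applies the full
    learned modification. Assumed formulation: out = x + intensity * f(x).
    """
    residual = residual_branch(frame)
    return [x + intensity * r for x, r in zip(frame, residual)]


def residual_branch(frame):
    """Toy stand-in for the learned residual blocks: returns a constant offset."""
    return [1.0 for _ in frame]


if __name__ == "__main__":
    frame = [0.0, 2.0]
    for intensity in (0.0, 0.5, 1.0):
        print(intensity, apply_with_intensity(frame, residual_branch, intensity))
```

With this formulation a single scalar chosen at test time can weaken or strengthen the preprocessing without retraining the network, which is consistent with the summary's note that only DPI is used for testing.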