A Tri-Dynamic Preprocessing Framework for UGC Video Compression
Fei Zhao, Mengxi Guo, Shijie Zhao, Junlin Li, Li Zhang, Xiaodong Xie
TL;DR
UGC video content exhibits high spatio-temporal diversity, which challenges traditional preprocessing for compression. The proposed Tri-Dynamic Preprocessing framework uses pre-analysis to drive three adaptive components: Dynamic Processing Intensity (DPI), Dynamic Quantization Level, and Dynamic Lambda Trade-off. All three guide the training of a deep preprocessing network, while only DPI is used at test time. On YouTube-UGC, the approach achieves substantial RD gains across perceptual metrics (e.g., 7.14% BDBR for VMAF_NEG and 12.03% for VMAF) and reduces bad-case occurrences, with consistent improvements when evaluated against standard codecs. Ablation studies confirm that combining all three components outperforms any single one, and analysis ties the quantization adaptation to spatio-temporal complexity. The framework provides a scalable preprocessing-based strategy for improving UGC video compression.
Abstract
In recent years, user-generated content (UGC) has become the dominant force in internet traffic. However, UGC videos are more variable and more diverse in their characteristics than traditional encoding test videos. This variability limits the effectiveness of data-driven machine learning algorithms for encoding optimization across the broad range of UGC scenarios. To address this issue, we propose a Tri-Dynamic Preprocessing framework for UGC. First, we employ an adaptive factor to regulate preprocessing intensity. Second, we employ an adaptive quantization level to fine-tune the codec simulator. Third, we use an adaptive lambda trade-off to adjust the rate-distortion loss function. Experimental results on large-scale test sets demonstrate that our method attains exceptional performance.
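
The three adaptive controls named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `DynamicFactors`, `apply_intensity`, `simulate_quantization`, and `rd_loss` are hypothetical, the intensity is modeled as a linear blend between the input and the processed frame, the codec simulator is replaced by a toy uniform quantizer, and the rate-distortion loss is assumed to take the common form rate + lambda * distortion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DynamicFactors:
    """Hypothetical per-sample controls (names are not from the paper)."""
    intensity: float  # preprocessing strength, assumed to lie in [0, 1]
    q_level: float    # quantization step used by the toy codec simulator
    lam: float        # weight on distortion in the assumed RD loss


def apply_intensity(frame: List[float], processed: List[float],
                    intensity: float) -> List[float]:
    """Blend input and processed pixels: 0 keeps the input, 1 keeps the processed frame."""
    return [x + intensity * (p - x) for x, p in zip(frame, processed)]


def simulate_quantization(frame: List[float], q_level: float) -> List[float]:
    """Toy stand-in for a codec simulator: uniform quantization with step q_level."""
    return [round(x / q_level) * q_level for x in frame]


def rd_loss(rate: float, distortion: float, lam: float) -> float:
    """Assumed rate-distortion objective: rate + lam * distortion."""
    return rate + lam * distortion


if __name__ == "__main__":
    factors = DynamicFactors(intensity=0.5, q_level=4.0, lam=0.5)
    frame = [0.0, 10.0]
    processed = [4.0, 6.0]  # placeholder for a preprocessing network's output
    blended = apply_intensity(frame, processed, factors.intensity)
    decoded = simulate_quantization(blended, factors.q_level)
    print(blended, decoded, rd_loss(2.0, 3.0, factors.lam))
```

In this sketch a larger `intensity` moves the result further from the original frame, a larger `q_level` coarsens the simulated reconstruction, and a larger `lam` shifts the objective toward lower distortion at the expense of rate. How the paper derives each factor from pre-analysis is not specified in this section.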
