KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen
TL;DR
This work introduces KVQ, the first large-scale kaleidoscope short-form user-generated content (UGC) video quality assessment (VQA) database, together with KSVQE, a content- and distortion-aware quality evaluator. KSVQE combines three components: CLIP-based Quality-aware Region Selection (QRS), Content-adaptive Modulation (CaM), and Distortion-aware Modulation (DaM) guided by CONTRIQUE. It achieves state-of-the-art performance on KVQ and generalizes well to other UGC-VQA datasets. The KVQ dataset captures realistic short-form creation modes and processing workflows, and it is annotated with fine-grained absolute MOS values as well as rankings among hard-to-distinguish pairs. The work advances practical VQA for short-form platforms and provides a framework for jointly understanding content and distortion in short-form UGC (S-UGC) videos.
Abstract
Short-form UGC video platforms, such as Kwai and TikTok, have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement and kaleidoscope creation. However, advancing content-generation modes (e.g., special effects) and sophisticated processing workflows (e.g., de-artifacting) have introduced significant challenges to recent UGC video quality assessment: (i) ambiguous content hinders the identification of quality-determined regions; (ii) diverse and complicated hybrid distortions are hard to distinguish. To tackle these challenges and support the development of short-form videos, we establish the first large-scale Kaleidoscope short Video database for Quality assessment, termed KVQ, which comprises 600 user-uploaded short videos and 3600 videos processed through diverse practical workflows, including pre-processing, transcoding, and enhancement. For these videos, an absolute quality score for each video and partial ranking scores among indistinguishable samples are provided by a team of professional researchers specializing in image processing. Based on this database, we propose the first short-form video quality evaluator, KSVQE, which identifies quality-determined semantics using the content understanding of large vision-language models (i.e., CLIP) and distinguishes distortions with a distortion understanding module. Experimental results show the effectiveness of KSVQE on our KVQ database and on popular VQA databases.
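The abstract describes two kinds of annotation: an absolute quality score per video and partial rankings among indistinguishable samples. As a minimal sketch of how the ranking annotations might be used to check a predictor, the snippet below computes the fraction of annotated pairs that a model orders correctly. The function name, the clip identifiers, the scores, and the (better, worse) pair format are all hypothetical and chosen for illustration; they are not taken from the KVQ release or from the KSVQE evaluation protocol.

```python
from typing import Dict, List, Tuple

# Counts stated in the abstract.
NUM_SOURCE_VIDEOS = 600      # user-uploaded short videos
NUM_PROCESSED_VIDEOS = 3600  # videos produced by the processing workflows
TOTAL_VIDEOS = NUM_SOURCE_VIDEOS + NUM_PROCESSED_VIDEOS


def pairwise_rank_accuracy(
    predicted: Dict[str, float],
    ranked_pairs: List[Tuple[str, str]],
) -> float:
    """Return the fraction of (better, worse) pairs ordered correctly.

    A pair counts as correct when the predicted score of the clip annotated
    as better is strictly higher than that of the clip annotated as worse.
    """
    if not ranked_pairs:
        raise ValueError("ranked_pairs must not be empty")
    correct = sum(
        1 for better, worse in ranked_pairs if predicted[better] > predicted[worse]
    )
    return correct / len(ranked_pairs)


# Hypothetical predictions and annotations, for illustration only.
predicted_scores = {"clip_a": 3.9, "clip_b": 3.7, "clip_c": 2.1, "clip_d": 2.4}
annotated_pairs = [("clip_a", "clip_b"), ("clip_c", "clip_d")]

accuracy = pairwise_rank_accuracy(predicted_scores, annotated_pairs)
print(f"total videos: {TOTAL_VIDEOS}, pairwise rank accuracy: {accuracy:.2f}")
```

With this toy data the first pair is ordered correctly and the second is not, so the accuracy is 0.5. A pairwise measure of this kind complements correlation with absolute scores, because it focuses on exactly those samples whose quality differences are too small to be separated reliably by absolute ratings.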
