KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen

TL;DR

This paper introduces KVQ, the first large-scale kaleidoscope short-form UGC VQA database, and KSVQE, a content- and distortion-aware quality evaluator. KSVQE combines CLIP-based quality-aware region selection (QRS), content-adaptive modulation (CaM), and distortion-aware modulation (DaM) guided by CONTRIQUE; it achieves state-of-the-art performance on KVQ and generalizes well to other UGC-VQA datasets. The KVQ dataset captures realistic short-form creation modes and processing workflows, and it provides fine-grained absolute MOS annotations together with rankings among indistinguishable pairs. This work advances practical VQA for short-form platforms and provides a framework for future joint understanding of content and distortion in S-UGC videos.

Abstract

Short-form UGC video platforms, such as Kwai and TikTok, have emerged as an irreplaceable mainstream media form, thriving on user-friendly engagement and kaleidoscope creation. However, advancing content-generation modes (e.g., special effects) and sophisticated processing workflows (e.g., de-artifacting) have introduced significant challenges to recent UGC video quality assessment: (i) ambiguous content hinders the identification of quality-determined regions, and (ii) the diverse and complicated hybrid distortions are hard to distinguish. To tackle these challenges and assist in the development of short-form videos, we establish the first large-scale Kaleidoscope short Video database for Quality assessment, termed KVQ, which comprises 600 user-uploaded short videos and 3600 videos processed through diverse practical processing workflows, including pre-processing, transcoding, and enhancement. For these videos, the absolute quality score of each video and partial ranking scores among indistinguishable samples are provided by a team of professional researchers specializing in image processing. Based on this database, we propose the first short-form video quality evaluator, KSVQE, which identifies quality-determined semantics using the content understanding of large vision-language models (i.e., CLIP) and distinguishes distortions with a distortion understanding module. Experimental results show the effectiveness of KSVQE on our KVQ database and on popular VQA databases.
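
The abstract mentions two kinds of annotation: an absolute quality score per video and partial rankings among samples that are hard to tell apart. As a rough illustration of how a predictor might be checked against the ranking annotations, the sketch below computes the fraction of annotated pairs whose order the predicted scores reproduce. The function name, the `(better, worse)` pair format, and the example video IDs and scores are all assumptions made for illustration; they are not taken from the KVQ release or its evaluation protocol.

```python
def pairwise_rank_accuracy(pairs, predicted):
    """Fraction of annotated pairs ordered correctly by the predictions.

    pairs:     list of (better_id, worse_id) tuples (hypothetical format).
    predicted: dict mapping video id -> predicted quality score.
    """
    if not pairs:
        raise ValueError("at least one annotated pair is required")
    # A pair counts as correct when the video annotated as better
    # also receives the strictly higher predicted score.
    correct = sum(1 for better, worse in pairs
                  if predicted[better] > predicted[worse])
    return correct / len(pairs)


# Made-up scores for four hypothetical videos.
predicted = {"v1": 3.9, "v2": 3.7, "v3": 2.1, "v4": 2.4}
# Made-up annotations: v1 is better than v2, v3 is better than v4.
pairs = [("v1", "v2"), ("v3", "v4")]

accuracy = pairwise_rank_accuracy(pairs, predicted)
print(accuracy)
```

In this toy case the first pair is ordered correctly and the second is not. A metric of this kind complements correlation with absolute scores, since it focuses on the pairs that are hardest to separate.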

Paper Structure (46 sections, 7 equations, 12 figures, 16 tables)

Figures (12)

  • Figure 1: The two primary challenges of short-form videos: the kaleidoscope content with various creation modes (top) and complicated distortion arising from sophisticated video processing workflows (bottom). Regions with distortions are indicated by red boxes.
  • Figure 2: Overview of how the KVQ dataset is established. First, we collect the original short-form videos to cover the primary creation modes and content scenarios. Next, we make fine-grained video content adjustments based on the 6 video features. Finally, sophisticated video processing workflows are applied to incorporate various hybrid distortions.
  • Figure 3: The MOS distribution of different semantic categories (a) and the histogram of the overall MOS distribution (b).
  • Figure 4: The overall framework of the Kaleidoscope Short-form UGC Video Quality Evaluator (KSVQE). It contains a quality-aware region selection module (QRS) and content-adaptive modulation (CaM) to incorporate content understanding, and distortion-aware modulation (DaM) to enhance distortion understanding.
  • Figure 5: MOS distribution of videos of the three video groups corresponding to the three video processing workflows.
  • ...and 7 more figures
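
The Figure 4 caption names a quality-aware region selection module (QRS), which the TL;DR describes as CLIP-based. One way to picture the general idea is top-k selection of regions by similarity to a text-derived embedding. The sketch below is only that: a toy illustration under stated assumptions, not the authors' QRS design. The function names, the use of cosine similarity, the fixed `k`, and the two-dimensional example vectors are all hypothetical stand-ins for real CLIP features.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_regions(region_embeddings, prompt_embedding, k):
    """Return indices of the k regions most similar to the prompt embedding.

    Indices are returned in ascending order so the spatial order of the
    kept regions is preserved.
    """
    scores = [cosine(r, prompt_embedding) for r in region_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy 2-D vectors standing in for region and text features (assumed values).
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
prompt = [1.0, 0.0]

selected = select_regions(regions, prompt, k=2)
print(selected)
```

Here regions 0 and 2 are kept because they point most nearly in the direction of the prompt vector. A real system would obtain the embeddings from pretrained encoders and would likely learn the selection rather than apply a hard top-k.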