Video Quality Assessment: A Comprehensive Survey
Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu
TL;DR
This survey traces the evolution of video quality assessment (VQA) from foundational metrics based on natural scene statistics (NSS) and SSIM-inspired designs to modern deep learning and large-model approaches. It contrasts subjective VQA datasets with objective full-reference (FR) and no-reference (NR) methods, details architectures ranging from knowledge-driven designs to Transformers and large multimodal models, and discusses the loss functions used to align predictions with human perception. The paper reports performance benchmarks that identify current leaders (e.g., VMAF, STRA-VQA, CLIP-IQA+) across diverse content types, including user-generated content (UGC) and AI-generated content (AIGC), and describes practical applications in server-side transcoding, perceptual coding, and live streaming. It also outlines open challenges such as data scarcity, temporal modeling, and the need for scalable, efficient models, and proposes directions toward multimodal, semi-supervised, and prompt-driven VQA with broader dataset coverage (HDR, 60 fps, VR) and end-to-end perceptual optimization.
Abstract
Video quality assessment (VQA) is an important processing task that aims to predict the quality of videos in a manner highly consistent with human judgments of perceived quality. Traditional VQA models based on natural image and/or video statistics, which are inspired both by models of projected images of the real world and by dual models of the human visual system, deliver only limited prediction performance on real-world user-generated content (UGC), as exemplified in recent large-scale VQA databases containing large numbers of diverse video contents crawled from the web. Fortunately, recent advances in deep neural networks and Large Multimodality Models (LMMs) have enabled significant progress on this problem, yielding better results than prior handcrafted models. Numerous deep learning-based VQA models have been developed, with progress in this direction driven by the creation of content-diverse, large-scale human-labeled databases that supply ground-truth psychometric video quality data. Here, we present a comprehensive survey of recent progress in the development of VQA algorithms and of the benchmarking studies and databases that make them possible. We also analyze open research directions in study design and VQA algorithm architectures. GitHub link: https://github.com/taco-group/Video-Quality-Assessment-A-Comprehensive-Survey.
