Video Quality Assessment: A Comprehensive Survey

Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu

TL;DR

This survey maps the evolution of video quality assessment from foundational NSS-based and SSIM-inspired metrics to modern deep learning and large-model approaches. It contrasts subjective VQA datasets with objective FR/NR methods, detailing architectures that range from knowledge-driven to Transformer and large multimodal models, and discusses loss functions that guide perceptual alignment. The paper highlights performance benchmarks demonstrating the current leaders (e.g., VMAF, STRA-VQA, CLIP-IQA+) across diverse content types, including UGC and AIGC, and identifies practical applications in server-side transcoding, perceptual coding, and live streaming. It further outlines challenges such as data scarcity, temporal modeling, and the need for scalable, efficient models, proposing directions toward multimodal, semi-supervised, and prompt-driven VQA with broader dataset coverage (HDR/60fps/VR) and end-to-end perceptual optimization.

Abstract

Video quality assessment (VQA) is an important processing task that aims to predict the quality of videos in a manner highly consistent with human judgments of perceived quality. Traditional VQA models based on natural image and/or video statistics, which are inspired both by models of projected images of the real world and by dual models of the human visual system, deliver only limited prediction performance on real-world user-generated content (UGC), as exemplified in recent large-scale VQA databases containing large numbers of diverse video contents crawled from the web. Fortunately, recent advances in deep neural networks and Large Multimodality Models (LMMs) have enabled significant progress in solving this problem, yielding better results than prior handcrafted models. Numerous deep learning-based VQA models have been developed, with progress in this direction driven by the creation of content-diverse, large-scale human-labeled databases that supply ground truth psychometric video quality data. Here, we present a comprehensive survey of recent progress in the development of VQA algorithms and the benchmarking studies and databases that make them possible. We also analyze open research directions on study design and VQA algorithm architectures. GitHub link: https://github.com/taco-group/Video-Quality-Assessment-A-Comprehensive-Survey.

Paper Structure

This paper contains 75 sections, 13 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Number of publications on image and video quality assessment per year (from Google Scholar).
  • Figure 2: Taxonomy of existing subjective and objective video quality assessment methods.
  • Figure 3: Example of a visual interface used when playing video.
  • Figure 4: Exemplar discrete rating scale.
  • Figure 5: Flowchart of an online crowd-sourcing study Telepresence.
  • ...and 6 more figures