Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

TL;DR

The paper analyzes BVQA progress by constructing a family of minimalistic BVQA models and evaluating them across eight VQA datasets to reveal the easy dataset problem. It decomposes the model into four blocks—video preprocessor, spatial quality analyzer, optional temporal quality analyzer, and regressor—and shows that many datasets are dominated by spatial distortions, with temporal modeling providing limited gains except for datasets with real temporal artifacts. Through extensive intra- and cross-dataset experiments and ablations, the study ranks datasets by difficulty, demonstrates the impact of initialization and backbone choice, and highlights the need for better dataset construction and evaluation protocols. The findings offer concrete baselines and design guidelines to drive more meaningful BVQA progress and dataset development.

Abstract

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.
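
To make the four-block decomposition concrete, the following is a minimal sketch of how such a pipeline could be composed, not the authors' implementation. The excerpt does not specify how each block is instantiated, so every function name, the hand-crafted features, and the linear regressor here are hypothetical stand-ins chosen only to keep the example dependency-free.

```python
from statistics import mean
from typing import List, Sequence

Frame = List[List[float]]  # one grayscale frame as rows of pixel values
Video = List[Frame]


def preprocess(video: Video, temporal_stride: int, spatial_stride: int) -> Video:
    """Video preprocessor (assumed form): naive spatiotemporal subsampling."""
    return [
        [row[::spatial_stride] for row in frame[::spatial_stride]]
        for frame in video[::temporal_stride]
    ]


def spatial_features(frame: Frame) -> List[float]:
    """Stand-in spatial analyzer: mean intensity and mean horizontal gradient."""
    pixels = [p for row in frame for p in row]
    grads = [abs(b - a) for row in frame for a, b in zip(row, row[1:])]
    return [mean(pixels), mean(grads) if grads else 0.0]


def temporal_features(video: Video) -> List[float]:
    """Stand-in temporal analyzer: mean absolute consecutive-frame difference."""
    diffs = [
        abs(q - p)
        for prev, cur in zip(video, video[1:])
        for row_p, row_c in zip(prev, cur)
        for p, q in zip(row_p, row_c)
    ]
    return [mean(diffs) if diffs else 0.0]


def predict_quality(
    video: Video,
    weights: Sequence[float],
    bias: float = 0.0,
    use_temporal: bool = False,
    temporal_stride: int = 2,
    spatial_stride: int = 2,
) -> float:
    """Compose the blocks; the regressor is a plain linear map (an assumption)."""
    frames = preprocess(video, temporal_stride, spatial_stride)
    per_frame = [spatial_features(f) for f in frames]
    feats = [mean(col) for col in zip(*per_frame)]  # average-pool over frames
    if use_temporal:  # the temporal analyzer is optional
        feats += temporal_features(frames)
    if len(weights) != len(feats):
        raise ValueError("weights must match the number of features")
    return bias + sum(w * x for w, x in zip(weights, feats))


# A static 4-frame video and a flickering one (both 4x4, constant per frame).
static_video = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
flicker_video = [[[float(t % 2)] * 4 for _ in range(4)] for t in range(4)]

spatial_only = predict_quality(static_video, weights=[2.0, 1.0], bias=0.5)
with_temporal = predict_quality(
    flicker_video, weights=[2.0, 1.0, 3.0], use_temporal=True, temporal_stride=1
)
```

In this toy setup the spatial-only variant sees no difference between a video and its individual frames, which mirrors the abstract's point that a dataset solvable this way admits a BIQA solution. Only the variant with the temporal branch responds to the flicker.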

Paper Structure (25 sections, 9 equations, 4 figures, 10 tables)

Figures (4)

  • Figure 1: Scatter plots of spatial information (SI) versus temporal information (TI) of the eight VQA datasets with the corresponding convex hulls. Each dot represents a video.
  • Figure 2: Illustration of the video preprocessor. In this example, the frame number, $N$, and the frame rate, $R$, of the video $\bm{x}$ are $40$ and $10$ fps, respectively. The frame number $T$ of each video chunk is $5$. For brevity, we omit the procedure of spatial downsampling.
  • Figure 3: Performance on the LSVQ test sets under Model (i) when varying the input spatial resolution and the temporal sampling rate for key frames. The legends in (a) and (b) give the input spatial resolution, and the legends in (c) and (d) give the temporal sampling rate for key frames.
  • Figure 4: Performance on the LSVQ test sets under Model (iv) when varying the temporal sampling rate for video chunks (shown in the legends).