MDS-VQA: Model-Informed Data Selection for Video Quality Assessment

Jian Zou; Xiaoyu Xu; Zhihua Wang; Yilin Wang; Balu Adsumilli; Kede Ma

Abstract

Learning-based video quality assessment (VQA) has advanced rapidly, yet progress is increasingly constrained by a disconnect between model design and dataset curation. Model-centric approaches often iterate on fixed benchmarks, while data-centric efforts collect new human labels without systematically targeting the weaknesses of existing VQA models. Here, we describe MDS-VQA, a model-informed data selection mechanism for curating unlabeled videos that are both difficult for the base VQA model and diverse in content. Difficulty is estimated by a failure predictor trained with a ranking objective, and diversity is measured using deep semantic video features, with a greedy procedure balancing the two under a constrained labeling budget. Experiments across multiple VQA datasets and models demonstrate that MDS-VQA identifies diverse, challenging samples that are particularly informative for active fine-tuning. With only a 5% selected subset per target domain, the fine-tuned model improves mean SRCC from 0.651 to 0.722 and achieves the top gMAD rank, indicating strong adaptation and generalization.
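The greedy balance between difficulty and diversity described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the additive score, the `trade_off` weight, the Euclidean distance, and the function name `greedy_select` are all assumptions introduced here. Difficulty scores are taken to come from a failure predictor, and features from some semantic video encoder.

```python
import numpy as np

def greedy_select(difficulty, features, budget, trade_off=1.0):
    """Greedily pick `budget` items that are difficult and mutually diverse.

    Assumed scoring rule (hypothetical, for illustration only):
        score(i) = difficulty[i] + trade_off * min_j ||features[i] - features[j]||
    where j ranges over the items already selected.
    """
    difficulty = np.asarray(difficulty, dtype=float)
    features = np.asarray(features, dtype=float)
    n = len(difficulty)
    budget = min(budget, n)
    if budget <= 0:
        return []

    # Seed with the single most difficult item.
    selected = [int(np.argmax(difficulty))]
    # Distance from every candidate to its nearest selected item.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)

    while len(selected) < budget:
        score = difficulty + trade_off * min_dist
        score[selected] = -np.inf  # never re-pick a selected item
        nxt = int(np.argmax(score))
        selected.append(nxt)
        # Update nearest-selected distances with the new item.
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

# Toy usage: three near-duplicate hard videos and one distant easy video.
toy_difficulty = [0.9, 0.8, 0.1, 0.85]
toy_features = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [0.05, 0.0]]
picked = greedy_select(toy_difficulty, toy_features, budget=2)
```

Setting `trade_off` to zero reduces the sketch to plain top-k selection by difficulty, while a large value makes it behave like a farthest-point (core-set style) sampler.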

Paper Structure (17 sections, 9 equations, 7 figures, 10 tables)

This paper contains 17 sections, 9 equations, 7 figures, 10 tables.

Introduction
Related Work
Model-Centric VQA
Data-Centric VQA
Proposed Method: MDS-VQA
Overview and Problem Formulation
Ranking-Based Difficulty Modeling
Model-Informed Selection with Diversity
Subset Labeling and Active Fine-Tuning
Experiments
Experimental Setups
Main Results
Ablation Studies
Conclusion and Future Work
Additional Implementation Details
...and 2 more sections

Figures (7)

Figure 1: System diagram of MDS-VQA. We predict failure (i.e., difficulty) on unlabeled target videos, combine difficulty and content diversity to select a small subset for human labeling, and actively fine-tune the VQA model on existing labeled data and the selected subsets. Bottom: on CGVDS [zadtootaghaj2020quality], this $5\%$ subset (marked by red crosses) yields strong failure identification and improved fine-tuning performance.
Figure 2: Training and inference of MDS-VQA. During training, we freeze the base quality model $f(\cdot)$ and optimize an auxiliary failure predictor $g(\cdot)$ by minimizing a fidelity loss under a Thurstone model [thurstone1927law]. During inference, we rank unlabeled videos by combining predicted difficulty scores with a content diversity measure to select a (sub-)optimal subset for human labeling and active fine-tuning.
Figure 3: Representative gMAD pairs between VQA models induced by MDS-VQA and core-set selection [sener2017active]. Left: gMAD pairs found by fixing MDS-VQA-induced model predictions and searching for videos that maximally differentiate the core-set-induced model. Right: the reverse setting. Predicted scores and MOSs are shown for each pair, both on a $[1,5]$ scale where higher values indicate better predicted and perceived quality, respectively.
Figure 4: Representative gMAD pairs between VQA models induced by MDS-VQA and ALCS [yan2022clustering].
Figure 5: Representative gMAD pairs between VQA models induced by MDS-VQA and FreeSel [xie2023towards].
...and 2 more figures
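
The Figure 2 caption mentions training the failure predictor $g(\cdot)$ with a fidelity loss under a Thurstone model. A minimal sketch of that kind of pairwise objective is given below. It assumes the standard Thurstone Case V form with unit variances and the usual fidelity loss $1 - \sqrt{p\hat{p}} - \sqrt{(1-p)(1-\hat{p})}$. The rule for building the pair label from the base model's absolute errors (`pair_label`) is a hypothetical choice made here for illustration, and all function names are introduced for this sketch.

```python
import math

def thurstone_prob(g_x, g_y):
    """Probability that video x is harder than video y.

    Thurstone Case V with unit variances (assumed here):
        p_hat = Phi((g_x - g_y) / sqrt(2)),
    where Phi is the standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    """
    return 0.5 * (1.0 + math.erf((g_x - g_y) / 2.0))

def fidelity_loss(p, p_hat):
    """Fidelity loss between a target probability p and a prediction p_hat."""
    return 1.0 - math.sqrt(p * p_hat) - math.sqrt((1.0 - p) * (1.0 - p_hat))

def pair_label(err_x, err_y):
    """Hypothetical target: 1 if the base model errs more on x than on y."""
    if err_x > err_y:
        return 1.0
    if err_x < err_y:
        return 0.0
    return 0.5

# Toy usage: the base model's absolute error is larger on x than on y,
# and the failure predictor scores x as harder, so the loss is small.
p = pair_label(err_x=1.2, err_y=0.3)
loss = fidelity_loss(p, thurstone_prob(g_x=2.0, g_y=0.0))
```

The loss is zero when the predicted probability matches the target exactly and reaches one when the two are opposite certainties, which is why it is often preferred to cross-entropy for pairwise ranking with bounded targets.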
