VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations
Qianqian Qiao, DanDan Zheng, Yihang Bo, Bao Peng, Heng Huang, Longteng Jiang, Huaye Wang, Jingdong Chen, Jun Zhou, Xin Jin
TL;DR
This work addresses the lack of large-scale, richly annotated video aesthetics data by introducing VADB, the largest video aesthetic database: 10,490 videos annotated by 37 professionals across 11 score dimensions, plus language comments and tags. It also presents VADB-Net, a two-stage framework that first pre-trains a CLIP-based video encoder with dual text inputs (comments and tags) and a dynamic fusion mechanism, then fine-tunes a regression head for aesthetic scoring. The dataset is built with rigorous quality control, and the model outperforms existing video quality assessment baselines on scoring tasks while supporting downstream aesthetic assessment tasks. The dataset and code are openly released for reproducibility, enabling cross-modal learning for video aesthetics research.
Abstract
Video aesthetic assessment, a vital area in multimedia computing, integrates computer vision with human cognition. Its progress is limited by the lack of standardized datasets and robust models, because the temporal dynamics of video and the challenges of multimodal fusion hinder the direct application of image-based methods. This study introduces VADB, the largest video aesthetic database, with 10,490 diverse videos annotated by 37 professionals across multiple aesthetic dimensions, including overall and attribute-specific aesthetic scores, rich language comments, and objective tags. We propose VADB-Net, a dual-modal pre-training framework with a two-stage training strategy, which outperforms existing video quality assessment models in scoring tasks and supports downstream video aesthetic assessment tasks. The dataset and source code are available at https://github.com/BestiVictory/VADB.
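To make the two-stage idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that embeddings for a video, its comment, and its tags have already been produced by some encoder, and it illustrates only two pieces: one possible form of dynamic fusion of the two text embeddings, and a linear regression head that maps a video embedding to one score per aesthetic dimension. The similarity-weighted softmax used for fusion, the `temperature` parameter, and the function names `dynamic_fusion` and `regression_head` are all assumptions introduced for illustration. The contrastive pre-training objective itself is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dynamic_fusion(video_emb, comment_emb, tag_emb, temperature=0.1):
    """Hypothetical fusion rule (an assumption, not the paper's mechanism):
    weight each text embedding by a softmax over its similarity to the video."""
    sims = [cosine(video_emb, comment_emb), cosine(video_emb, tag_emb)]
    exps = [math.exp(s / temperature) for s in sims]
    total = sum(exps)
    w_comment, w_tag = exps[0] / total, exps[1] / total
    fused = [w_comment * c + w_tag * t for c, t in zip(comment_emb, tag_emb)]
    return fused, (w_comment, w_tag)


def regression_head(video_emb, weights, biases):
    """Linear head: one output score per row of `weights` (e.g. one per score dimension)."""
    return [
        sum(w * x for w, x in zip(row, video_emb)) + b
        for row, b in zip(weights, biases)
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings: the comment aligns with the video, the tags do not.
    fused, (w_comment, w_tag) = dynamic_fusion([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print("fusion weights:", w_comment, w_tag)
    print("fused text embedding:", fused)
    print("scores:", regression_head([1.0, 2.0], [[0.5, 0.5], [1.0, 0.0]], [0.0, 1.0]))
```

In this sketch the text input that agrees more closely with the video receives the larger fusion weight. In a real system the head would be trained against the annotated scores rather than given fixed weights.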
