Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance
Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao Zhai
TL;DR
This work targets the specialized problem of judging video quality after exposure correction in user-generated content. It introduces VEC-QA, a dataset combining LLVE-QA and OEVR-QA to cover low-light and over-exposure recovery, and presents Light-VQA+, a CLIP-guided VQA model that fuses spatial and temporal cues via cross-attention and applies Human Visual System (HVS) inspired weighting to produce a final quality score. Light-VQA+ demonstrates superior correlation with human perception across VEC-QA and public benchmarks, and ablations verify the contributions of CLIP-based brightness/noise features, temporal brightness consistency, cross-attention fusion, and HVS weighting. The model also proves useful for improving exposure-correction algorithms, as shown by fine-tuning FEC-Net with Light-VQA+-guided supervision. Overall, Light-VQA+ offers a specialized, perceptually aligned metric to advance exposure correction methods for videos and supports broader evaluation and development of VEC algorithms.
Abstract
User-Generated Content (UGC) videos have become a regular part of daily life. However, they often suffer from poor exposure because of the limitations of photographic equipment and technique. Video Exposure Correction (VEC) algorithms, which include Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR), have been proposed to address this. Video Quality Assessment (VQA) is equally important to VEC. Unfortunately, almost all existing VQA models are general-purpose and measure the quality of a video from a comprehensive perspective, rather than targeting exposure correction specifically. To fill this gap, Light-VQA, trained on LLVE-QA, was proposed for assessing LLVE. We extend Light-VQA by expanding the LLVE-QA dataset into the Video Exposure Correction Quality Assessment (VEC-QA) dataset, which adds over-exposed videos and their corresponding corrected versions. In addition, we propose Light-VQA+, a VQA model specialized in assessing VEC. Light-VQA+ differs from Light-VQA mainly in its use of the CLIP model and vision-language guidance during feature extraction, followed by a new module modeled on the Human Visual System (HVS) for more accurate assessment. Extensive experimental results show that our model achieves the best performance against current State-Of-The-Art (SOTA) VQA models on the VEC-QA dataset and other public datasets.
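VQA models are commonly compared by how well their predicted scores correlate with human mean opinion scores (MOS), usually via the Spearman rank-order and Pearson linear correlation coefficients. The sketch below illustrates that kind of comparison in plain Python. It is not the evaluation code of this work: the function names and the sample scores are hypothetical, and the nonlinear mapping that is often fitted before computing the Pearson coefficient is omitted.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical MOS values and model predictions for five videos.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [10.0, 20.0, 40.0, 80.0, 160.0]

srcc = spearman(mos, predicted)
plcc = pearson(mos, predicted)
```

In this made-up example the predictions preserve the ordering of the MOS values exactly, so the rank correlation is perfect, while the linear correlation is lower because the predictions grow nonlinearly.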
