NovisVQ: A Streaming Convolutional Neural Network for No-Reference Opinion-Unaware Frame Quality Assessment
Kylie Cancilla, Alexander Moore, Amar Saini, Carmen Carrano
TL;DR
The paper addresses video quality assessment (VQA) when neither clean references nor human opinion labels are available, proposing NovisVQ, a streaming, no-reference, opinion-unaware model. Trained on synthetically degraded DAVIS videos, NovisVQ uses a temporal, multi-scale ResNet encoder with LSTM modules and a lightweight MLP to predict the per-frame full-reference (FR) metrics LPIPS, PSNR, and SSIM directly from degraded video. Unlike an image-based baseline, NovisVQ leverages temporal context to generalize to unseen degradations and real-world motion blur, achieving strong correlations with ground-truth FR metrics on GOPRO data and surpassing BRISQUE in alignment with those metrics. This work demonstrates scalable, self-supervised VQA suitable for real-time video processing in vision systems, without requiring pristine references or human annotations. It highlights the critical role of temporal modeling in robust VQA and points to future work on downstream task integration and broader degradation modeling.
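The self-supervised labeling idea (degrade a clean frame, compute an FR metric against the original, then train on the degraded frame alone) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Gaussian-noise degradation, the flat-list frame representation, and the helper names `add_gaussian_noise` and `make_training_pairs` are assumptions made for the example. Only PSNR is shown because it has a simple closed form.

```python
import math
import random


def psnr(reference, degraded, max_value=255.0):
    """PSNR in dB between two equally sized frames given as flat lists of pixel values."""
    if len(reference) != len(degraded):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def add_gaussian_noise(frame, sigma, rng):
    """Hypothetical degradation: additive Gaussian noise, clipped to [0, 255]."""
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in frame]


def make_training_pairs(clean_frames, sigma, seed=0):
    """Return (degraded_frame, psnr_target) pairs.

    The clean frame is used only to compute the label; a model trained on
    these pairs would see the degraded frame alone.
    """
    rng = random.Random(seed)
    pairs = []
    for clean in clean_frames:
        degraded = add_gaussian_noise(clean, sigma, rng)
        pairs.append((degraded, psnr(clean, degraded)))
    return pairs
```

Because the reference is consumed only when the label is built, nothing in the resulting pairs requires a clean frame at inference time, which is the property the no-reference setting relies on.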
Abstract
Video quality assessment (VQA) is vital for computer vision tasks, but existing approaches face major limitations: full-reference (FR) metrics require clean reference videos, and most no-reference (NR) models must be trained on costly human opinion labels. Moreover, most opinion-unaware NR methods are image-based, ignoring temporal context critical for video object detection. In this work, we present a scalable, streaming-based VQA model that is both no-reference and opinion-unaware. Our model leverages synthetic degradations of the DAVIS dataset, training a temporal-aware convolutional architecture to predict FR metrics (LPIPS, PSNR, SSIM) directly from degraded video, without references at inference. We show that our streaming approach outperforms our own image-based baseline by generalizing across diverse degradations, underscoring the value of temporal modeling for scalable VQA in real-world vision systems. Additionally, we demonstrate that our model achieves higher correlation with full-reference metrics than BRISQUE, a widely used opinion-aware image quality assessment baseline, validating the effectiveness of our temporal, opinion-unaware approach.
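The comparison against BRISQUE rests on how well predicted scores correlate with ground-truth FR metrics. The abstract does not state which correlation coefficient is used, so the sketch below assumes Spearman rank correlation, a common choice in quality assessment. The function names are illustrative and the implementation uses only the Python standard library.

```python
import math


def rank(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, ground_truth):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(rank(predicted), rank(ground_truth))
```

In use, `predicted` would hold a model's per-frame scores and `ground_truth` the corresponding FR metric values computed with the reference available; a coefficient closer to 1 in absolute value indicates closer monotonic agreement.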
