Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Kanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma
TL;DR
This work tackles blind panoramic video quality assessment (PVQA) by explicitly modeling user viewing behavior through learned visual scanpaths. It introduces a two-module system: a probabilistic scanpath generator that models each future viewpoint with a Gaussian Mixture Model conditioned on historical and causal context, and a quality assessor that evaluates multiple viewport sequences sampled along these scanpaths. The scanpath generator is trained in a three-stage pipeline that includes self-supervised pretraining by minimizing expected code length, followed by end-to-end finetuning with differentiable viewport sampling, which enables joint optimization with any planar VQA model. Across three public datasets with synthetic and authentic distortions, the approach achieves state-of-the-art SRCC/PLCC in both in-dataset and cross-dataset settings and aligns closely with human viewing patterns, while remaining backward compatible with panoramic images. The method is lightweight, differentiable, and broadly compatible with existing QA architectures, offering practical gains for VR/360 streaming quality assessment.
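To make the "expected code length" objective concrete, here is a minimal sketch of the per-viewpoint cost under a Gaussian Mixture Model, measured in bits as the negative base-2 log-likelihood. It is an illustration rather than the authors' implementation: the function name, the diagonal-covariance parameterization, and the use of a continuous density (which matches a discretized code length only up to a constant set by the quantization step) are all assumptions.

```python
import numpy as np

def gmm_code_length_bits(x, weights, means, stds):
    """Return -log2 p(x) for a diagonal-covariance GMM (hypothetical helper).

    x:       (D,)   observed viewpoint, e.g. a 2-D coordinate
    weights: (K,)   mixture weights, assumed positive and summing to 1
    means:   (K, D) component means
    stds:    (K, D) component standard deviations, assumed positive
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)

    # Log density of x under each Gaussian component.
    z = (x - means) / stds
    log_comp = (
        -0.5 * np.sum(z ** 2, axis=1)
        - np.sum(np.log(stds), axis=1)
        - 0.5 * x.size * np.log(2.0 * np.pi)
    )

    # Stable log-sum-exp over the weighted components.
    log_mix = np.log(weights) + log_comp
    m = np.max(log_mix)
    log_p = m + np.log(np.sum(np.exp(log_mix - m)))

    # Convert nats to bits.
    return -log_p / np.log(2.0)
```

Averaging this quantity over observed viewpoints gives an empirical estimate of the expected code length. A viewpoint close to a high-weight component costs fewer bits than one far from every component, which is why minimizing this quantity pushes the predicted mixture toward where people actually look.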
Abstract
Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to diverse and uncertain user viewing behaviors, posing significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method also enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.
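The idea of treating a panoramic image as a video composed of identical frames can be sketched in a few lines. The function name, the frame count, and the (height, width, channels) array layout below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def image_to_static_video(image, num_frames):
    """Stack one panoramic image into a (T, H, W, C) clip of identical frames.

    image:      (H, W, C) array, e.g. an equirectangular panorama
    num_frames: number of frames T in the resulting clip (hypothetical parameter)
    """
    image = np.asarray(image)
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Add a leading time axis and repeat the single frame along it.
    return np.repeat(image[np.newaxis, ...], num_frames, axis=0)
```

A clip built this way could then be passed to a video quality pipeline unchanged, since every frame carries the same content and only the sampled viewports differ over time.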
