Learned Scanpaths Aid Blind Panoramic Video Quality Assessment

Kanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma

TL;DR

This work tackles blind panoramic video quality assessment (PVQA) by explicitly modeling user viewing behavior through learned visual scanpaths. It introduces a two-module system: a probabilistic scanpath generator that predicts future viewpoints with a Gaussian mixture model (GMM) conditioned on historical and causal context, and a quality assessor that evaluates multiple viewport sequences sampled along these scanpaths. The scanpath generator is trained in a three-stage pipeline that includes self-supervised pretraining to minimize expected code length, followed by end-to-end finetuning with differentiable viewport sampling, which enables joint optimization with any planar VQA model. Across three public datasets with synthetic and authentic distortions, the approach achieves state-of-the-art SRCC/PLCC in both in-dataset and cross-dataset settings. Its generated scanpaths align closely with human viewing patterns, and it also applies to panoramic images by treating them as videos of identical frames. The method is lightweight, differentiable, and broadly compatible with existing QA architectures, offering practical gains for VR/360 streaming quality assessment.
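To make the GMM-based viewpoint prediction above more concrete, here is a minimal sketch of drawing one viewpoint offset from a two-dimensional Gaussian mixture. It is an illustration under stated assumptions, not the paper's implementation: the function name `sample_viewpoint_gmm`, the (longitude, latitude) offset parameterization, and the diagonal covariance are all hypothetical, and the sketch uses plain ancestral sampling rather than the differentiable sampling the summary mentions.

```python
import random


def sample_viewpoint_gmm(weights, means, stds, rng):
    """Draw one 2-D viewpoint offset from a diagonal-covariance GMM.

    Hypothetical helper for illustration only.
    weights: K mixture weights that sum to 1
    means:   K (d_lon, d_lat) component means
    stds:    K (s_lon, s_lat) per-axis standard deviations
    rng:     a random.Random instance, passed in for reproducibility
    """
    if not (len(weights) == len(means) == len(stds)):
        raise ValueError("weights, means, and stds must have the same length")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("mixture weights must sum to 1")
    # Step 1: pick a mixture component according to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Step 2: sample each axis from that component's Gaussian.
    (mu_lon, mu_lat), (s_lon, s_lat) = means[k], stds[k]
    return rng.gauss(mu_lon, s_lon), rng.gauss(mu_lat, s_lat)


rng = random.Random(0)
offset = sample_viewpoint_gmm(
    weights=[0.7, 0.3],
    means=[(0.05, 0.00), (-0.10, 0.02)],
    stds=[(0.02, 0.01), (0.05, 0.03)],
    rng=rng,
)
```

In a trained model the weights, means, and standard deviations would be network outputs conditioned on the preceding scanpath. An end-to-end variant would also need a sampling step through which gradients can flow, which this sketch does not attempt.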

Abstract

Panoramic videos provide an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to diverse and uncertain user viewing behaviors, which pose significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.

Paper Structure (19 sections, 27 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Analogy between human subjects and our end-to-end optimized method for panoramic video quality assessment.
  • Figure 2: Overview of the proposed blind PVQA method, consisting of a scanpath generator and a quality assessor. The basic component of the scanpath generator is the scanpath generation unit (SGU), which utilizes the historical and causal relative scanpaths to produce the GMM parameters for differentiable sampling of the current viewpoint. By assembling $W$ SGUs, we create a scanpath generation block (SGB), which autoregressively predicts a future scanpath of $W$ viewpoints. We further stack $M$ SGBs to generate a long-term scanpath of $M \times W +H$ viewpoints, where $H$ is the length of the initial path. By adjusting the number of initial paths (denoted by $N$), we can sample $N$ scanpaths, along which we produce $N$ viewport sequences as input to the quality assessor.
  • Figure 3: Visualization of a relative scanpath projected from the sphere to the viewport.
  • Figure 4: Comparison of different scanpath predictors in terms of $\mathrm{maxTC}$ with different prediction horizons.
  • Figure 5: Comparison of saliency maps generated from scanpaths by our method and those by humans.
  • ...and 3 more figures
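
The block structure described in the Figure 2 caption can be sketched as a nested loop: $M$ blocks (SGBs) of $W$ single-step units (SGUs) each extend an initial path of $H$ viewpoints to $M \times W + H$ viewpoints. The sketch below is a simplified illustration only. The names `generate_scanpath` and `hold_last` are hypothetical, and `predict_next` is a stand-in for the learned SGU, which in the paper produces GMM parameters rather than a viewpoint directly.

```python
def generate_scanpath(initial_path, num_blocks, block_size, predict_next):
    """Autoregressively extend an initial path of H viewpoints.

    num_blocks   plays the role of M (number of SGBs),
    block_size   plays the role of W (SGUs per SGB),
    predict_next is a hypothetical callable mapping the path so far
                 to the next viewpoint.
    """
    path = list(initial_path)
    for _ in range(num_blocks):        # one pass per SGB
        for _ in range(block_size):    # one step per SGU
            path.append(predict_next(path))
    return path


def hold_last(path):
    """Placeholder predictor: repeat the most recent viewpoint."""
    return path[-1]


# N = 2 initial paths, each with H = 3 (longitude, latitude) viewpoints.
initial_paths = [
    [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)],
    [(1.0, 0.5), (1.0, 0.4), (1.0, 0.3)],
]
scanpaths = [
    generate_scanpath(p, num_blocks=2, block_size=4, predict_next=hold_last)
    for p in initial_paths
]
# Each scanpath has M * W + H = 2 * 4 + 3 = 11 viewpoints.
```

Each of the $N$ resulting scanpaths would then define one viewport sequence for the quality assessor. Viewport extraction itself is outside the scope of this sketch.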