RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360° Image Quality Assessment

Yujia Wang; Yuyan Li; Jiuming Liu; Fang-Lue Zhang; Xinhu Zheng; Neil A. Dodgson

Abstract

Blind 360° image quality assessment (IQA) aims to predict perceptual quality for panoramic images without a pristine reference. Unlike conventional planar images, 360° content in immersive environments restricts viewers to a limited viewport at any moment, making viewing behavior critical to quality perception. Existing scanpath-based approaches model viewing behavior by approximating the human view-then-rate paradigm, but they treat scanpath generation and quality assessment as separate steps, which prevents end-to-end optimization and task-aligned exploration. To address this limitation, we propose RL-ScanIQA, a reinforcement-learned framework for blind 360° IQA. RL-ScanIQA jointly optimizes a PPO-trained scanpath policy and a quality assessor, where the policy receives quality-driven feedback to learn task-relevant viewing strategies. To improve training stability and prevent mode collapse, we design multi-level rewards, including scanpath diversity and equator-biased priors. We further boost cross-dataset robustness using distortion-space augmentation together with rank-consistent losses that preserve intra-image and inter-image quality orderings. Extensive experiments on three benchmarks show that RL-ScanIQA achieves superior in-dataset performance and cross-dataset generalization. Code is available at https://github.com/wangyuji1/RLScanIQA.git.
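
To make the idea of a rank-consistent loss concrete, the following is a minimal sketch of a generic pairwise hinge ranking loss in plain Python. It is not the authors' implementation: the function name `pairwise_rank_loss`, the `margin` parameter, and the choice of a hinge penalty are all assumptions made for illustration. The abstract states only that the losses preserve intra-image and inter-image quality orderings.

```python
from itertools import combinations


def pairwise_rank_loss(preds, targets, margin=0.0):
    """Hypothetical pairwise hinge ranking loss (illustrative only).

    For every pair of items whose target scores differ, penalize the
    prediction when its ordering disagrees with the target ordering
    (or agrees by less than `margin`). Returns the mean penalty.
    """
    losses = []
    for i, j in combinations(range(len(preds)), 2):
        if targets[i] == targets[j]:
            continue  # ties carry no ordering to preserve
        # +1 if item i should score higher than item j, otherwise -1
        sign = 1.0 if targets[i] > targets[j] else -1.0
        losses.append(max(0.0, margin - sign * (preds[i] - preds[j])))
    return sum(losses) / len(losses) if losses else 0.0


# Predictions that respect the target ordering incur no penalty.
consistent = pairwise_rank_loss([3.0, 2.0, 1.0], [30.0, 20.0, 10.0])
# Predictions that invert the ordering are penalized by the score gap.
inverted = pairwise_rank_loss([1.0, 2.0], [2.0, 1.0])
```

Under this sketch, the same function could be applied to scores of differently distorted versions of one image (intra-image ordering) or to scores of different images in a batch (inter-image ordering); how the paper actually forms these pairs is not specified in the abstract.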

Paper Structure (19 sections, 13 equations, 7 figures, 5 tables)

Introduction
Related Work
Methodology
Overview
Scanpath Generator
Sequential Scanpath Modeling
Design of Rewards
Quality Assessor
Quality Prediction Module
Cross-Domain Enhancement
Experiments
Experimental Setups
Datasets
Implementation Details
Compared Methods
...and 4 more sections

Figures (7)

Figure 1: Comparison of previous works and ours. a: Previous works train the scanpath generator and quality assessor independently, typically with human scanpath supervision. b: Our method requires no human scanpath and jointly optimizes these two modules end-to-end with a reinforcement learning policy.
Figure 2: Overview of RL-ScanIQA. The scanpath generator is trained with PPO and samples K diverse scanpaths. The quality assessor evaluates each scanpath and produces the final quality score.
Figure 3: Rewards: multi-level rewards are designed in our RL policy update. (More details and code in Supplementary Material)
Figure 4: Cross-Domain Enhancement: the quality assessor module is trained using multiple losses with distortion-space augmentation. (More details and code in Supplementary Material)
Figure 5: Sampled discrete viewport sequences. We visualize a representative scanpath (T=7) for each image and show the MOS and our predicted quality score below it.
...and 2 more figures
