No-Reference Image Quality Assessment with Global-Local Progressive Integration and Semantic-Aligned Quality Transfer
Xiaoqi Wang, Yun Zhang
TL;DR
This work tackles no-reference image quality assessment (NR-IQA) by combining a Vision Transformer-based global feature extractor with a CNN-based local feature extractor in a global-local progressive integration framework (GlintIQA). To address data scarcity and limited content diversity, it introduces the semantic-aligned quality transfer (SAQT) method and the SAQT-IQA dataset, which enable semantically aware transfer of quality labels to degraded images. Empirical results show state-of-the-art or competitive performance on authentic and synthetic distortion benchmarks, with significant cross-dataset gains, especially when SAQT pretraining is used. Overall, the paper shows that dual-stream feature fusion combined with content-aware data augmentation yields robust NR-IQA that generalizes across distortion types and image content.
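The following is a minimal NumPy sketch of one way to fuse ViT patch tokens with a CNN feature map. It illustrates the dual-stream idea and is not the authors' implementation. Several assumptions stand in for unspecified details:

- A 14×14 token grid (CLS token dropped).
- Plain average pooling to a shared grid, in place of the paper's multi-scale kernel alignment.
- Random 1×1 projections, in place of learned layers.
- A single channel-wise self-attention step, in place of the full progressive stack. Here the attention map is C×C, so channels attend to channels rather than positions to positions.

The names `avg_pool_to`, `channel_self_attention` and `fuse` are hypothetical.

```python
import numpy as np


def avg_pool_to(x: np.ndarray, size: int) -> np.ndarray:
    """Average-pool a (C, H, W) map to (C, size, size); H and W must be divisible by size."""
    c, h, w = x.shape
    return x.reshape(c, size, h // size, size, w // size).mean(axis=(2, 4))


def channel_self_attention(x: np.ndarray) -> np.ndarray:
    """Channel-wise self-attention on a (C, N) matrix: the attention map is C x C."""
    # Cosine-similarity logits between channels (identity Q/K/V projections for brevity).
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    logits = xn @ xn.T
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x + x  # re-weighted channels plus a residual connection


def fuse(global_tokens: np.ndarray, local_map: np.ndarray,
         grid: int = 7, dim: int = 64, seed: int = 0) -> np.ndarray:
    """Fuse ViT patch tokens (N, D) with a CNN map (C, H, W) into one descriptor."""
    rng = np.random.default_rng(seed)
    n, d = global_tokens.shape
    s = int(round(np.sqrt(n)))
    glob = global_tokens.T.reshape(d, s, s)  # tokens -> (D, S, S) spatial grid
    # Align both streams to a common grid (assumption: simple average pooling).
    glob, loc = avg_pool_to(glob, grid), avg_pool_to(local_map, grid)
    # Random 1x1 projections stand in for learned layers (assumption).
    wg = rng.standard_normal((dim, d)) / np.sqrt(d)
    wl = rng.standard_normal((dim, loc.shape[0])) / np.sqrt(loc.shape[0])
    glob = np.einsum("oc,chw->ohw", wg, glob)
    loc = np.einsum("oc,chw->ohw", wl, loc)
    x = np.concatenate([glob, loc], axis=0).reshape(2 * dim, grid * grid)
    x = channel_self_attention(x)
    return x.mean(axis=1)  # global average pool -> (2 * dim,)


rng = np.random.default_rng(1)
tokens = rng.standard_normal((196, 768))       # e.g. 14 x 14 ViT patch tokens
feat_map = rng.standard_normal((256, 28, 28))  # e.g. a mid-level CNN stage
desc = fuse(tokens, feat_map)
print(desc.shape)  # (128,)
```

Pooling to a shared 7×7 grid is the simplest possible alignment. In a trained model, the projections would be learned and a regression head would map the fused descriptor to a quality score.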
Abstract
Accurate measurement of image quality without reference signals remains a fundamental challenge in low-level visual perception applications. In this paper, we propose a global-local progressive integration model that addresses this challenge through three key contributions: 1) We develop a dual-measurement framework that combines a vision Transformer (ViT)-based global feature extractor with a convolutional neural network (CNN)-based local feature extractor to comprehensively capture and quantify image distortion characteristics at different granularities. 2) We propose a progressive feature integration scheme that uses multi-scale kernel configurations to align global and local features. It then progressively aggregates them through an interactive stack of channel-wise self-attention and spatial interaction modules to obtain multi-grained quality-aware representations. 3) We introduce a semantic-aligned quality transfer method that extends the training data by automatically labeling images of diverse content with quality scores transferred from subjective opinion scores. Experimental results show that our model improves Spearman's rank-order correlation coefficient (SROCC) by 5.04% and 5.40% in cross-authentic and cross-synthetic dataset generalization tests, respectively. In addition, the proposed semantic-aligned quality transfer yields further gains of 2.26% and 13.23% in single-synthetic and cross-synthetic dataset evaluations, respectively.
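SROCC, the metric behind the reported gains, is Spearman's ρ. It is the Pearson correlation between the rank vectors of the predicted scores and the subjective opinion scores (MOS). The following is a minimal standard-library sketch. Tied values receive averaged ranks, the usual convention and the one `scipy.stats.spearmanr` follows. The function names are illustrative, and the paper's own evaluation code may differ.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def srocc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman's rank-order correlation between predictions and MOS."""
    return pearson(average_ranks(pred), average_ranks(mos))


# Predictions that order the images exactly as the MOS does give SROCC = 1.0,
# even though the raw values are on different scales.
print(srocc([0.1, 0.4, 0.35, 0.8], [1.2, 3.1, 2.9, 4.5]))
```

Because SROCC depends only on ranks, it rewards a model for ordering images correctly, not for matching the MOS scale. This is why it is the standard measure for cross-dataset IQA comparisons.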
