Feature Denoising Diffusion Model for Blind Image Quality Assessment
Xudong Li, Jingyuan Zheng, Runze Hu, Yan Zhang, Ke Li, Yunhang Shen, Xiawu Zheng, Yutao Liu, ShengChuan Zhang, Pingyang Dai, Rongrong Ji
TL;DR
This paper tackles blind image quality assessment by addressing noise in transfer-learned features through a diffusion-based feature denoising framework named PFD-IQA. It introduces Perceptual Prior Discovery and Aggregation to extract distortion and quality priors via auxiliary tasks and text-conditioned prompts, and Perceptual Prior-based Diffusion Refinement to align features with predefined denoising trajectories using an adaptive noise mechanism and cross-attention guidance. The approach achieves state-of-the-art performance across eight BIQA datasets, including notable gains on KADID ($PLCC=0.935$) and LIVEC ($PLCC=0.922$), while employing a lightweight diffusion model with as few as 5 sampling steps. Overall, PFD-IQA demonstrates that diffusion-based feature denoising, guided by perceptual priors, can robustly improve quality-aware representations and predictions in no-reference image quality assessment.
Abstract
Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Current deep learning BIQA methods typically rely on transfer learning from features trained for high-level tasks. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we take an initial step towards exploring the diffusion model for feature denoising in BIQA, namely Perceptual Feature Diffusion for IQA (PFD-IQA), which aims to remove noise from quality-aware features. Specifically, (i) we propose a Perceptual Prior Discovery and Aggregation module that establishes two auxiliary tasks to discover potential low-level features in images, which are then used to aggregate perceptual text conditions for the diffusion model. (ii) We propose a Perceptual Prior-based Feature Refinement strategy, which matches noisy features to predefined denoising trajectories and then performs exact feature denoising based on the text conditions. Extensive experiments on eight standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods, for example PLCC values of 0.935 (vs. 0.905) on KADID and 0.922 (vs. 0.894) on LIVEC.
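The results above are reported as PLCC, the Pearson linear correlation coefficient between predicted quality scores and human opinion scores. As a reading aid, the following is a minimal sketch of that metric in plain Python. The function name `plcc` and the sample scores are illustrative assumptions, not taken from the paper, and some BIQA evaluations first fit a nonlinear mapping to the predictions before computing PLCC, which is omitted here.

```python
import math


def plcc(predicted, mos):
    """Pearson linear correlation between two equal-length score sequences.

    `predicted` holds model outputs and `mos` holds subjective scores;
    both names are illustrative assumptions for this sketch.
    """
    if len(predicted) != len(mos) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(mos) / len(mos)

    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, mos))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in mos)

    if var_p == 0 or var_m == 0:
        raise ValueError("PLCC is undefined for a constant sequence")

    return cov / math.sqrt(var_p * var_m)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    preds = [0.2, 0.4, 0.5, 0.9]
    human = [1.0, 2.1, 2.4, 4.6]
    print(round(plcc(preds, human), 4))
```

A value close to 1 means the predictions track the subjective scores almost linearly, which is the sense in which 0.935 on KADID improves on 0.905.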
