Feature Denoising Diffusion Model for Blind Image Quality Assessment

Xudong Li; Jingyuan Zheng; Runze Hu; Yan Zhang; Ke Li; Yunhang Shen; Xiawu Zheng; Yutao Liu; ShengChuan Zhang; Pingyang Dai; Rongrong Ji

TL;DR

This paper tackles blind image quality assessment by addressing noise in transfer-learned features through a diffusion-based feature denoising framework named PFD-IQA. It introduces Perceptual Prior Discovery and Aggregation to extract distortion and quality priors via auxiliary tasks and text-conditioned prompts, and Perceptual Prior-based Diffusion Refinement to align features with predefined denoising trajectories using an adaptive noise mechanism and cross-attention guidance. The approach achieves state-of-the-art performance across eight BIQA datasets, including notable gains on KADID ($PLCC=0.935$) and LIVEC ($PLCC=0.922$), while employing a lightweight diffusion model with as few as 5 sampling steps. Overall, PFD-IQA demonstrates that diffusion-based feature denoising, guided by perceptual priors, can robustly improve quality-aware representations and predictions in no-reference image quality assessment.

Abstract

Blind Image Quality Assessment (BIQA) aims to evaluate image quality in line with human perception, without reference benchmarks. Currently, deep learning BIQA methods typically rely on transfer learning with features from high-level tasks. However, the inherent differences between BIQA and these high-level tasks inevitably introduce noise into the quality-aware features. In this paper, we take an initial step towards exploring the diffusion model for feature denoising in BIQA, namely Perceptual Feature Diffusion for IQA (PFD-IQA), which aims to remove noise from quality-aware features. Specifically, (i) we propose a Perceptual Prior Discovery and Aggregation module that establishes two auxiliary tasks to discover potential low-level features in images, which are then used to aggregate perceptual text conditions for the diffusion model. (ii) We propose a Perceptual Prior-based Feature Refinement strategy, which matches noisy features to predefined denoising trajectories and then performs exact feature denoising based on text conditions. Extensive experiments on eight standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods, i.e., PLCC values of 0.935 (vs. 0.905) on KADID and 0.922 (vs. 0.894) on LIVEC.
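The results above are reported as PLCC, the Pearson linear correlation coefficient between predicted quality scores and human Mean Opinion Scores. A minimal sketch of that metric follows; the function name `plcc` and the sample scores are illustrative assumptions, not taken from the paper. Many IQA evaluations also fit a nonlinear mapping to the predictions before computing the correlation, and that step is omitted here.

```python
import numpy as np


def plcc(predicted, mos):
    """Pearson linear correlation between predictions and MOS.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    predicted = np.asarray(predicted, dtype=float)
    mos = np.asarray(mos, dtype=float)
    if predicted.ndim != 1 or predicted.shape != mos.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")

    # Center both score vectors, then normalize the covariance.
    p = predicted - predicted.mean()
    m = mos - mos.mean()
    denom = np.sqrt((p ** 2).sum() * (m ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for constant scores")
    return float((p * m).sum() / denom)


# Made-up scores, only to show the call.
example_predictions = [62.0, 35.5, 80.1, 47.3]
example_mos = [60.0, 30.0, 85.0, 50.0]
example_plcc = plcc(example_predictions, example_mos)
```

A value close to 1 means the predictions track the subjective scores almost linearly, which is the sense in which 0.935 on KADID is compared against 0.905.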

Paper Structure (16 sections, 13 equations, 4 figures, 5 tables)

Introduction
Related Work
BIQA with Deep Learning.
Diffusion Models.
Methodology
Overview
Perceptual Prior Discovery and Aggregation
Perceptual Prior-based Diffusion Refinement
Experiments
Benchmark Datasets and Evaluation Protocols
Implementation Details
Overall Prediction Performance Comparison
Generalization Capability Validation
Qualitative Analysis
Ablation Study
...and 1 more sections

Figures (4)

Figure 1: Top: the sample image. Bottom: feature maps before and after diffusion denoising. After denoising, the feature map is significantly refined and effectively pinpoints areas with visible image quality degradation. The initial semantic focus is on "human," but after denoising, attention notably shifts to the fuzzy region (the orange region with the blurred crowd and arms), resulting in a closer alignment with the actual Mean Opinion Scores (MOS).
Figure 2: The overview of PFD-IQA, which consists of a teacher model used for creating pseudo-labels and a student model equipped with PDA and PDR modules. Specifically, we begin by learning a perceptual prior (see Perceptual Prior Discovery and Aggregation) through the random mask reconstruction process. Subsequently, we use the prior knowledge to aggregate text information as the condition that guides the feature-denoising process of the diffusion model and refines the features (see Perceptual Prior-based Diffusion Refinement).
Figure 3: The predefined denoising trajectory starts with a teacher pseudo-feature label for forward diffusion. During each reverse denoising phase, image and text information are fused to accurately predict the noise in the features. For student denoising, the noise level matched by the noise alignment mechanism is used as the input for noise prediction.
Figure 4: Visualization of the features of DEIQT and PFD-IQA.
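The forward diffusion mentioned in the Figure 3 caption can be illustrated with the standard DDPM closed form, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. The sketch below is an assumption-laden illustration rather than the paper's implementation: the linear beta schedule, the feature shape, and the names `linear_alpha_bar`, `forward_diffuse`, and `recover_x0` are all hypothetical. The 5-step setting follows the sampling-step count quoted in the TL;DR.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Noise clean features x0 to step t; returns the noisy features and the noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def recover_x0(xt, eps, t, alpha_bar):
    """Invert the forward step given the noise (true or predicted)."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])


rng = np.random.default_rng(0)
teacher_features = rng.standard_normal((4, 8))  # stand-in for pseudo-feature labels
alpha_bar = linear_alpha_bar(num_steps=5)
noisy, noise = forward_diffuse(teacher_features, t=4, alpha_bar=alpha_bar, rng=rng)
restored = recover_x0(noisy, noise, t=4, alpha_bar=alpha_bar)
```

In a trained model the true noise is replaced by a network prediction, which the caption says is conditioned on fused image and text information; here the true noise is reused only to show that the forward step is invertible.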
