CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP

Yirui Zeng, Jun Fu, Hadi Amirpour, Huasheng Wang, Guanghui Yue, Hantao Liu, Ying Chen, Wei Zhou

TL;DR

This work tackles blind dehazed image quality assessment (BDQA) by adapting CLIP to the task and feeding it both global and local information from the dehazed image. It introduces multi-modal prompt tuning to map CLIP representations to a dehazed image quality score, which supports zero-shot evaluation and improves further once textual and visual prompts are learned. The method reports state-of-the-art correlations on two authentic datasets (DHQ and exBeDDE), and its ablations show the benefits of hierarchical perception and prompt-based adaptation. The result is a reference-free framework for evaluating dehazing outputs that can be used to compare and optimize dehazing algorithms, with code released for reproducibility.
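The TL;DR does not say which correlation measures are reported. Quality assessment work commonly uses the Pearson linear correlation (PLCC) and the Spearman rank-order correlation (SRCC) between predicted scores and subjective scores, so the following is a minimal sketch under that assumption. The function names `plcc` and `srcc` are illustrative, and the rank computation assumes there are no tied values.

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predicted and subjective scores."""
    pred = np.asarray(pred, dtype=np.float64)
    mos = np.asarray(mos, dtype=np.float64)
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman rank-order correlation; assumes no ties in either input."""
    def ranks(x):
        x = np.asarray(x, dtype=np.float64)
        # argsort of argsort gives each element's 0-based rank when values are distinct
        return np.argsort(np.argsort(x)).astype(np.float64)
    return plcc(ranks(pred), ranks(mos))

if __name__ == "__main__":
    predicted = [0.10, 0.40, 0.35, 0.80]   # made-up model outputs
    subjective = [1.0, 3.0, 2.0, 5.0]      # made-up opinion scores
    print("PLCC:", plcc(predicted, subjective))
    print("SRCC:", srcc(predicted, subjective))
```

SRCC depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of the predictions, while PLCC also rewards a linear relationship.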

Abstract

Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but the small scale of dehazed image quality assessment (DQA) datasets limits their performance. To address this issue, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input of CLIP. To accurately map the input hierarchical information of dehazed images into the quality score, we tune both the vision branch and the language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods. The code is available at https://github.com/JunFu1995/CLIP-DQA.
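One common way to turn CLIP embeddings into a quality score with a pair of antonym prompts (such as "Good photo." and "Bad photo.") is a softmax over the two image-text cosine similarities. The sketch below illustrates that idea only; it is not the authors' implementation and omits the learned prompts and any additional layers. The function name `antonym_prompt_score` and the `temperature` value are assumptions, and the embeddings are plain arrays standing in for the outputs of CLIP's image and text encoders.

```python
import numpy as np

def antonym_prompt_score(image_feat, good_feat, bad_feat, temperature=0.01):
    """Return a score in (0, 1): the softmax weight of the 'good' prompt.

    image_feat, good_feat, bad_feat: 1-D embeddings of equal length.
    temperature: assumed scaling factor for the similarities.
    """
    def unit(v):
        v = np.asarray(v, dtype=np.float64)
        return v / np.linalg.norm(v)

    img, good, bad = unit(image_feat), unit(good_feat), unit(bad_feat)
    # Cosine similarity between the image and each text prompt
    logits = np.array([img @ good, img @ bad]) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random placeholders for real CLIP embeddings
    image, good, bad = rng.normal(size=(3, 512))
    print(antonym_prompt_score(image, good, bad))
```

Using two opposing prompts rather than a single one makes the score relative: an image scores high only when it is closer to the "good" description than to the "bad" one.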

Paper Structure

This paper contains 12 sections, 8 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Illustration of multi-modal prompt tuning for blind dehazed image quality assessment. "FC" denotes a fully connected layer. We use "Good photo." and "Bad photo." as antonym text prompts, following the CLIP-based general image quality assessment method of Wang et al. (2023) [wang2023exploring].
  • Figure 2: Illustration of the average attention map for the last visual transformer layer. The first and third rows are resized versions and patches of six dehazed images, respectively. The second and fourth rows visualize attention on resized images and patches, respectively. Each column corresponds to a dehazed image.
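
The Figure 2 caption distinguishes resized versions of a dehazed image (global information) from patches (local information). A minimal sketch of producing both views from one image array follows. The 224-pixel size, the nearest-neighbour resize, the random crop position, and the function names `global_view` and `local_view` are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def global_view(image, size=224):
    """Nearest-neighbour resize of an H x W x C array to size x size x C."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows][:, cols]

def local_view(image, size=224, rng=None):
    """Random size x size crop taken at the original resolution."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested patch")
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dehazed = rng.random((480, 640, 3))   # placeholder for a real dehazed image
    print(global_view(dehazed).shape, local_view(dehazed, rng=rng).shape)
```

The resized view keeps the whole scene but loses fine detail, while the patch keeps pixel-level detail for a small region, which is why the two are complementary inputs.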