HazeCLIP: Towards Language Guided Real-World Image Dehazing

Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

TL;DR

HazeCLIP addresses the gap between dehazing models trained on synthetic data and real-world hazy imagery by introducing a language-guided adaptation framework in which a frozen CLIP model steers fine-tuning. It uses SAM to separate sky and non-sky regions and applies three contrastive prompt sets to provide targeted CLIP guidance: one for sky dehazing, one for non-sky dehazing, and an enhancing set for overall output quality. A fidelity constraint prevents catastrophic forgetting during fine-tuning, enabling synthetic-to-real generalization and achieving state-of-the-art results on real-world data such as RTTS (part of the RESIDE benchmark), with demonstrated compatibility across multiple dehazing backbones. The work highlights the value of vision-language priors for low-level restoration tasks and offers an architecture-agnostic approach to real-world image dehazing.

Abstract

Existing methods have achieved remarkable performance in image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by the Contrastive Language-Image Pre-training (CLIP) model's ability to distinguish between hazy and clean images, we leverage it to evaluate dehazing results. Combined with a region-specific dehazing technique and tailored prompt sets, the CLIP model accurately identifies hazy areas, providing a high-quality, human-like prior that guides the fine-tuning process of pre-trained networks. Extensive experiments demonstrate that HazeCLIP achieves state-of-the-art performance in real-world image dehazing, evaluated through both visual quality and image quality assessment metrics. Code is available at https://github.com/Troivyn/HazeCLIP.
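
To make the idea of using CLIP to "evaluate dehazing results" concrete, here is a minimal sketch of how a contrastive prompt pair can score an image. It assumes the image and the two prompts have already been embedded by a CLIP-style encoder and works on plain Python lists. The function names, the `scale` value of 100 (a typical CLIP logit scale), and the choice of "probability of the negative prompt" as the loss are illustrative assumptions, not the paper's exact formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_pair_probs(image_emb, pos_text_emb, neg_text_emb, scale=100.0):
    """Softmax over scaled cosine similarities to a (positive, negative) prompt pair.

    Returns (p_pos, p_neg). `scale` is an assumed temperature, not a value
    taken from the paper.
    """
    s_pos = scale * cosine(image_emb, pos_text_emb)
    s_neg = scale * cosine(image_emb, neg_text_emb)
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    z = e_pos + e_neg
    return e_pos / z, e_neg / z


def guidance_loss(image_emb, pos_text_emb, neg_text_emb, scale=100.0):
    """Hypothetical guidance term: probability mass on the negative (e.g. hazy) prompt."""
    return prompt_pair_probs(image_emb, pos_text_emb, neg_text_emb, scale)[1]
```

In a real pipeline the embeddings would come from the frozen CLIP encoders and the loss would be computed on tensors so that gradients flow back to the dehazing network; the sketch only shows the scoring arithmetic.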

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: The CLIP model is capable of distinguishing between hazy and clean images. Classification probabilities for example images and average accuracy for hazy and clean image sets are reported.
  • Figure 2: Overview of the proposed HazeCLIP framework. Real-world hazy images are first separated into sky and non-sky regions using the Segment Anything Model (SAM). Combined with the CLIP model, three contrastive prompt sets are applied to guide the adaptation process. The enhancing prompt set aims to improve overall image quality, while the non-sky and sky dehazing prompt sets guide dehazing in their respective regions.
  • Figure 3: Language-image similarity maps for hazy images. By removing the sky, the CLIP model can focus more effectively on scene dehazing. (a) Hazy images, (b) raw similarity maps, (c) maps from CLIP Surgery, (d) maps after sky masking.
  • Figure 4: Visual comparisons on the RTTS dataset from RESIDE. For better clarity, the region within the green rectangle is zoomed in and displayed in the top right corner.
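
The region-aware step described for Figure 2 can be sketched as follows. This assumes a binary sky mask is already available (in the paper it comes from SAM) and represents a single-channel image as nested Python lists. The helper names, the zero fill value, and the equal default weights are hypothetical; the fidelity term is passed in as a precomputed number because its exact form is not given in this summary.

```python
def split_by_sky_mask(image, sky_mask, fill=0.0):
    """Split an image into sky-only and non-sky-only copies.

    image:    2D list of pixel values
    sky_mask: 2D list of 0/1 flags with the same shape (1 = sky)
    Pixels outside each region are replaced by `fill`.
    """
    sky = [
        [px if m else fill for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, sky_mask)
    ]
    non_sky = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, sky_mask)
    ]
    return sky, non_sky


def total_objective(enhance, non_sky_dehaze, sky_dehaze, fidelity,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the three prompt-guided terms and the fidelity term.

    The weights are placeholders for illustration, not values from the paper.
    """
    terms = (enhance, non_sky_dehaze, sky_dehaze, fidelity)
    return sum(w * t for w, t in zip(weights, terms))
```

Each masked copy would then be scored against its own prompt set, so that haze cues in the sky do not dominate the guidance for the rest of the scene.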