HazeCLIP: Towards Language Guided Real-World Image Dehazing
Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai
TL;DR
HazeCLIP addresses the gap between synthetic-trained dehazing models and real-world hazy imagery by introducing a language-guided adaptation framework that uses a frozen CLIP model to steer fine-tuning. It employs region-aware processing via SAM to separate sky and non-sky regions and utilizes three contrastive prompt sets to provide targeted CLIP guidance, including an enhancing set for output quality. A fidelity constraint prevents catastrophic forgetting during fine-tuning, enabling synthetic-to-real generalization and achieving state-of-the-art results on real-world datasets such as RTTS and RESIDE, with demonstrated compatibility across multiple dehazing backbones. The work highlights the value of vision-language priors for low-level restoration tasks and offers an architecture-agnostic, scalable approach to real-world image dehazing.
Abstract
Existing methods have achieved remarkable performance in image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by the Contrastive Language-Image Pre-training (CLIP) model's ability to distinguish between hazy and clean images, we leverage it to evaluate dehazing results. Combined with a region-specific dehazing technique and tailored prompt sets, the CLIP model accurately identifies hazy areas, providing a high-quality, human-like prior that guides the fine-tuning process of pre-trained networks. Extensive experiments demonstrate that HazeCLIP achieves state-of-the-art performance in real-world image dehazing, evaluated through both visual quality and image quality assessment metrics. Code is available at https://github.com/Troivyn/HazeCLIP.
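The idea of scoring a dehazed output against contrastive prompts, while keeping it close to the pre-trained network's output, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature, the L1 fidelity term, and the loss weight are all hypothetical, and plain lists stand in for the embeddings that a frozen CLIP image and text encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_contrast_loss(image_emb, clean_emb, hazy_emb, temperature=0.07):
    """Cross-entropy over a (clean, hazy) prompt pair.

    The loss is small when the image embedding is closer to the "clean"
    prompt embedding than to the "hazy" one. The temperature value is an
    illustrative assumption.
    """
    s_clean = cosine(image_emb, clean_emb) / temperature
    s_hazy = cosine(image_emb, hazy_emb) / temperature
    # Numerically stable log-sum-exp over the two logits.
    m = max(s_clean, s_hazy)
    log_z = m + math.log(math.exp(s_clean - m) + math.exp(s_hazy - m))
    return log_z - s_clean  # equals -log softmax(clean)


def fidelity_loss(adapted, reference):
    """Mean absolute difference between the fine-tuned output and the
    output of the frozen pre-trained network (assumed L1 form)."""
    return sum(abs(a - r) for a, r in zip(adapted, reference)) / len(adapted)


def total_loss(image_emb, clean_emb, hazy_emb, adapted, reference, weight=1.0):
    """Language guidance plus a weighted fidelity term (weight is hypothetical)."""
    guidance = prompt_contrast_loss(image_emb, clean_emb, hazy_emb)
    return guidance + weight * fidelity_loss(adapted, reference)
```

In a region-aware setting, one would presumably evaluate such a guidance term separately on sky and non-sky crops, each with its own prompt pair, and sum the results; the sketch above covers only a single region and a single prompt pair.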
