Bilevel Layer-Positioning LoRA for Real Image Dehazing

Yan Zhang, Long Ma, Yuxin Feng, Zhe Huang, Fan Zhou, Zhuo Su

TL;DR

This paper proposes a haze-to-clear text-directed loss that leverages CLIP's cross-modal capabilities to reformulate real image dehazing as a semantic alignment problem in latent space, providing explicit unsupervised cross-modal guidance when no reference images are available.

Abstract

Learning-based real image dehazing methods have achieved notable progress, yet they still struggle to adapt to diverse real haze scenes. These difficulties mainly stem from the lack of effective unsupervised mechanisms for unlabeled data and the heavy cost of full model fine-tuning. To address them, we propose the haze-to-clear (H2C) text-directed loss, which leverages CLIP's cross-modal capabilities to reformulate real image dehazing as a semantic alignment problem in latent space, thereby providing explicit unsupervised cross-modal guidance in the absence of reference images. Furthermore, we introduce the Bilevel Layer-positioning LoRA (BiLaLoRA) strategy, which both learns the LoRA parameters and automatically searches for the injection layers, enabling targeted adaptation of critical network layers. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on multiple real-world dehazing benchmarks. The code is publicly available at https://github.com/YanZhang-zy/BiLaLoRA.
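
To make the layer-positioning idea concrete, here is a minimal pure-Python sketch of one plausible reading of the abstract, not the authors' implementation. It assumes each candidate layer keeps its pre-trained weight frozen, adds a standard low-rank LoRA update B·A, and scales that update by a learnable per-layer gate. The class `GatedLoRALinear`, the `gate_logit` variable, and the top-k rule in `select_layers` are all hypothetical names and choices introduced for illustration. In a bilevel setup, A and B would be trained at the lower level and the gates at the upper level.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


class GatedLoRALinear:
    """Frozen linear layer plus a low-rank update scaled by a layer gate.

    Hypothetical illustration: the gate stands in for the "where to inject
    LoRA" decision; the paper's actual search mechanism may differ.
    """

    def __init__(self, weight, rank, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, out_dim x in_dim
        # Common LoRA initialisation: A is small random, B is zero, so the
        # low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.gate_logit = 0.0  # assumed upper-level (layer-selection) variable

    def gate(self):
        """Sigmoid of the gate logit, a value in (0, 1)."""
        return 1.0 / (1.0 + math.exp(-self.gate_logit))

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        delta = matvec(self.B, matvec(self.A, x))   # B (A x)
        g = self.gate()
        return [b + g * d for b, d in zip(base, delta)]


def select_layers(layers, k):
    """Return the indices of the k layers with the largest gate values."""
    ranked = sorted(range(len(layers)),
                    key=lambda i: layers[i].gate(), reverse=True)
    return sorted(ranked[:k])
```

After the search, only the layers returned by `select_layers` would keep their LoRA branches. This is one simple way to turn continuous gates into a discrete set of injection layers.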

Paper Structure (26 sections, 7 equations, 13 figures, 3 tables, 1 algorithm)

Figures (13)

  • Figure 1: Performance comparison. Sub-figure (a) shows quantitative results on four non-reference metrics across three real-world datasets (RTTS, URHI, and Fattal), while sub-figure (b) presents visual comparisons on different challenging real scenes.
  • Figure 2: Effectiveness of H2C loss in different real scenes. The left two rows (daytime scenes) are from URHI and Fattal, and the right two rows (nighttime scenes) are from NHRW.
  • Figure 3: Contribution of different network components to domain adaptation.
  • Figure 4: Quantitative results of cross-model flexibility. We evaluate four baseline dehazing architectures on three real dehazing datasets with four non-reference metrics to ensure the generality of this property.
  • Figure 5: Quantitative results of cross-domain stability. We leverage DEA pre-trained on four synthetic datasets to verify robustness across different source domains.
  • ...and 8 more figures