SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency

Cong Wang, Jinshan Pan, Wanyu Lin, Jiangxin Dong, Xiao-Ming Wu

TL;DR

This work tackles image dehazing by enforcing depth-consistency between hazy inputs and their clear counterparts through a depth-difference prompted Transformer. It introduces a depth-based prompt, a prompt embedding module, and a prompt attention mechanism within a VQGAN-based encoder-decoder, plus a mutual deformable fusion module and a continuous self-prompt inference scheme that iteratively refines haze removal. Key contributions include formulating the depth-difference prompt, integrating prompt-aware features into a Transformer, and demonstrating superior perceptual quality on synthetic and real-world datasets via NIQE, PI, and PIQE. The approach offers a practical path to more natural hazy-image restoration, with potential implications for real-time dehazing and depth-aware vision tasks, though it relies on depth estimates and shows some limitations on outdoor scenes with complex haze.

Abstract

This work presents an effective depth-consistency self-prompt Transformer for image dehazing. It is motivated by an observation that the estimated depths of an image with haze residuals and its clear counterpart vary. Enforcing the depth consistency of dehazed images with clear ones, therefore, is essential for dehazing. For this purpose, we develop a prompt based on the features of depth differences between the hazy input images and corresponding clear counterparts that can guide dehazing models for better restoration. Specifically, we first apply deep features extracted from the input images to the depth difference features for generating the prompt that contains the haze residual information in the input. Then we propose a prompt embedding module that is designed to perceive the haze residuals, by linearly adding the prompt to the deep features. Further, we develop an effective prompt attention module to pay more attention to haze residuals for better removal. By incorporating the prompt, prompt embedding, and prompt attention into an encoder-decoder network based on VQGAN, we can achieve better perception quality. As the depths of clear images are not available at inference, and the dehazed images with one-time feed-forward execution may still contain a portion of haze residuals, we propose a new continuous self-prompt inference that can iteratively correct the dehazing model towards better haze-free image generation. Extensive experiments show that our method performs favorably against the state-of-the-art approaches on both synthetic and real-world datasets in terms of perception metrics including NIQE, PI, and PIQE.
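The prompt formation and prompt embedding described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract says only that deep features are "applied" to the depth-difference features and that the prompt is added linearly to the deep features; modeling "applied" as an element-wise product is an assumption. The names `make_prompt` and `prompt_embedding` are hypothetical, and flat lists of floats stand in for feature tensors.

```python
from typing import List

Vector = List[float]  # stand-in for a flattened feature tensor


def make_prompt(deep_feat: Vector, depth_diff_feat: Vector) -> Vector:
    """Form the prompt from deep features and depth-difference features.

    ASSUMPTION: "applying" deep features to the depth-difference features
    is modeled here as an element-wise product. The source does not
    specify the exact operation.
    """
    if len(deep_feat) != len(depth_diff_feat):
        raise ValueError("feature vectors must have the same length")
    return [f * d for f, d in zip(deep_feat, depth_diff_feat)]


def prompt_embedding(deep_feat: Vector, prompt: Vector) -> Vector:
    """Embed the prompt by linearly adding it to the deep features,
    as stated in the abstract."""
    if len(deep_feat) != len(prompt):
        raise ValueError("feature vectors must have the same length")
    return [f + p for f, p in zip(deep_feat, prompt)]


if __name__ == "__main__":
    feat = [1.0, 2.0, 3.0]
    depth_diff = [0.0, 0.5, 1.0]  # larger values = stronger haze residual
    prompt = make_prompt(feat, depth_diff)
    print(prompt_embedding(feat, prompt))
```

Under this assumption, zero depth-difference features give a zero prompt, so the embedding returns the deep features unchanged. That is consistent with the prompt-free pass in the Figure 3 caption, where $\mathbf{F}_{\text{D}_{\text{diff}}}$ is set to zero.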

Paper Structure (12 sections, 12 equations, 12 figures, 7 tables)

This paper contains 12 sections, 12 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Haze residuals pose a significant challenge to accurately estimating the depth of clear images, creating inconsistencies compared to hazy images. A difference map (e) is utilized to locate haze residuals on the estimated depth, while minimal haze residuals will result in consistent estimates. By analyzing the difference map, we can identify the impact of haze residuals, leading to the development of improved dehazing models to mitigate this effect and enhance the quality of dehazed images. The difference map (e) is derived by $|\text{hazy~depth}-\text{clear~depth}|$ with equalization for better visualization.
  • Figure 2: The proposed framework at the training stage. MDFM is detailed in the section on the mutual deformable fusion module. The inference process is illustrated in Figure 3.
  • Figure 3: Continuous Self-Prompt Inference. The $i^{\text{th}}$ prompt inference consists of four steps, executed sequentially from top to bottom. Step 1 obtains a clearer image for forming the prompt by feeding the hazy image itself to our network without a prompt, i.e., with $\mathbf{F}_{\text{D}_{\text{diff}}}$ set to zero. Step 2 generates the prompt to guide the dehazing model. Step 3 conducts the self-prompt dehazing to produce the result. Step 4 updates the input for the next dehazing iteration. The magenta line describes the 'self' process that builds the prompt from the hazy image itself. Here, Dehazing Transformer means our Self-Prompt Dehazing Transformer with $\mathbf{F}_{\text{D}_{\text{diff}}}=0$.
  • Figure 4: (a)-(b) Existing position embedding vs. prompt embedding (Ours). Our prompt embedding better perceives the haze information and handles different input sizes. (c)-(d) Existing regular attention vs. prompt attention (Ours). Our prompt attention pays more attention to the haze residuals.
  • Figure 5: Continuous self-prompt inference vs. GT guidance (Baseline) on the SOTS-indoor dataset. GT guidance means that the GT image is used to form the prompt at inference, as in the training stage; it serves as the baseline. Because a one-time feed-forward result may still contain haze residuals, conducting inference continuously drives the results toward better haze-free images. Note that GT guidance conducts inference only once to generate a result. Moreover, GT is not available in the real world. More detailed explanations are given in the Analysis and Discussion section.
  • ...and 7 more figures
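
The difference map in the Figure 1 caption and the four-step loop in the Figure 3 caption can be sketched together as below. This is a hedged illustration, not the authors' code. The callables `dehaze` and `estimate_depth` are hypothetical stand-ins for the dehazing network and a depth estimator. Equalization is done with a simple empirical-CDF mapping, which may differ from the visualization the authors used. Computing the depth difference between the current input and the prompt-free result, and feeding each result back as the next input, are both assumptions about how the steps connect.

```python
from bisect import bisect_right
from typing import Callable, List, Optional

Image = List[List[float]]  # 2D grid of values, a stand-in for an image or depth map


def depth_difference_map(hazy_depth: Image, clear_depth: Image) -> Image:
    """Per-pixel |hazy depth - clear depth|, as in the Figure 1 caption."""
    return [[abs(h - c) for h, c in zip(h_row, c_row)]
            for h_row, c_row in zip(hazy_depth, clear_depth)]


def equalize(diff: Image) -> Image:
    """Equalize for visualization by mapping each value to its empirical CDF.

    ASSUMPTION: the source only says "equalization"; this is one simple variant.
    """
    flat = sorted(v for row in diff for v in row)
    n = len(flat)
    return [[bisect_right(flat, v) / n for v in row] for row in diff]


def continuous_self_prompt_inference(
    hazy: Image,
    dehaze: Callable[[Image, Optional[Image]], Image],
    estimate_depth: Callable[[Image], Image],
    num_iters: int = 3,
) -> List[Image]:
    """Run the four steps of the Figure 3 caption for `num_iters` iterations.

    `dehaze(image, depth_diff)` is hypothetical; `depth_diff=None` stands for
    the prompt-free pass (depth-difference features set to zero).
    """
    results = []
    current = hazy
    for _ in range(num_iters):
        # Step 1: prompt-free pass yields a clearer image.
        clearer = dehaze(current, None)
        # Step 2: form the prompt source from the depth difference
        # (ASSUMPTION: current input vs. clearer image).
        diff = depth_difference_map(estimate_depth(current),
                                    estimate_depth(clearer))
        # Step 3: self-prompt dehazing guided by the depth difference.
        result = dehaze(current, diff)
        # Step 4: update for the next iteration
        # (ASSUMPTION: the result becomes the next input).
        current = result
        results.append(result)
    return results
```

Returning every intermediate result, rather than only the last one, makes it easy to inspect how the output changes across iterations.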