Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing

Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, Chongyi Li

TL;DR

This work tackles real-world image dehazing by addressing the generalization gap of one-shot, codebook-based methods. It introduces IPC-Dehaze, an iterative predictor-critic framework that leverages the latent codebook of a pre-trained VQGAN and refines the predicted codes over several iterations, retaining high-quality codes and resampling unreliable ones under the guidance of a Code-Critic that evaluates interdependencies among codes. The Code-Predictor predicts code sequences conditioned on the codes from the previous iteration, while the Code-Critic masks less reliable codes to prevent error accumulation, enabling easy-to-hard dehazing. Experiments on RTTS, URHI, and Fattal demonstrate state-of-the-art dehazing quality and robustness across varying haze densities, with ablations validating the importance of both modules and of the iterative scheme.

Abstract

We propose a novel Iterative Predictor-Critic Code Decoding framework for real-world image dehazing, abbreviated as IPC-Dehaze, which leverages the high-quality codebook prior encapsulated in a pre-trained VQGAN. Unlike previous codebook-based methods that rely on one-shot decoding, our method uses the high-quality codes obtained in the previous iteration to guide the Code-Predictor in the subsequent iteration, improving code prediction accuracy and ensuring stable dehazing performance. Our idea stems from the observations that 1) the degradation of hazy images varies with haze density and scene depth, and 2) clear regions provide crucial cues for restoring dense-haze regions. However, progressively refining the obtained codes in subsequent iterations is non-trivial, because it is difficult to determine which codes should be retained and which should be replaced at each iteration. Our second key contribution is the Code-Critic, which captures interrelations among codes. The Code-Critic evaluates code correlations and then resamples the set of codes with the highest mask scores (a higher score indicates that the code is more likely to be rejected), which helps retain the more accurate codes and re-predict the difficult ones. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods in real-world dehazing.
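
The selection rule described in the abstract (codes with the highest mask scores are rejected and resampled) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rejection_mask`, the flat list of scores, and the fixed `num_reject` budget are assumptions made for illustration, and the abstract does not state how many codes are rejected at each iteration.

```python
def rejection_mask(scores, num_reject):
    """Return a 0/1 mask over code positions.

    scores     : per-code mask scores (hypothetical critic output);
                 a higher score means the code is more likely to be rejected.
    num_reject : number of codes to reject (assumed to be given; the
                 schedule that sets it is not specified in the abstract).
    A mask value of 1 marks a code to resample, 0 a code to retain.
    """
    if not 0 <= num_reject <= len(scores):
        raise ValueError("num_reject must lie between 0 and len(scores)")
    # Positions ordered by descending score; sorted() is stable, so ties
    # are broken in favour of the lower index.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rejected = set(order[:num_reject])
    return [1 if i in rejected else 0 for i in range(len(scores))]


# Toy example: four codes, reject the two with the highest scores.
mask = rejection_mask([0.9, 0.1, 0.5, 0.7], num_reject=2)
print(mask)
```

In this toy setting, positions 0 and 3 carry the highest scores and are the ones marked for resampling, while the remaining codes are retained for the next iteration.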

Paper Structure

This paper contains 25 sections, 13 equations, 14 figures, 5 tables, 2 algorithms.

Figures (14)

  • Figure 1: A comparison between the state-of-the-art real-world image dehazing methods and our IPC-Dehaze. In comparison, our result is sharper and clearer, with less color distortion and overexposure. The bottom images present the results of each iteration in our method, showing the continuous improvements with our core Predictor-Critic mechanism.
  • Figure 2: Overview of our IPC-Dehaze. In the training phase, we use fused tokens $Z_{t}= Z_l \odot M + Z_c \odot (1-M)$ from the hazy and clean images, and predict the sequence codes $S$ with the Code-Predictor. We also train the Code-Critic to evaluate each code in $S$ for potential rejection and resampling. In the inference phase, $Z_{t=0}$ is initialized with the encoded low-quality tokens $Z_l$. During the $t$-th iterative decoding step, the Code-Predictor takes $Z_t$ as input and predicts the sequence codes $S$ and the corresponding high-quality tokens $Z_c$. To retain the reliable codes and resample the others, the Code-Critic evaluates $S$ and produces a mask map $M$ via $p_\phi$. This mask map $M$ is then used to generate $Z_{t+1}$ through a Fusion process. After $T$ iterations, $Z_T$ is passed to a decoder to reconstruct the clean image. SFT refers to the Spatial Feature Transform, which adjusts the features within the encoder and decoder.
  • Figure 3: Visual comparison on RTTS. Zoom in for best view.
  • Figure 4: Visual comparison on Fattal. Zoom in for best view.
  • Figure 5: Visual comparison on URHI. Zoom in for best view.
  • ...and 9 more figures
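
The inference procedure described in the Figure 2 caption can be sketched as a short loop. This is a toy illustration under stated assumptions, not the paper's code: tokens are plain lists of floats, `predictor` and `critic` are hypothetical callables standing in for the Code-Predictor and Code-Critic, and the inference-time Fusion step is assumed to reuse the caption's formula $Z_l \odot M + Z_c \odot (1-M)$.

```python
def fuse(z_l, z_c, mask):
    """Element-wise fusion Z = Z_l * M + Z_c * (1 - M).

    Where mask is 1 the low-quality token is kept; where it is 0 the
    predicted high-quality token is used.
    """
    return [m * l + (1 - m) * c for l, c, m in zip(z_l, z_c, mask)]


def iterative_decode(z_l, predictor, critic, num_iters):
    """Run num_iters predictor-critic steps starting from Z_0 = Z_l.

    predictor(z_t) -> (codes, z_c)  hypothetical stand-in for the Code-Predictor
    critic(codes, t) -> mask        hypothetical stand-in for the Code-Critic
    Returns Z_T, which a decoder (not modelled here) would turn into an image.
    """
    z_t = list(z_l)
    for t in range(num_iters):
        codes, z_c = predictor(z_t)   # predict codes S and tokens Z_c
        mask = critic(codes, t)       # 1 = not yet trusted, 0 = accept
        z_t = fuse(z_l, z_c, mask)    # build Z_{t+1}
    return z_t


# Toy stand-ins: the "predictor" adds 1 to every token, and the "critic"
# accepts every predicted token (all-zero mask).
toy_predictor = lambda z: ([0] * len(z), [v + 1.0 for v in z])
accept_all = lambda codes, t: [0] * len(codes)
print(iterative_decode([0.0, 0.0], toy_predictor, accept_all, num_iters=2))
```

Passing the iteration index `t` to the critic is a design choice of this sketch: it leaves room for a schedule that masks fewer positions in later iterations, which the caption itself does not specify.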