Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model

Yuanbo Wen, Tao Gao, Ting Chen

TL;DR

This work tackles the challenge of unpaired photo-realistic image deraining by introducing UPID-EDM, a diffusion-based framework guided by a dual-consistent energy function. The energy function decomposes into rain-relevance discarding and rain-irrelevance preserving components, informed by learnable domain-representation prompts that exploit CLIP priors. By updating the diffusion score with the gradient of the energy term, the model performs reverse sampling from rainy inputs using a clean-domain diffusion model, yielding high-fidelity, natural derained images without paired data. Empirical results on five benchmarks show state-of-the-art performance in both supervised and no-reference metrics, while ablations validate the contributions of the energy functions, prompts, and starting-time choices. The approach highlights the potential of combining energy guidance with diffusion models for challenging unpaired restoration tasks, albeit with notable computational demands and occasional hallucinations in extreme rain scenarios.
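As a rough illustration of the score update described above, the following minimal Python sketch subtracts a scaled energy gradient from a prior score and then iterates Langevin-style updates. Everything in it is an assumption made for illustration and not the paper's implementation: a standard Gaussian stands in for the clean-domain diffusion prior, the quadratic energy `0.5 * ||x - y||^2` stands in for the learned dual-consistent energy function, and the names `prior_score`, `energy_grad`, `guided_score`, and `langevin_sample` are hypothetical.

```python
import numpy as np

def prior_score(x):
    # Stand-in for the clean-domain model: score of N(0, I), i.e. grad log p(x) = -x.
    return -x

def energy_grad(x, y):
    # Gradient of the hypothetical energy E(x, y) = 0.5 * ||x - y||^2.
    return x - y

def guided_score(x, y, scale):
    # Energy-guided score: prior score minus the scaled energy gradient.
    return prior_score(x) - scale * energy_grad(x, y)

def langevin_sample(y, scale, steps=500, step_size=0.05, noise=True, seed=0):
    # Plain (unadjusted) Langevin updates driven by the guided score.
    # This is a simplification, not the reverse diffusion sampler of the paper.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(y.shape)
    for _ in range(steps):
        x = x + step_size * guided_score(x, y, scale)
        if noise:
            x = x + np.sqrt(2.0 * step_size) * rng.standard_normal(y.shape)
    return x

# Toy "observation" that the energy pulls the sample towards.
y = np.array([2.0, -4.0])

# With the noise switched off, the iteration settles at the mode of
# p(x) * exp(-scale * E(x, y)), which for this toy setup is scale * y / (1 + scale).
mode = langevin_sample(y, scale=1.0, noise=False)
sample = langevin_sample(y, scale=1.0, noise=True)
```

In this toy setting the guidance scale trades off the prior against the energy term: a larger `scale` moves the mode closer to `y`, while `scale=0` recovers unguided sampling from the prior.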

Abstract

Existing unpaired image deraining approaches face challenges in accurately capturing the distinguishing characteristics between the rainy and clean domains, resulting in residual degradation and color distortion within the reconstructed images. To this end, we propose an energy-informed diffusion model for unpaired photo-realistic image deraining (UPID-EDM). Initially, we delve into the intricate visual-language priors embedded within the contrastive language-image pre-training model (CLIP), and demonstrate that the CLIP priors aid in the discrimination of rainy and clean images. Furthermore, we introduce a dual-consistent energy function (DEF) that retains the rain-irrelevant characteristics while eliminating the rain-relevant features. This energy function is trained on non-corresponding rainy and clean images. In addition, we employ the rain-relevance discarding energy function (RDEF) and the rain-irrelevance preserving energy function (RPEF) to direct the reverse sampling procedure of a pre-trained diffusion model, effectively removing the rain streaks while preserving the image contents. Extensive experiments demonstrate that our energy-informed model surpasses the existing unpaired learning approaches in terms of both supervised and no-reference metrics.

Paper Structure (23 sections, 13 equations, 6 figures, 7 tables, 1 algorithm)

This paper contains 23 sections, 13 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison of our proposed method with existing approaches in terms of learned perceptual image patch similarity (LPIPS) and three image naturalness assessment metrics. Our model achieves the best current performance under the supervised protocols while significantly improving naturalness.
  • Figure 2: Overall pipeline of our proposed energy-informed diffusion model for unpaired photo-realistic image deraining (UPID-EDM). This approach employs our developed dual-consistent energy function (DEF) pre-trained on the unpaired rainy and clean images to guide the reverse sampling process of a pre-trained diffusion model. We decompose the energy function into two components, which discard the rain-relevant features and preserve the rain-irrelevant features, respectively.
  • Figure 3: Visual samples of the involved methods on synthetic rainy images. Our proposed approach completely eliminates the rain streaks and generates more photo-realistic derained images, while the visual results of other involved approaches exhibit either notable residual degradation, substantial color distortions, or severe artifacts.
  • Figure 4: Visual comparison showing the effectiveness of the proposed dual-consistent energy function. Without the dual-consistent energy function, the textures of the reconstructed images are inaccurate and many artifacts appear.
  • Figure 5: Similarity comparisons of different prompts with the given images. Similar texts present significant distances in the latent space, while our learnable prompts achieve more accurate classification.
  • ...and 1 more figures
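
The prompt-to-image similarity comparison mentioned in the Figure 5 caption can be illustrated with a small sketch of CLIP-style classification: embeddings are L2-normalized, compared by cosine similarity, and turned into probabilities with a temperature-scaled softmax. The 3-dimensional vectors below are made up for illustration and are not real CLIP features or the paper's learned prompts; the function names and the temperature value are likewise assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Row-wise L2 normalization followed by a dot product.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def classify(image_emb, prompt_embs, temperature=0.01):
    # Similarity of one image embedding to each prompt embedding,
    # converted to probabilities by a numerically stable softmax.
    sims = cosine_similarity(image_emb[None, :], prompt_embs)[0]
    logits = sims / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical embeddings (placeholders, not outputs of CLIP).
prompts = np.array([
    [1.0, 0.0, 0.0],  # stand-in for a "rainy" domain prompt
    [0.0, 1.0, 0.0],  # stand-in for a "clean" domain prompt
])
image = np.array([0.9, 0.1, 0.0])  # stand-in for a rainy image embedding

probs = classify(image, prompts)
predicted = int(probs.argmax())  # index of the closest prompt
```

The better the prompt embeddings separate the two domains, the more confidently such a classifier assigns an image to the rainy or clean class, which is the property the caption attributes to the learnable prompts.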