Cycle Contrastive Adversarial Learning for Unsupervised image Deraining

Chen Zhao, Weiling Cai, ChengWei Hu, Zheng Yuan

TL;DR

This work tackles unsupervised single image deraining (SID) with CCLGAN, a cycle contrastive generative adversarial network that combines cycle contrastive learning (CCL) and location contrastive learning (LCL) to disentangle rain from image content. CCL operates in two latent spaces: a CLIP-based semantic latent space (intra-CCL) and a discriminant latent space derived from the discriminator encoders (inter-CCL). LCL acts as a mutual-information-like constraint that preserves content. The model uses two generators $G_n$, $G_r$ and two discriminators $D_n$, $D_r$ to form rainy-to-rain-free and rain-free-to-rainy translation cycles, and is trained with the losses $\mathcal{L}_{adv}$, $\mathcal{L}_{LCL}$, $\mathcal{L}_{intra}$, and $\mathcal{L}_{inter}$. The authors report state-of-the-art performance on RainCityscapes and SPA without paired ground-truth data. The results support combining semantic-aware reconstruction with discriminant-space learning for high-quality rain removal, and suggest possible applicability to other low-level vision tasks such as underwater enhancement, haze removal, and denoising.

Abstract

To tackle the difficulties in fitting paired real-world data for single image deraining (SID), recent unsupervised methods have achieved notable success. However, these methods often struggle to generate high-quality, rain-free images due to a lack of attention to semantic representation and image content, resulting in ineffective separation of content from the rain layer. In this paper, we propose a novel cycle contrastive generative adversarial network for unsupervised SID, called CCLGAN. This framework combines cycle contrastive learning (CCL) and location contrastive learning (LCL). CCL improves image reconstruction and rain-layer removal by bringing similar features closer and pushing dissimilar features apart in both semantic and discriminative spaces. At the same time, LCL preserves content information by constraining mutual information at the same location across different exemplars. Extensive experiments demonstrate the superior performance of CCLGAN and the effectiveness of its components.
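The LCL idea of constraining mutual information at the same location can be illustrated with a patch-wise InfoNCE loss. The sketch below is not the paper's definition, which this excerpt does not give. It assumes a PatchNCE-style formulation in which the feature at location i of the generated image is the query, the feature at location i of the source image is its positive, and features at all other locations are negatives. The function name `location_contrastive_loss`, the temperature `tau`, and the toy feature vectors are hypothetical, and plain Python lists stand in for encoder feature maps.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def location_contrastive_loss(src_feats, out_feats, tau=0.07):
    """Patch-wise InfoNCE (assumed form, not taken from the paper).

    src_feats[i] and out_feats[i] are feature vectors at the same spatial
    location i in the source and generated images. Each out_feats[i] is
    pulled toward src_feats[i] and pushed away from src_feats[j], j != i.
    """
    src = [_normalize(v) for v in src_feats]
    out = [_normalize(v) for v in out_feats]
    total = 0.0
    for i, query in enumerate(out):
        logits = [_dot(query, key) / tau for key in src]
        # Numerically stable log-sum-exp over all locations.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(out)
```

Under these assumptions, the loss is small when each output location matches its own source location and large when content has moved to a different location, which is the behaviour a content-preserving constraint needs.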

Paper Structure (14 sections, 11 equations, 3 figures, 4 tables)

This paper contains 14 sections, 11 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overall framework of CCLGAN. It consists mainly of cycle contrastive learning (CCL) and location contrastive learning (LCL). CCL contains two cooperative branches: the intra-CCL branch and the inter-CCL branch. The intra-CCL branch constructs a semantic latent space using Contrastive Language-Image Pre-Training (CLIP). In this semantic latent space (the orange box), we pull the reconstructed rain-free image $n^{*}$ close to its corresponding real rain-free image $n$ while pushing $n^{*}$ away from the rainy images ($\widetilde{r}$ and $r$), where the generated rainy image $\widetilde{r}$ is the fake negative and the real rainy image $r$ is the real negative. Because the fake negative $\widetilde{r}$ is similar to the query $n^{*}$, it makes the rain layer easy for the network to notice in the semantic latent space, while the real negative $r$ lets the network learn the representation of the real rain layer. Similarly, the inter-CCL branch strips the rain layer in a discriminant latent space. In this discriminant latent space (the yellow box), we pull the reconstructed rain-free image $n^{*}$, the generated rain-free image $\widetilde{n}$, and the real rain-free image $n$ close together and push them away from the real rainy image $r$, where $n$ is the real positive and $\widetilde{n}$ is the fake positive. Pulling $n^{*}$ and $\widetilde{n}$ (the fake positive) closer encourages their feature distributions to converge. LCL improves the similarity of content embeddings and strengthens location information.
  • Figure 2: Visual comparison on RainCityscapes. Compared with the other methods, our method produces more natural results.
  • Figure 3: Visual comparison on SPA. Compared with the other methods, our method produces better results.
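The pull and push relations described in the Figure 1 caption can be sketched as a generic contrastive loss over a query, a set of positives, and a set of negatives. This is a minimal illustration, not the authors' loss: the exact form of $\mathcal{L}_{intra}$ and $\mathcal{L}_{inter}$ is not given in this excerpt, so an InfoNCE-style objective with cosine similarity is assumed. The function name `cycle_contrastive_loss`, the temperature `tau`, and the two-dimensional toy embeddings are hypothetical stand-ins for CLIP or discriminator-encoder features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cycle_contrastive_loss(query, positives, negatives, tau=0.1):
    """InfoNCE-style loss (assumed form): -log(sum_pos / (sum_pos + sum_neg)).

    The loss falls as the query moves toward the positives and away from
    the negatives in the embedding space.
    """
    pos_logits = [cosine(query, p) / tau for p in positives]
    neg_logits = [cosine(query, n) / tau for n in negatives]
    all_logits = pos_logits + neg_logits
    m = max(all_logits)  # shift for a numerically stable log-sum-exp
    log_pos = m + math.log(sum(math.exp(l - m) for l in pos_logits))
    log_all = m + math.log(sum(math.exp(l - m) for l in all_logits))
    return log_all - log_pos


# Toy embeddings (hypothetical): first axis ~ "content", second axis ~ "rain".
n_star = [1.0, 0.1]   # reconstructed rain-free image n*
n_real = [1.0, 0.0]   # real rain-free image n
n_fake = [0.9, 0.2]   # generated rain-free image (fake positive)
r_fake = [0.8, 0.6]   # generated rainy image (fake negative)
r_real = [0.0, 1.0]   # real rainy image r

# Intra-CCL pattern: positive n, negatives {generated rainy, real rainy}.
intra = cycle_contrastive_loss(n_star, [n_real], [r_fake, r_real])
# Inter-CCL pattern: positives {n, generated rain-free}, negative real rainy.
inter = cycle_contrastive_loss(n_star, [n_real, n_fake], [r_real])
```

In this toy setup the fake negative sits close to the query, so it dominates the intra-CCL loss, which mirrors the caption's point that a negative similar to $n^{*}$ makes the rain layer easier to notice.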