
Improving Image De-raining Using Reference-Guided Transformers

Zihao Ye, Jaehoon Cho, Changjae Oh

TL;DR

The paper tackles single-image de-raining by introducing a reference-guided de-raining filter (RDF) that augments existing de-raining models with a reference clean image. RDF comprises a feature extractor, a feature attention module, and a feature fusion module that transfer useful features from the reference $R_c$ to the baseline de-rained output $\hat{X}_c$, guided by cross-scale attention and fusion. A two-stage training strategy, an $L_1$ loss for initialization followed by an MS-SSIM-L1 loss ($\alpha_1=0.6$, $\alpha_2=0.4$) for fine-tuning, is used to make the feature transfer robust. Experiments on BDD100K-Rain, Cityscapes-Rain, and KITTI-Rain show consistent improvements for the GMM, PReNet, and Uformer baselines, demonstrating that RDF works as a plug-and-play module and could improve real-world outdoor vision systems.
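The summary names the two training losses but not their exact form. The sketch below assumes the MS-SSIM-L1 loss is the weighted sum $\alpha_1(1 - \text{MS-SSIM}) + \alpha_2 L_1$ with the stated values; pairing $\alpha_1$ with the MS-SSIM term is an assumption. To stay short, the MS-SSIM term is replaced by a simplified stand-in (a global, non-windowed SSIM averaged over three 2x-downsampled scales), which is not the standard windowed MS-SSIM. All function names are hypothetical, and images are assumed to be grayscale arrays in [0, 1].

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-image statistics (simplification: no sliding window).

    Constants assume pixel values in [0, 1].
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


def downsample2x(img):
    """2x2 average pooling on an (H, W) array; an odd last row/column is cropped."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def approx_ms_ssim(x, y, scales=3):
    """Stand-in for MS-SSIM: mean of global SSIM over dyadic scales."""
    vals = []
    for _ in range(scales):
        vals.append(global_ssim(x, y))
        x, y = downsample2x(x), downsample2x(y)
    return float(np.mean(vals))


def ms_ssim_l1_loss(pred, target, alpha1=0.6, alpha2=0.4):
    """Assumed form: alpha1 * (1 - MS-SSIM) + alpha2 * L1."""
    return alpha1 * (1.0 - approx_ms_ssim(pred, target)) + alpha2 * l1_loss(pred, target)


def training_loss(pred, target, stage):
    """Stage 1: plain L1 (initialization). Stage 2: MS-SSIM-L1 (fine-tuning)."""
    if stage == 1:
        return l1_loss(pred, target)
    return ms_ssim_l1_loss(pred, target)
```

Switching `stage` from 1 to 2 after initialization mirrors the two-stage schedule; a real implementation would use a proper windowed MS-SSIM from an image-quality library.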

Abstract

Image de-raining is a critical task in computer vision to improve visibility and enhance the robustness of outdoor vision systems. While recent advances in de-raining methods have achieved remarkable performance, the challenge remains to produce high-quality and visually pleasing de-rained results. In this paper, we present a reference-guided de-raining filter, a transformer network that enhances de-raining results using a reference clean image as guidance. We leverage the capabilities of the proposed module to further refine the images de-rained by existing methods. We validate our method on three datasets and show that our module can improve the performance of existing prior-based, CNN-based, and transformer-based approaches.
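Because the filter refines the output of an existing method, the overall pipeline can be written as a wrapper around any de-raining function. The sketch below follows the data flow in the Figure 2 caption: the same baseline model de-rains both the input rainy image $X_r$ and the synthesized reference rainy image $R_r$, and the filter receives $\hat{X}_c$, $\hat{R}_c$, and $R_c$. The toy rain synthesis, the 3x3 mean-filter "de-rainer", and the identity filter are hypothetical stand-ins, not the paper's components.

```python
from typing import Callable

import numpy as np

Image = np.ndarray
DerainFn = Callable[[Image], Image]                # any existing de-raining model
FilterFn = Callable[[Image, Image, Image], Image]  # stand-in for the RDF


def refine_with_reference(x_rainy: Image, r_rainy: Image, r_clean: Image,
                          derain: DerainFn, rdf: FilterFn) -> Image:
    """Data flow of the Figure 2 framework (sketch)."""
    x_hat = derain(x_rainy)            # input de-rained image, \hat{X}_c
    r_hat = derain(r_rainy)            # reference de-rained image, \hat{R}_c
    return rdf(x_hat, r_hat, r_clean)  # enhanced output, \hat{X}_c^{out}


# ---- Hypothetical toy stand-ins, for illustration only ----

def add_toy_rain(img: Image, density: float = 0.05, seed: int = 0) -> Image:
    """Toy 'rain': sets a random fraction of pixels to 1.0 (not the paper's synthesis)."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    out[rng.random(img.shape) < density] = 1.0
    return out


def box_blur3(img: Image) -> Image:
    """Toy baseline 'de-rainer': 3x3 mean filter with edge padding."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def identity_rdf(x_hat: Image, r_hat: Image, r_clean: Image) -> Image:
    """Placeholder filter that returns the baseline output unchanged."""
    return x_hat
```

Replacing `box_blur3` with a GMM, PReNet, or Uformer inference call and `identity_rdf` with a trained filter leaves `refine_with_reference` unchanged, which is the sense in which the module is plug-and-play. Running the same baseline on $R_r$ presumably gives the filter an example of that baseline's residual artifacts paired with the clean $R_c$.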

Paper Structure (11 sections, 4 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Sample de-rained images from Cityscapes-Rain tremblay2020rain. Unlike existing methods, our reference-guided de-raining filter enhances the de-rained results using a reference clean image as guidance.
  • Figure 2: Overview of our framework. We first obtain an input rainy image, $X_r$, and a synthesized reference rainy image, $R_r$. Using an existing de-raining model, we obtain the input de-rained image, $\hat{X}_c$, and the reference de-rained image, $\hat{R}_c$. These two de-rained images and the reference clean image, $R_c$, are used as input to our reference-guided de-raining filter. By capturing useful information from the features of $R_c$ and transferring it to $\hat{X}_c$, we generate the enhanced de-rained output, $\hat{X}_c^{out}$.
  • Figure 3: Attention maps from the feature attention module. Using the feature of an input de-rained image, $\hat{X}_c$, as a query and the feature of a reference de-rained image, $\hat{R}_c$, as a key, we compute attention weights that are used to select useful features from the reference image. The attention maps are color-coded, with warmer colors indicating higher values.
  • Figure 4: Feature fusion module. The de-rained images are first projected into the feature space by a shallow feature extractor. The features at each level are then compensated sequentially from Level 1 (fine) to Level 3 (coarse).
  • Figure 5: Effect of reference images on the attention maps and de-raining results. De-rained images are obtained using, from top to bottom, the ground-truth clean image, Gaussian noise, and our reference image obtained by image retrieval as the reference.
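The Figure 3 caption fixes the query (features of $\hat{X}_c$) and the key (features of $\hat{R}_c$) but does not state the values. The sketch below assumes the values are features of $R_c$, consistent with selecting useful features from the reference, and uses plain single-head scaled dot-product attention; learned projections, multiple heads, and the cross-scale details are omitted. The residual addition used as "compensation" is a hypothetical choice, and the function names are illustrative.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def reference_attention(f_x_hat, f_r_hat, f_r_clean):
    """Single-head scaled dot-product cross-attention (sketch).

    f_x_hat:   (N, d) features of the input de-rained image     -> queries
    f_r_hat:   (M, d) features of the reference de-rained image -> keys
    f_r_clean: (M, d) features of the reference clean image     -> values (assumed)
    Returns the compensated features (N, d) and the attention map (N, M).
    """
    d = f_x_hat.shape[-1]
    attn = softmax(f_x_hat @ f_r_hat.T / np.sqrt(d), axis=-1)  # each row sums to 1
    transferred = attn @ f_r_clean                              # features selected from R_c
    return f_x_hat + transferred, attn                          # hypothetical residual fusion
```

Each row of `attn` is one query location's distribution over reference locations; reshaping a row to the reference feature map's spatial size gives a map of the kind shown in Figure 3. A multi-level version would apply this per level in the Level 1 to Level 3 order of Figure 4, but how the levels are coupled is not described in this summary, so it is left out.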