
Unified Removal of Raindrops and Reflections: A New Benchmark and A Novel Pipeline

Xingyu Liu, Zewei He, Yu Chen, Chunyu Zhu, Zixuan Chen, Xing Luo, Zhe-Ming Lu

Abstract

When images are captured through glass surfaces or windshields on rainy days, raindrops and reflections frequently co-occur and significantly reduce visibility. This practical problem has received little attention and needs to be resolved urgently. Prior de-raindrop, de-reflection, and all-in-one models have failed to address this composite degradation. To this end, we formally define the unified removal of raindrops and reflections (UR$^3$) task for the first time and construct a real-shot dataset, namely RainDrop and ReFlection (RDRF), which provides a new benchmark with substantial, high-quality, diverse image pairs. Then, we propose a novel diffusion-based framework (i.e., DiffUR$^3$) with several targeted designs to address this challenging task. By leveraging a powerful generative prior, DiffUR$^3$ successfully removes both types of degradation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on our benchmark and on challenging in-the-wild images. The RDRF dataset and the code will be made public upon acceptance.

Paper Structure

This paper contains 31 sections, 5 equations, 22 figures, and 6 tables.

Figures (22)

  • Figure 1: We compare our DiffUR$^3$ pipeline with other methods on low-quality images with raindrops and reflections from our newly collected real-world benchmark. Specifically, (c) DAI Hu2026AAAI-DAI is designed for reflection removal, (d) is a cascaded method, (e) is a re-trained all-in-one method (i.e., Histoformer Sun2024ECCV-Hist, where $\dagger$ indicates re-training on our dataset), and (f) is our DiffUR$^3$ pipeline, which jointly removes both degradations in a single pass.
  • Figure 2: (a) Sketch diagram and actual equipment of our image acquisition platform. To suppress shutter-induced micro-vibrations, which may cause image misalignment, we implement a wireless triggering mechanism. It comprises a remote controller and a camera-mounted signal receiver, enabling contact-free shutter operation. (b) The data collection pipeline for our RDRF dataset. denotes light occlusion
  • Figure 3: Our RDRF dataset comprises a diverse collection of scenes, each containing a ground truth and multiple low-quality images. As illustrated in this figure, the clean ground truths are highlighted in red boxes, while the corresponding low-quality images are arranged around them. We divide the dataset into training and testing subsets, ensuring no overlapping samples between them. Please zoom in on screen for a better view.
  • Figure 4: (a) Overall pipeline of our DiffUR$^3$ framework. Given a low-quality image $I_{lq}$, the restoration stage removes the undesired degradation to obtain the initial result $I_{s}$. Both $I_{lq}$ and $I_{s}$ are fed into the next stage as condition images. We inject the effective condition information through a control branch, which outputs control signals for the noise prediction U-Net. (b) Details of the Modulate&Gate module within the control branch. (c) Generation of the noisy latent $z_{t}$ during the training phase. Note that during inference the noisy latent starts from random Gaussian noise.
  • Figure 5: The motivation for employing an additional fidelity encoder.
  • ...and 17 more figures
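
The Figure 4(c) caption distinguishes how the noisy latent $z_{t}$ is obtained during training from how it is initialized during inference. The sketch below illustrates that distinction using the standard DDPM forward process, $z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. This formula, the linear noise schedule, and all function names are assumptions made for illustration; the listing above does not state which schedule or parameterization DiffUR$^3$ actually uses.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear noise schedule (a common DDPM default, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s <= t} (1 - beta_s) for every step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_latent(z0, t, alpha_bars, rng):
    """Training: mix a clean latent z0 with Gaussian noise at step t."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [signal * z + noise * e for z, e in zip(z0, eps)]
    return z_t, eps


def initial_latent(size, rng):
    """Inference: no clean latent is available, so start from pure Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
z0 = [0.5, -1.0, 2.0, 0.0]  # toy stand-in for an encoded clean latent
z_t, eps = noisy_latent(z0, t=500, alpha_bars=alpha_bars, rng=rng)
z_T = initial_latent(len(z0), rng)
```

In a real pipeline the latents would be tensors produced by an image encoder, and the sampled noise `eps` would typically serve as the regression target for the noise prediction U-Net; plain Python lists are used here only to keep the sketch dependency-free.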