Dereflection Any Image with Diffusion Priors and Diversified Data

Jichen Hu, Chen Yang, Zanwei Zhou, Jiemin Fang, Xiaokang Yang, Qi Tian, Wei Shen

TL;DR

This work tackles the challenging problem of single-image reflection removal by introducing the Diverse Reflection Removal (DRR) dataset and a diffusion-prior framework that uses one-step denoising, ControlNet conditioning, and a cross-latent decoder. A three-stage progressive training strategy, including reflection-invariant finetuning, enhances generalization to diverse real-world reflections. Empirical results show state-of-the-art performance on standard benchmarks and strong generalization to in-the-wild images, with fast inference suitable for practical use. The combination of high-quality data, efficient diffusion guidance, and robust training yields a versatile dereflection system that benefits downstream vision tasks.

Abstract

Reflection removal of a single image remains a highly challenging task due to the complex entanglement between target scenes and unwanted reflections. Despite significant progress, existing methods are hindered by the scarcity of high-quality, diverse data and insufficient restoration priors, resulting in limited generalization across various real-world scenarios. In this paper, we propose Dereflection Any Image, a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR) created by randomly rotating reflective mediums in target scenes, enabling variation of reflection angles and intensities, and setting a new benchmark in scale, quality, and diversity. Second, we propose a diffusion-based framework with one-step diffusion for deterministic outputs and fast inference. To ensure stable learning, we design a three-stage progressive training strategy, including reflection-invariant finetuning to encourage consistent outputs across varying reflection patterns that characterize our dataset. Extensive experiments show that our method achieves SOTA performance on both common benchmarks and challenging in-the-wild images, showing superior generalization across diverse real-world scenes.
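
The reflection-invariant finetuning mentioned above encourages the model to give the same output for images of one scene that differ only in their reflections. One way to illustrate the idea, not necessarily the loss used in the paper, is to penalize how far each prediction deviates from the per-pixel mean over the group. The function name `consistency_loss` and the flattened-list representation of predictions are assumptions made for this sketch.

```python
from statistics import fmean


def consistency_loss(preds):
    """Mean squared deviation of each prediction from the per-pixel group mean.

    preds: list of K predictions for the SAME scene under different
    reflections, each a flat list of pixel values of equal length.
    This is an illustrative stand-in, not the paper's exact formulation.
    """
    if len(preds) < 2:
        raise ValueError("need at least two predictions of the same scene")
    n = len(preds[0])
    if n == 0 or any(len(p) != n for p in preds):
        raise ValueError("predictions must be non-empty and equally sized")

    # Per-pixel mean over the K predictions.
    mean = [fmean([p[i] for p in preds]) for i in range(n)]

    # Average squared deviation from that mean over all predictions and pixels.
    return fmean([(p[i] - mean[i]) ** 2 for p in preds for i in range(n)])


if __name__ == "__main__":
    identical = [[0.2, 0.8], [0.2, 0.8]]   # reflection fully removed in both
    differing = [[0.2, 0.8], [0.4, 0.6]]   # residual reflection differs
    print(consistency_loss(identical), consistency_loss(differing))
```

The loss is zero only when all predictions in the group agree, which is the behaviour the finetuning stage aims for. A real implementation would operate on image or latent tensors rather than Python lists.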

Paper Structure

This paper contains 26 sections, 12 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Our model demonstrates strong and general reflection removal capabilities. Top: Original images with reflections. Bottom: Results generated by our model. The scenarios include glass, plastic, water surfaces, etc.
  • Figure 2: Our dataset contains a diverse collection of scenes, each accompanied by multiple reflection images. In the figure, the ground-truth transmission layers are highlighted with red boxes, while the remaining images are the corresponding mixed images. The dataset covers indoor, outdoor, and object-centric scenes. All image pairs are high resolution with rich texture details. (Best viewed on screen.)
  • Figure 3: Data collection pipeline for real (top) and synthetic (bottom) data. Real data is captured by recording videos while rotating a glass panel at various angles; the frames are then processed to align the mixed images with their ground-truth transmission layers. Synthetic data is generated with randomly chosen coefficients and filtered to keep only high-quality image pairs.
  • Figure 4: Our proposed framework. It consists of a U-Net with a one-step denoising strategy, a ControlNet that takes as input the mixed image processed by the encoder $\mathcal{E}$, and a cross-latent decoder $\mathcal{D}$ that mitigates blurriness and preserves details.
  • Figure 5: The three stages of progressive training. First, we train the ControlNet and the upsampling blocks of the U-Net using the basic one-step diffusion loss. Second, we finetune these components by incorporating the consistent loss. Finally, we train the cross-latent decoder using the image reconstruction loss.
  • ...and 10 more figures
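
The three-stage schedule in the Figure 5 caption can be summarized as a small configuration sketch that records which components are updated and which losses are active in each stage. All component and loss names here are hypothetical labels, not identifiers from the authors' code. Two further assumptions are made: components not named in the caption are treated as frozen, and stage 2 keeps the one-step diffusion loss alongside the "consistent loss".

```python
from dataclasses import dataclass

# Illustrative component labels; "unet_other_blocks" and "encoder" are listed
# only to show what is ASSUMED to stay frozen throughout.
COMPONENTS = (
    "controlnet",
    "unet_up_blocks",
    "unet_other_blocks",
    "encoder",
    "cross_latent_decoder",
)


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset  # components whose weights are updated
    losses: tuple         # loss terms active in this stage


STAGES = (
    # Stage 1: ControlNet + U-Net upsampling blocks, basic one-step diffusion loss.
    Stage("stage1", frozenset({"controlnet", "unet_up_blocks"}),
          ("one_step_diffusion",)),
    # Stage 2: same components, consistency term added (combination is assumed).
    Stage("stage2", frozenset({"controlnet", "unet_up_blocks"}),
          ("one_step_diffusion", "consistency")),
    # Stage 3: only the cross-latent decoder, image reconstruction loss.
    Stage("stage3", frozenset({"cross_latent_decoder"}),
          ("image_reconstruction",)),
)


def trainable_flags(stage):
    """Map every component to True (updated) or False (frozen) for a stage."""
    unknown = stage.trainable - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {c: c in stage.trainable for c in COMPONENTS}


if __name__ == "__main__":
    for stage in STAGES:
        active = [c for c, on in trainable_flags(stage).items() if on]
        print(stage.name, active, stage.losses)
```

In a real training loop, flags like these would typically drive which parameters have gradients enabled before the optimizer for each stage is built.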