Dereflection Any Image with Diffusion Priors and Diversified Data
Jichen Hu, Chen Yang, Zanwei Zhou, Jiemin Fang, Xiaokang Yang, Qi Tian, Wei Shen
TL;DR
This work tackles the challenging problem of single-image reflection removal by introducing the Diverse Reflection Removal (DRR) dataset and a diffusion-prior framework that uses one-step denoising, ControlNet conditioning, and a cross-latent decoder. A three-stage progressive training strategy, including reflection-invariant finetuning, enhances generalization to diverse real-world reflections. Empirical results show state-of-the-art performance on standard benchmarks and strong generalization to in-the-wild images, with fast inference suitable for practical use. The combination of high-quality data, efficient diffusion guidance, and robust training yields a versatile dereflection system that benefits downstream vision tasks.
Abstract
Single-image reflection removal remains a highly challenging task due to the complex entanglement between target scenes and unwanted reflections. Despite significant progress, existing methods are hindered by the scarcity of high-quality, diverse data and insufficient restoration priors, resulting in limited generalization across real-world scenarios. In this paper, we propose Dereflection Any Image, a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR), created by randomly rotating reflective mediums in target scenes, which varies reflection angles and intensities and sets a new benchmark in scale, quality, and diversity. Second, we propose a diffusion-based framework with one-step diffusion for deterministic outputs and fast inference. To ensure stable learning, we design a three-stage progressive training strategy, including reflection-invariant finetuning to encourage consistent outputs across the varying reflection patterns that characterize our dataset. Extensive experiments show that our method achieves state-of-the-art performance on both common benchmarks and challenging in-the-wild images, demonstrating superior generalization across diverse real-world scenes.
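To make the idea of reflection-invariant finetuning more concrete, the following is a minimal sketch of one way a consistency objective could be written. It assumes the model produces one output per input image, and that several inputs show the same scene under different reflections. The function name `reflection_invariant_loss`, the use of mean-squared deviation from the mean prediction, and the optional reconstruction term are all assumptions made for illustration; the abstract does not specify the loss used in the paper.

```python
import numpy as np


def reflection_invariant_loss(preds, target=None):
    """Hypothetical consistency objective (an assumption, not the paper's loss).

    preds:  array of shape (K, H, W, C), the model outputs for K inputs that
            show the same scene under K different reflection patterns.
    target: optional array of shape (H, W, C), the reflection-free image.
    """
    preds = np.asarray(preds, dtype=np.float64)

    # Consistency term: penalize disagreement between the K outputs by
    # measuring their mean-squared deviation from the average prediction.
    mean_pred = preds.mean(axis=0, keepdims=True)
    consistency = np.mean((preds - mean_pred) ** 2)

    if target is None:
        return float(consistency)

    # Optional reconstruction term: mean-squared error against the
    # reflection-free target, averaged over all K outputs.
    target = np.asarray(target, dtype=np.float64)[None]
    reconstruction = np.mean((preds - target) ** 2)
    return float(consistency + reconstruction)
```

In this sketch the consistency term is zero exactly when all K outputs agree, which is the behavior the abstract describes as consistent outputs across varying reflection patterns. A real training setup would compute the same quantity on differentiable tensors.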
