Reflection Removal through Efficient Adaptation of Diffusion Transformers
Daniyar Zakarin, Thiemo Wandel, Anton Obukhov, Dengxin Dai
TL;DR
This paper addresses single-image reflection removal by repurposing a pre-trained diffusion transformer (DiT) with LoRA adapters for one-step latent-space editing. It introduces a physically based rendering (PBR) data generation pipeline to synthesize realistic glass reflections, paired with a two-stream latent flow-matching approach that yields a fast, high-fidelity transmission reconstruction without multi-step sampling. The method achieves state-of-the-art performance on in-domain and zero-shot benchmarks and demonstrates robust generalization to in-the-wild images, while remaining trainable on a single consumer GPU. The work suggests that diffusion-transformer priors, when combined with physically grounded data and lightweight adaptation, provide a scalable framework for reflection removal and related computational photography tasks, with potential extensions to video and more complex glass scenarios.
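To make "one-step latent-space editing" concrete, the following is a minimal sketch of a single Euler step along a flow-matching velocity field, written in plain Python on toy latent vectors. It is not the paper's implementation: the names `euler_step` and `make_toy_velocity` are hypothetical, the velocity field is a hand-written stand-in for the adapted DiT, and the two-stream design mentioned above is not modeled.

```python
def euler_step(latent, velocity_fn, t=0.0, dt=1.0):
    """Advance a latent by one Euler step of dx/dt = v(x, t).

    With dt=1.0 this jumps from the start of the path to its end in a
    single network evaluation, i.e. no multi-step sampling loop.
    """
    v = velocity_fn(latent, t)
    return [z + dt * vi for z, vi in zip(latent, v)]


def make_toy_velocity(target):
    """Hypothetical stand-in for the learned velocity predictor.

    For a straight-line path from the input latent to the target latent,
    the velocity at t=0 is simply (target - input). A real model would
    predict this from the reflection-contaminated latent alone.
    """
    def velocity(latent, t):
        return [g - z for g, z in zip(target, latent)]
    return velocity


# Toy latents (assumed values, for illustration only).
contaminated = [0.5, -1.0, 2.0]   # latent of the image with reflections
clean = [0.25, 0.0, 1.5]          # latent of the transmission layer

restored = euler_step(contaminated, make_toy_velocity(clean))
```

When the path between the two latents is straight and the velocity is predicted exactly, one step lands on the target latent, which is the intuition behind skipping iterative sampling.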
Abstract
We introduce a diffusion-transformer (DiT) framework for single-image reflection removal that leverages the generalization strengths of foundation diffusion models in the restoration setting. Rather than relying on task-specific architectures, we repurpose a pre-trained DiT-based foundation model by conditioning it on reflection-contaminated inputs and guiding it toward clean transmission layers. We systematically analyze existing reflection removal data sources for diversity, scalability, and photorealism. To address the shortage of suitable data, we construct a physically based rendering (PBR) pipeline in Blender, built around the Principled BSDF, to synthesize realistic glass materials and reflection effects. Efficient LoRA-based adaptation of the foundation model, combined with the proposed synthetic data, achieves state-of-the-art performance on in-domain and zero-shot benchmarks. These results demonstrate that pre-trained diffusion transformers, when paired with physically grounded data synthesis and efficient adaptation, offer a scalable and high-fidelity solution for reflection removal. Project page: https://hf.co/spaces/huawei-bayerlab/windowseat-reflection-removal-web
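As an illustration of why LoRA-based adaptation is lightweight, here is a minimal sketch of a linear layer with a low-rank adapter in plain Python. It shows the generic LoRA idea (a frozen weight W plus a trainable update scaled by alpha / r), not the authors' code: the class name `LoRALinear`, the initialization constants, and the toy dimensions are assumptions for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class LoRALinear:
    """Frozen weight W plus a low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen pre-trained weight, (out, in)
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapter is a
        # no-op before training and the pre-trained behavior is preserved.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        """Return W + scale * (B @ A) without modifying W."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to an input vector x of length in_dim."""
        return [sum(w * xi for w, xi in zip(row, x))
                for row in self.effective_weight()]

    def num_trainable(self):
        """Only A and B are trained: rank * (in_dim + out_dim) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rank=1)
output = layer.forward([1.0, 1.0, 1.0])
```

Only `A` and `B` would receive gradients during adaptation, so the trainable parameter count grows with `rank * (in_dim + out_dim)` rather than `in_dim * out_dim`. That gap is small in this toy example but large for transformer-sized layers.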
