Rectifying Latent Space for Generative Single-Image Reflection Removal
Mingjia Li, Jin Hu, Hainuo Wang, Qiming Hu, Jiarui Wang, Xiaojie Guo
TL;DR
This work tackles the ill-posed problem of single-image reflection removal (SIRR), starting from the observation that the latent spaces of pretrained encoders do not respect the linear superposition of background and reflection layers. It introduces GenSIRR, a diffusion-based pipeline with three components: a reflection-equivariant VAE that restructures the latent geometry, a Learnable Task Embedding for precise guidance, and a depth-guided early-branching sampling strategy that selects high-quality restorations. With a two-stage training regime, GenSIRR achieves state-of-the-art results on extensive benchmarks and generalizes well to challenging real-world images, at the cost of higher inference latency. The approach offers a practical path toward reliable, high-fidelity SIRR in the wild and points to future work on acceleration and broader layer-separation tasks.
Abstract
Single-image reflection removal is a highly ill-posed problem: existing methods struggle to reason about the composition of corrupted regions, and therefore fail at recovery and generalization in the wild. This work repurposes an editing-oriented latent diffusion model to effectively perceive and process highly ambiguous, layered image inputs, yielding high-quality outputs. We argue that the difficulty of this conversion stems from a critical yet overlooked issue: the latent space of semantic encoders lacks the inherent structure to interpret a composite image as a linear superposition of its constituent layers. Our approach is built on three synergistic components: a reflection-equivariant VAE that aligns the latent space with the linear physics of reflection formation, a learnable task-specific text embedding that provides precise guidance while bypassing ambiguous language, and a depth-guided early-branching sampling strategy that harnesses generative stochasticity to obtain promising results. Extensive experiments show that our model achieves new state-of-the-art performance on multiple benchmarks and generalizes well to challenging real-world cases.
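
To make the latent-space argument concrete, the following is a minimal sketch of how one might measure whether an encoder respects linear superposition. It is not the authors' implementation. It assumes the simplest additive formation model, composite = background + reflection, and uses two hypothetical toy encoders (a random linear map and the same map followed by tanh) in place of a real VAE; the names superposition_gap, linear_encoder, and nonlinear_encoder are illustrative only.

```python
import numpy as np

def superposition_gap(encode, background, reflection):
    """Relative error between encode(B + R) and encode(B) + encode(R).

    A value near zero means the encoder treats the composite as the sum
    of its layers in latent space (assumed additive model I = B + R).
    """
    mixed = encode(background + reflection)
    summed = encode(background) + encode(reflection)
    return float(np.linalg.norm(mixed - summed) / (np.linalg.norm(summed) + 1e-12))

rng = np.random.default_rng(0)
weights = rng.standard_normal((16, 64))  # hypothetical toy encoder weights

def linear_encoder(x):
    # Purely linear map: superposition holds up to floating-point error.
    return weights @ x

def nonlinear_encoder(x):
    # Same map followed by a nonlinearity: superposition generally breaks.
    return np.tanh(weights @ x)

background = rng.random(64)        # stand-in for a flattened background layer
reflection = 0.3 * rng.random(64)  # stand-in for a weaker reflection layer

linear_gap = superposition_gap(linear_encoder, background, reflection)
nonlinear_gap = superposition_gap(nonlinear_encoder, background, reflection)
print(f"linear gap: {linear_gap:.2e}, nonlinear gap: {nonlinear_gap:.2e}")
```

In this toy setting the linear encoder yields a gap at the level of numerical noise, while the nonlinear one does not. A reflection-equivariant encoder, as described in the abstract, would presumably aim to keep a gap of this kind small for real image layers, although the exact objective used by GenSIRR is not stated here.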
