FIRM: Flexible Interactive Reflection reMoval
Xiao Chen, Xudong Jiang, Yunkang Tao, Zhen Lei, Qing Li, Chenyang Lei, Zhaoxiang Zhang
TL;DR
This work tackles the ill-posed problem of single-image reflection removal by introducing FIRM, a flexible interactive framework that accepts diverse forms of user guidance and converts them into contrastive masks via a dedicated user guidance conversion (UGC) module. A Segmentation Any Reflection Model (SARM) enables visual and text prompts to produce accurate reflection/transmission masks, which are then fused with the blended image through a Contrastive Guidance Interaction Block (CGIB) built on cross-attention to achieve precise layer separation with a lightweight CNN backbone. The method delivers state-of-the-art results on Real20 and SIR2 while drastically reducing annotation time (from tens of seconds to a few seconds per image) and supports a new interactive reflection removal dataset with four guidance modalities. Overall, FIRM enhances practicality and performance for real-world reflection removal by unifying guidance forms and enabling efficient, accurate segmentation-guided decomposition.
Abstract
Removing reflections from a single image is challenging due to the absence of general reflection priors. Although existing methods incorporate extensive user guidance for satisfactory performance, they often lack the flexibility to adapt to user guidance in different modalities, and dense user interactions further limit their practicality. To alleviate these problems, this paper presents FIRM, a novel framework for Flexible Interactive image Reflection reMoval with various forms of guidance, where users can provide sparse visual guidance (e.g., points, boxes, or strokes) or text descriptions for better reflection removal. First, we design a novel user guidance conversion module (UGC) to transform different forms of guidance into unified contrastive masks. The contrastive masks provide explicit cues for identifying reflection and transmission layers in blended images. Second, we devise a contrastive mask-guided reflection removal network that comprises a newly proposed contrastive guidance interaction block (CGIB). This block leverages a unique cross-attention mechanism that merges contrastive masks with image features, allowing for precise layer separation. The proposed framework requires only 10% of the guidance time needed by previous interactive methods, a step change in practicality. Extensive results on public real-world reflection removal datasets validate that our method achieves state-of-the-art reflection removal performance. Code is available at https://github.com/ShawnChenn/FlexibleReflectionRemoval.
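To make the two ideas above concrete, the following is a minimal NumPy sketch of (a) turning sparse box guidance into a two-channel contrastive mask and (b) fusing that mask with image features through cross-attention. It is an illustration under stated assumptions, not the FIRM implementation: the function names, the box format (x0, y0, x1, y1), the projection matrices, and the choice of queries from image features with keys and values from the mask are all hypothetical, since the abstract does not specify the internals of UGC or CGIB.

```python
import numpy as np


def boxes_to_contrastive_masks(height, width, reflection_box, transmission_box):
    """Rasterize two user boxes (x0, y0, x1, y1) into a (2, H, W) mask.

    Channel 0 marks the reflection region, channel 1 the transmission region.
    (Hypothetical stand-in for a guidance-to-mask conversion step.)
    """
    masks = np.zeros((2, height, width), dtype=np.float32)
    for channel, (x0, y0, x1, y1) in enumerate((reflection_box, transmission_box)):
        masks[channel, y0:y1, x0:x1] = 1.0
    return masks


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mask_guided_cross_attention(features, masks, w_q, w_k, w_v):
    """Fuse mask cues into image features with single-head cross-attention.

    features: (C, H, W) image features; masks: (2, H, W) contrastive masks.
    w_q: (C, D), w_k: (2, D), w_v: (2, C) are assumed projection matrices.
    Returns fused features of shape (C, H, W).
    """
    c, h, w = features.shape
    tokens = features.reshape(c, h * w).T        # (N, C), one token per pixel
    mask_tokens = masks.reshape(2, h * w).T      # (N, 2)
    q = tokens @ w_q                             # queries from image features
    k = mask_tokens @ w_k                        # keys from the masks
    v = mask_tokens @ w_v                        # values from the masks
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) attention weights
    fused = tokens + attn @ v                    # residual connection
    return fused.T.reshape(c, h, w)


# Example usage with random features and weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
masks = boxes_to_contrastive_masks(16, 16, (0, 0, 8, 8), (8, 8, 16, 16))
fused = mask_guided_cross_attention(
    features,
    masks,
    rng.standard_normal((8, 4)),
    rng.standard_normal((2, 4)),
    rng.standard_normal((2, 8)),
)
print(fused.shape)  # (8, 16, 16)
```

The full N-by-N attention matrix is only practical for small feature maps; a real network would likely use a more efficient variant, which this sketch does not attempt to reproduce.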
