GFRRN: Explore the Gaps in Single Image Reflection Removal

Yu Chen, Zewei He, Xingyu Liu, Zixuan Chen, Zheming Lu

TL;DR

This work first adopts a parameter-efficient fine-tuning (PEFT) strategy, integrating several learnable Mona layers into the pre-trained model to align the training directions, and then designs a label generator to unify the reflection labels for both synthetic and real-world data.

Abstract

Prior dual-stream methods with a feature interaction mechanism have achieved remarkable performance in single image reflection removal (SIRR). However, they often struggle with (1) a semantic understanding gap between the features of pre-trained models and those of reflection removal models, and (2) reflection label inconsistencies between synthetic and real-world training data. In this work, we first adopt a parameter-efficient fine-tuning (PEFT) strategy, integrating several learnable Mona layers into the pre-trained model to align the training directions. Then, a label generator is designed to unify the reflection labels for both synthetic and real-world data. In addition, a Gaussian-based Adaptive Frequency Learning Block (G-AFLB) is proposed to adaptively learn and fuse the frequency priors, and a Dynamic Agent Attention (DAA) is employed as an alternative to window-based attention, dynamically modeling the significance levels across windows (inter-) and within an individual window (intra-). These components constitute our proposed Gap-Free Reflection Removal Network (GFRRN). Extensive experiments demonstrate the effectiveness of GFRRN, which achieves superior performance over state-of-the-art SIRR methods.
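
To make the PEFT idea in the abstract concrete, the following is a minimal NumPy sketch of the general pattern: a pre-trained layer stays frozen while a small trainable module is attached to its output. It assumes a generic residual bottleneck adapter; it is not the Mona layer itself, whose internals are not described in this section. The names `FrozenLinear` and `BottleneckAdapter` and the sizes `dim` and `rank` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class FrozenLinear:
    """Stand-in for one pre-trained layer; its weights are never updated."""

    def __init__(self, dim):
        self.weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return x @ self.weight


class BottleneckAdapter:
    """Generic residual adapter (an assumption, not the Mona design)."""

    def __init__(self, dim, rank):
        self.down = rng.standard_normal((dim, rank)) * 0.01
        # Zero-initialised up-projection: the adapter starts as an identity map,
        # so the frozen features are unchanged before any training step.
        self.up = np.zeros((rank, dim))

    def __call__(self, x):
        hidden = np.maximum(x @ self.down, 0.0)  # ReLU in the bottleneck
        return x + hidden @ self.up              # residual connection

    def num_params(self):
        return self.down.size + self.up.size


dim, rank = 64, 8
backbone = FrozenLinear(dim)
adapter = BottleneckAdapter(dim, rank)

x = rng.standard_normal((4, dim))
features = adapter(backbone(x))

frozen = backbone.weight.size      # parameters kept fixed
trainable = adapter.num_params()   # parameters that would be fine-tuned
print(f"frozen={frozen}, trainable={trainable}")
```

In this toy setting only the adapter's two small matrices would receive gradient updates, which is what keeps the fine-tuning parameter-efficient.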

Paper Structure (20 sections, 19 equations, 14 figures, 6 tables, 3 algorithms)


Figures (14)

  • Figure 1: Setting (a): a single-stream method such as RRW [Zhu2024CVPR-RRW]; Setting (b): a dual-stream method such as IBCLN [Li2020CVPR-IBCLN]; Setting (c): a dual-stream method with a feature interaction mechanism such as DSIT [Hu2024NeurIPS-DSIT]; Setting (d): ours; (e) experimental results with the different settings.
  • Figure 2: The overall architecture of our GFRRN. It consists of two parallel encoders (Encoder 1 is a pre-trained Swin-Transformer with some learnable Mona layers; Encoder 2 is a dual-stream CNN borrowed from DSIT [Hu2024NeurIPS-DSIT]) and a single decoder.
  • Figure 3: (a) A semantic gap exists between the pre-trained model and the reflection removal model. (b) A cognitive-inspired Mona-tuning technique is proposed to bridge the semantic gap.
  • Figure 4: (a) A degraded image $\mathbf{I}$ from the Real dataset. (b) The label of the corresponding transmission layer $\mathbf{T}$. (c) $\mathbf{I} - \mathbf{T}$, used to supervise the reflection layer. (d) Our unified label. $(\mathbf{I} - \mathbf{T})_{\text{low}}$ denotes the low-frequency part.
  • Figure 5: Details of our proposed dynamic agent attention (DAA).
  • ...and 9 more figures
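
The quantity $(\mathbf{I} - \mathbf{T})_{\text{low}}$ in the Figure 4 caption can be illustrated with a small NumPy sketch. It assumes that "low-frequency part" can be approximated by a fixed Gaussian low-pass filter applied to the residual $\mathbf{I} - \mathbf{T}$. This section does not describe how the paper's label generator actually produces the unified label, so the filter, the `sigma` value, and the function names below are stand-in assumptions rather than the authors' method.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalised 1-D Gaussian kernel with 2 * radius + 1 taps."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def low_pass(image, sigma=2.0):
    """Separable Gaussian blur of an (H, W) array with edge-replicated borders."""
    radius = int(3 * sigma)
    kernel = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows, then along columns; "valid" restores the original size.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def low_frequency_residual(mixed, transmission, sigma=2.0):
    """Approximate (I - T)_low: low-pass the residual between I and T."""
    return low_pass(mixed - transmission, sigma)


# Toy grayscale example: a smooth reflection plus high-frequency noise.
rng = np.random.default_rng(0)
transmission = rng.uniform(0.0, 0.5, size=(32, 32))
reflection = np.full((32, 32), 0.3)
noise = 0.05 * rng.standard_normal((32, 32))
mixed = transmission + reflection + noise

label = low_frequency_residual(mixed, transmission)
print(label.shape, float(label.std()), float((mixed - transmission).std()))
```

In this toy case the raw residual contains both the smooth reflection and the noise, while the low-passed version keeps mainly the smooth component, which is the intuition behind supervising with a low-frequency label.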