Adaptive Language-Aware Image Reflection Removal Network

Siyan Fang, Yuntao Wang, Jinpu Zhang, Ziwen Li, Yuehuan Wang

TL;DR

The Adaptive Language-Aware Network (ALANet) removes reflections even when the language input is inaccurate, and experiments show that it surpasses state-of-the-art methods for image reflection removal.

Abstract

Existing image reflection removal methods struggle to handle complex reflections. Accurate language descriptions can help the model understand the image content to remove complex reflections. However, due to blurred and distorted interferences in reflected images, machine-generated language descriptions of the image content are often inaccurate, which harms the performance of language-guided reflection removal. To address this, we propose the Adaptive Language-Aware Network (ALANet) to remove reflections even with inaccurate language inputs. Specifically, ALANet integrates both filtering and optimization strategies. The filtering strategy reduces the negative effects of language while preserving its benefits, whereas the optimization strategy enhances the alignment between language and visual features. ALANet also utilizes language cues to decouple specific layer content from feature maps, improving its ability to handle complex reflections. To evaluate the model's performance under complex reflections and varying levels of language accuracy, we introduce the Complex Reflection and Language Accuracy Variance (CRLAV) dataset. Experimental results demonstrate that ALANet surpasses state-of-the-art methods for image reflection removal. The code and dataset are available at https://github.com/fashyon/ALANet.

Paper Structure (28 sections, 1 equation, 23 figures, 10 tables)

This paper contains 28 sections, 1 equation, 23 figures, 10 tables.

Figures (23)

  • Figure 1: The impact of language-guided reflection removal with different types of language inputs. Inaccurate language inputs result in worse outcomes than having no language. The specific language inputs for each subfigure are provided in the supplementary material.
  • Figure 2: Overview of the proposed ALANet, which comprises various modules that use language adaptively to remove reflections. T and R represent the transmission and reflection layers, respectively.
  • Figure 3: Structure of the LASB. As the core of LASB, LCAM utilizes language features from different layers to facilitate the separation of those layers.
  • Figure 4: Structure of the LCAM. The dashed lines indicate the scenario without language input.
  • Figure 5: Structure of the ALCM. The ALCM enhances the consistency between language features and visual content.
  • ...and 18 more figures