ObjectClear: Complete Object Removal via Object-Effect Attention

Jixin Zhao, Shangchen Zhou, Zhouxia Wang, Peiqing Yang, Chen Change Loy

TL;DR

ObjectClear addresses the challenge of removing an object together with its visual effects by introducing the OBER dataset and a dedicated Object-Effect Attention mechanism. By supervising cross-attention with precise object-effect masks and employing an Attention-Guided Fusion strategy, the approach decouples foreground removal from background reconstruction and preserves background details. The hybrid OBER dataset (camera-captured plus synthetic data) enables robust training, including multi-object occlusions and reflections, while the method achieves state-of-the-art results on multiple benchmarks and demonstrates practical extensions to object insertion and movement. This work advances controllable image editing by explicitly modeling object effects and guiding precise region-aware fusion, offering a scalable path for real-world applications.

Abstract

Object removal requires eliminating not only the target object but also its effects, such as shadows and reflections. However, diffusion-based inpainting methods often produce artifacts, hallucinate content, alter the background, and struggle to remove object effects accurately. To address this challenge, we introduce a new dataset for OBject-Effect Removal, named OBER, which provides paired images with and without object effects, along with precise masks for both objects and their associated visual artifacts. The dataset comprises high-quality captured and simulated data, covering diverse object categories and complex multi-object scenes. Building on OBER, we propose a novel framework, ObjectClear, which incorporates an object-effect attention mechanism to guide the model toward the foreground removal regions by learning attention masks, effectively decoupling foreground removal from background reconstruction. Furthermore, the predicted attention map enables an attention-guided fusion strategy during inference, greatly preserving background details. Extensive experiments demonstrate that ObjectClear outperforms existing methods, achieving improved object-effect removal quality and background fidelity, especially in complex scenarios.
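The attention-guided fusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the fusion is computed, so everything below is an assumption made for illustration: the function name `attention_guided_fusion`, the pixel-space blending, the `threshold` and `soft` parameters, and the array layouts are hypothetical and are not taken from the paper. The sketch only shows the general principle of keeping original pixels wherever the predicted attention map is low, so that the background is left untouched.

```python
import numpy as np


def attention_guided_fusion(original, generated, attention, threshold=0.5, soft=False):
    """Blend a generated image into the original, guided by an attention map.

    Illustrative sketch only (not the paper's implementation).
    original, generated: float arrays of shape (H, W, C) with values in [0, 1].
    attention: float array of shape (H, W) with values in [0, 1]; high values
        mark the object and its effects (e.g. shadows, reflections).
    threshold: hypothetical cut-off used when soft=False.
    soft: if True, use the attention values directly as blending weights.
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated must have the same shape")
    if attention.shape != original.shape[:2]:
        raise ValueError("attention must match the spatial size of the images")

    if soft:
        weight = np.clip(attention, 0.0, 1.0)
    else:
        # Hard mask: 1 inside the predicted removal region, 0 elsewhere.
        weight = (attention >= threshold).astype(original.dtype)

    # Add a channel axis so the (H, W) weight broadcasts over (H, W, C).
    weight = weight[..., None]

    # Generated content inside the region, original pixels outside it.
    return weight * generated + (1.0 - weight) * original


# Tiny usage example: a 2x2 image where two pixels fall inside the region.
original = np.zeros((2, 2, 3))
generated = np.ones((2, 2, 3))
attention = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
fused = attention_guided_fusion(original, generated, attention)
```

With a hard mask, pixels outside the predicted region are copied from the input unchanged, which is one simple way to obtain the background preservation the abstract describes. A soft mask trades some of that exactness for smoother transitions at the region boundary.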

Paper Structure

This paper contains 20 sections, 2 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Object Removal Comparison. Given an object mask, prior methods often leave residual artifacts, hallucinate undesirable content, change the background, and typically fail to remove associated effects such as shadows and reflections. In contrast, our ObjectClear precisely eliminates both the object and its associated effects, achieving seamless object removal results even in challenging cases.
  • Figure 2: Dataset Construction Pipeline of OBER. The dataset combines both camera-captured and simulated data, featuring diverse foreground objects and background scenes. It provides rich annotations, including object masks, object-effect masks, transparent RGBA object layers, and complex multi-object scenarios for training and evaluation.
  • Figure 3: The Framework of ObjectClear. Given an input image and a target object mask, ObjectClear employs an Object-Effect Attention mechanism to guide the model toward foreground removal regions by learning attention masks. The predicted mask further enables an Attention-Guided Fusion strategy during inference, which substantially preserves background details.
  • Figure 4: Object Removal on OBER-Test and RORD-Val. Our ObjectClear effectively removes both the masked objects and their associated effects.
  • Figure 5: Object Removal on OBER-Wild. Our ObjectClear not only effectively removes single objects with their shadows and reflections (top five samples), but also accurately removes the target object when multiple mutually occluding objects exist (bottom two samples).
  • ...and 10 more figures
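
The caption of Figure 3 describes an Object-Effect Attention mechanism that learns attention masks, and the TL;DR states that cross-attention is supervised with object-effect masks. The loss used for this supervision is not given in the text above, so the following is a hedged sketch under stated assumptions: a mean-squared-error loss between an attention map and an average-pooled object-effect mask. The function names `downsample_mask` and `attention_mask_loss`, the choice of MSE, and the integer downsampling factor are all hypothetical.

```python
import numpy as np


def downsample_mask(mask, factor):
    """Average-pool a (H, W) mask by an integer factor (assumed resizing scheme)."""
    h, w = mask.shape
    if factor < 1 or h % factor or w % factor:
        raise ValueError("mask size must be a positive multiple of the factor")
    # Split each axis into (blocks, factor) and average within each block.
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def attention_mask_loss(attention, effect_mask):
    """MSE between an attention map and the object-effect mask at attention resolution.

    Illustrative only: the actual loss is not specified in the summary above.
    attention: (h, w) float array in [0, 1], e.g. an averaged cross-attention map.
    effect_mask: (H, W) binary array covering the object and its effects.
    """
    factor = effect_mask.shape[0] // attention.shape[0]
    target = downsample_mask(effect_mask.astype(np.float64), factor)
    if target.shape != attention.shape:
        raise ValueError("mask and attention aspect ratios do not match")
    return float(np.mean((attention - target) ** 2))


# Usage: a 4x4 mask whose top-left 2x2 block is the object-effect region.
effect_mask = np.zeros((4, 4))
effect_mask[:2, :2] = 1.0
perfect_attention = np.array([[1.0, 0.0],
                              [0.0, 0.0]])
empty_attention = np.zeros((2, 2))
loss_perfect = attention_mask_loss(perfect_attention, effect_mask)
loss_empty = attention_mask_loss(empty_attention, effect_mask)
```

Under this assumed formulation, the loss is zero when the attention map matches the downsampled mask and grows as attention drifts away from the object-effect region, which is the behavior a mask-supervised attention term would need.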