ObjectClear: Complete Object Removal via Object-Effect Attention

Jixin Zhao; Shangchen Zhou; Zhouxia Wang; Peiqing Yang; Chen Change Loy

TL;DR

ObjectClear removes an object together with its visual effects, such as shadows and reflections, by introducing the OBER dataset and a dedicated Object-Effect Attention mechanism. Cross-attention is supervised with precise object-effect masks, and an Attention-Guided Fusion strategy is applied at inference; together these decouple foreground removal from background reconstruction and preserve background details. The hybrid OBER dataset (camera-captured plus synthetic data) supports training on difficult cases, including multi-object occlusions and reflections. The method reports state-of-the-art results on multiple benchmarks and extends to object insertion and movement.

Abstract

Object removal requires eliminating not only the target object but also its effects, such as shadows and reflections. However, diffusion-based inpainting methods often produce artifacts, hallucinate content, alter the background, and struggle to remove object effects accurately. To address this challenge, we introduce a new dataset for OBject-Effect Removal, named OBER, which provides paired images with and without object effects, along with precise masks for both objects and their associated visual effects. The dataset comprises high-quality captured and simulated data, covering diverse object categories and complex multi-object scenes. Building on OBER, we propose a novel framework, ObjectClear, which incorporates an object-effect attention mechanism that learns attention masks to guide the model toward the foreground removal regions, effectively decoupling foreground removal from background reconstruction. Furthermore, the predicted attention map enables an attention-guided fusion strategy during inference, which helps preserve background details. Extensive experiments demonstrate that ObjectClear outperforms existing methods, achieving improved object-effect removal quality and background fidelity, especially in complex scenarios.
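
The attention-guided fusion idea can be illustrated with a minimal sketch. The abstract does not specify how the fusion is computed, so the code below assumes a simple per-pixel blend in image space: the predicted attention map (values in [0, 1], high on the object and its effects) weights the generated image, and the original image is kept everywhere else. The function name `attention_guided_fusion`, its parameters, and the optional `threshold` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def attention_guided_fusion(original, generated, attention, threshold=None):
    """Blend a generated image into the original using an attention map.

    Assumption (not from the paper): fusion is a per-pixel convex
    combination, with the attention map acting as the blend weight.

    original, generated: arrays of shape (H, W, C)
    attention: array of shape (H, W), expected to lie in [0, 1]
    threshold: if given, binarize the attention map before blending
    """
    original = np.asarray(original, dtype=np.float32)
    generated = np.asarray(generated, dtype=np.float32)
    weight = np.clip(np.asarray(attention, dtype=np.float32), 0.0, 1.0)

    if threshold is not None:
        # Hard mask: take generated pixels only where attention is high.
        weight = (weight >= threshold).astype(np.float32)

    # Add a trailing axis so the (H, W) weights broadcast over channels.
    weight = weight[..., None]

    # Generated content where attention is high, original background elsewhere.
    return weight * generated + (1.0 - weight) * original
```

Under this assumption, pixels with zero attention are copied from the input unchanged, which is one way a predicted object-effect map could keep the background intact while only the removal region is rewritten.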
