OmniEraser: Remove Objects and Their Effects in Images with Paired Video-Frame Data
Runpu Wei, Zijin Yin, Shuo Zhang, Lanxiang Zhou, Xueyi Wang, Chao Ban, Tianwei Cao, Hao Sun, Zhongjiang He, Kongming Liang, Zhanyu Ma
TL;DR
OmniEraser tackles removing objects along with their visual effects by leveraging a large-scale, real-world paired dataset (Video4Removal) and a novel Object-Background Guidance strategy that conditions diffusion-based restoration on both the target object and its background. The method integrates LoRA adapters with a generative prior to adapt to the removal task, enabling robust suppression of shadows and reflections while preserving surrounding content. Key contributions include the Video4Removal dataset (134,281 triplets at 1920×1080), the Object-Background Guidance framework, and RemovalBench for robust evaluation; OmniEraser achieves state-of-the-art results on in-the-wild scenes and generalizes to anime-style images. Because the training pairs are built from video frames, the approach reduces data-collection labor while improving the realism of removal results for image editing and post-processing.
Abstract
Inpainting algorithms have achieved remarkable progress in removing objects from images, yet they still face two challenges: 1) they struggle to handle the object's visual effects, such as shadows and reflections; 2) they tend to generate shape-like artifacts and unintended content. In this paper, we propose Video4Removal, a large-scale dataset comprising over 100,000 high-quality samples with realistic object shadows and reflections. By constructing object-background pairs from video frames with off-the-shelf vision models, the labor cost of data acquisition can be significantly reduced. To avoid generating shape-like artifacts and unintended content, we propose Object-Background Guidance, a paradigm that takes both the foreground object and the background image as input, guiding the diffusion process to harness richer contextual information. Based on these two designs, we present OmniEraser, a novel method that seamlessly removes objects and their visual effects using only object masks as input. Extensive experiments show that OmniEraser significantly outperforms previous methods, particularly in complex in-the-wild scenes. It also exhibits strong generalization to anime-style images. Datasets, models, and code will be published.
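As a rough illustration of what "taking both the foreground object and the background" could mean at the input level, the sketch below splits an image into an object-only image and a background-only image using a binary object mask. This is an assumption made for illustration, not the authors' implementation: the abstract does not specify how the two guidance inputs are formed, and the function name `split_object_background`, the `fill` parameter, and the nested-list image representation are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel
Image = List[List[Pixel]]      # rows of pixels
Mask = List[List[int]]         # 1 = object pixel, 0 = background pixel


def split_object_background(
    image: Image, mask: Mask, fill: Pixel = (0, 0, 0)
) -> Tuple[Image, Image]:
    """Return (object_image, background_image) for a binary object mask.

    Hypothetical helper: pixels outside the mask are replaced by `fill`
    in the object image, and pixels inside the mask are replaced by
    `fill` in the background image.
    """
    same_height = len(image) == len(mask)
    same_width = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_height and same_width):
        raise ValueError("image and mask must have the same height and width")

    object_image = [
        [px if m else fill for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    background_image = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return object_image, background_image


# Tiny 2x2 example: only the top-right pixel belongs to the object.
image = [[(10, 10, 10), (20, 20, 20)],
         [(30, 30, 30), (40, 40, 40)]]
mask = [[0, 1],
        [0, 0]]
object_image, background_image = split_object_background(image, mask)
```

In a setup like this, the two images would be passed to the model as separate conditioning inputs alongside the mask; how OmniEraser actually encodes and fuses them is not described in this section.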
