MagicEraser: Erasing Any Objects via Semantics-Aware Control

Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, Songcen Xu

TL;DR

MagicEraser is a diffusion-model framework for object erasure that avoids reliance on manually written, high-quality prompts by combining content initialization with controllable generation. It introduces prompt tuning via Textual Inversion and LoRA, plus a training-free semantics-aware attention refocus that uses panoptic segmentation to steer background-consistent completion. An object-level removal dataset (OLRD) and a data-construction strategy enable fine-tuning of Stable Diffusion Inpainting to erase objects while harmonizing backgrounds. On OpenImages, COCO, and RealHM, and in comparisons with commercial tools, MagicEraser reports state-of-the-art quantitative performance and stronger visual coherence, which makes it practical for photo editing and content-aware manipulation.

Abstract

The traditional image inpainting task aims to restore corrupted regions by referencing the surrounding background and foreground. The object erasure task, which is in increasing demand, instead aims to erase objects and generate a harmonious background. Previous GAN-based inpainting methods struggle with intricate texture generation. Emerging diffusion-model-based algorithms, such as Stable Diffusion Inpainting, can generate novel content, but they often produce incongruent results at the locations of the erased objects and require high-quality text prompts. To address these challenges, we introduce MagicEraser, a diffusion-model-based framework tailored to the object erasure task. It consists of two phases: content initialization and controllable generation. In the latter phase, we develop two plug-and-play modules, prompt tuning and semantics-aware attention refocus. Additionally, we propose a data construction strategy that generates training data specifically suited to this task. MagicEraser achieves fine-grained and effective control of content generation while mitigating undesired artifacts. Experimental results show that our approach is a valuable advance on the object erasure task.

Paper Structure

This paper contains 20 sections, 10 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Comparison with five state-of-the-art inpainting algorithms: MAT [li2022mat], Co-Mod [zhao2021large], LaMa [suvorov2022resolution], CoordFill [liu2023coordfill], and Stable Diffusion (SD) Inpainting [rombach2022high]. MagicEraser can effectively erase masked objects and achieve the best texture consistency and content fidelity.
  • Figure 2: MagicEraser, built upon Stable Diffusion Inpainting, comprises two main stages: content initialization and controllable generation. Additionally, we construct an object-level removal dataset (OLRD) specifically designed for the object erasure task.
  • Figure 3: Semantics-aware attention refocus. We combine the panoptic segmentation result of the input image with the input mask to generate $Mask_{pos}$ and $Mask_{neg}$. Combining the two also yields a label ($l$) for each region: white for mask (m) regions, red for positive (p) regions, and black for negative (n) regions.
  • Figure 4: Training data comparison between traditional inpainting and object erasure. (a) Traditional inpainting methods use a random mask $m$ and the masked image $\tilde{I}$ to recover the original image $I$. (b) Our model uses the shifted mask $\tilde{m}$ and the blended image $\tilde{I}$ to recover $I$.
  • Figure 5: Visual comparison with five SOTA algorithms.
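
The captions of Figures 3 and 4 describe two data manipulations that a small sketch may make easier to picture. The code below is an illustration under stated assumptions, not the authors' implementation. The function names `region_labels` and `make_erasure_sample` are hypothetical. The rule used to decide which panoptic segments are positive (segments with an unmasked pixel 4-adjacent to the mask) is an assumption, since the captions do not give the exact criterion. The second function hard-pastes the object at an offset, whereas the caption speaks of a blended image, so treat it as a simplified stand-in. Images and masks are plain nested lists to keep the example dependency-free.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]

MASK, POS, NEG = "m", "p", "n"  # region labels l, as named in Figure 3


def region_labels(segments: Grid, mask: Grid) -> List[List[str]]:
    """Label every pixel as mask (m), positive (p), or negative (n).

    Assumption (not from the source): a panoptic segment is positive when
    at least one of its unmasked pixels is 4-adjacent to a masked pixel.
    """
    h, w = len(mask), len(mask[0])
    touching: Set[int] = set()
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    touching.add(segments[y][x])
                    break
    labels: List[List[str]] = []
    for y in range(h):
        row = []
        for x in range(w):
            if mask[y][x]:
                row.append(MASK)
            elif segments[y][x] in touching:
                row.append(POS)
            else:
                row.append(NEG)
        labels.append(row)
    return labels


def make_erasure_sample(image: Grid, mask: Grid, dy: int, dx: int) -> Tuple[Grid, Grid]:
    """Build a (shifted mask, composite image) pair in the spirit of Figure 4(b).

    Assumption: the object under `mask` is copied to an offset (dy, dx) and
    pasted without feathering; pixels shifted out of frame are dropped.
    The training target would be the untouched `image`.
    """
    h, w = len(mask), len(mask[0])
    shifted = [[0] * w for _ in range(h)]
    composite = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx
            if mask[y][x] and 0 <= ty < h and 0 <= tx < w:
                shifted[ty][tx] = 1
                composite[ty][tx] = image[y][x]  # read from the original image
    return shifted, composite
```

Under these assumptions, the positive label marks background segments bordering the hole, which are the regions the completed content should agree with, while the shifted-mask pair gives the model an input containing an unwanted object together with a ground-truth background to recover.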