Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways

Yi Liu, Hao Zhou, Wenxiang Shang, Ran Lin, Benlei Cui

TL;DR

EraDiff tackles the persistent challenge of object removal with diffusion-based erase inpainting by calibrating diffusion pathways to directly erase objects and restore backgrounds. It introduces Chain-Rectifying Optimization (CRO) to create dedicated erase diffusion chains and Self-Rectifying Attention (SRA) to suppress artifact-driven signals during sampling. Through dynamic latent-state synthesis and a novel optimization objective, EraDiff achieves improved object elimination while preserving image coherence, with state-of-the-art results on OpenImages V5 and strong generalization to real-world scenarios. The work offers practical gains for automated erasing tasks in media editing and content-aware image processing.

Abstract

Erase inpainting, or object removal, aims to precisely remove target objects within masked regions while preserving the overall consistency of the surrounding content. Although diffusion-based methods have made significant strides in image inpainting, they still tend to produce unexpected objects or artifacts in the erased region. We assert that the inexact diffusion pathways established by existing standard optimization paradigms constrain the efficacy of object removal. To tackle these challenges, we propose a novel Erase Diffusion, termed EraDiff, aimed at unleashing the potential of standard diffusion for object removal. In contrast to standard diffusion, EraDiff adapts both the optimization paradigm and the network to improve the coherence of the erasure results and the completeness of object elimination. We first introduce a Chain-Rectifying Optimization (CRO) paradigm, a diffusion process specifically designed to align with the objectives of erasure. This paradigm establishes new diffusion transition pathways that simulate the gradual elimination of objects during optimization, allowing the model to accurately capture the intent of object removal. Furthermore, to mitigate deviations caused by artifacts along the sampling pathways, we develop a simple yet effective Self-Rectifying Attention (SRA) mechanism. SRA calibrates the sampling pathways by altering self-attention activation, allowing the model to effectively bypass artifacts while further enhancing the coherence of the generated content. With this design, EraDiff achieves state-of-the-art performance on the OpenImages V5 dataset and clearly outperforms existing methods in real-world scenarios.
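
The abstract says only that SRA "alters self-attention activation" so the model can bypass artifacts. It does not give the formula. The sketch below shows one plausible reading, not the authors' implementation: scaled dot-product self-attention in which keys inside the erase region are suppressed, so every query draws only on background tokens. The function name `masked_self_attention`, the boolean `in_erase_region` flags, and the choice to set blocked logits to negative infinity are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def masked_self_attention(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    in_erase_region: Sequence[bool],
) -> List[List[float]]:
    """Scaled dot-product attention that ignores keys flagged as erase-region tokens.

    Assumption: suppressing a token means setting its attention logit to -inf,
    so it receives exactly zero weight after the softmax.
    """
    if all(in_erase_region):
        raise ValueError("at least one background token is required")
    dim = len(queries[0])
    outputs = []
    for q in queries:
        logits = [
            -math.inf if blocked
            else sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k, blocked in zip(keys, in_erase_region)
        ]
        peak = max(logits)  # finite, because at least one key is unblocked
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example: token 1 lies in the erase region and carries an "artifact" value.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
token_values = [[1.0, 0.0], [100.0, 100.0], [0.0, 1.0]]
attended = masked_self_attention(tokens, tokens, token_values, [False, True, False])
```

In this toy run the large value attached to token 1 never reaches any output, because its attention weight is zero for every query. A real model would apply such a rule per attention head on latent feature maps, and might restrict it to queries inside the mask.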

Paper Structure

This paper contains 20 sections, 22 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: The overview of our proposed Erase Diffusion, termed EraDiff. Left: Dynamic image synthesis. Each image is initially transformed using techniques like matting, scaling, and copy-pasting. A mix-up strategy then synthesizes a series of dynamic images {$\bm{\tilde{x}}_{t}^{mix}$} that simulate the gradual fading of the object. Top: Chain-Rectifying Optimization (CRO). The standard sampling pathway is prone to generating artifacts (black dashed lines). In contrast, we establish a new sampling path for erasing (red dashed lines) that better aligns the reverse sampling trajectory with a clear background. Bottom: Self-Rectifying Attention (SRA). The standard self-attention mechanism may inadvertently amplify artifacts, diverging from the expected diffusion pathway. By modifying the attention activation, we guide the model to bypass artifact regions, enhancing its focus on the background and ensuring a more accurate erase sampling path.
  • Figure 2: Qualitative results on the OpenImages V5 dataset, comparing SD2-Inpaint, SD2-Inpaint with prompt guidance, PowerPaint, Inst-Inpaint, LaMa, and our approach.
  • Figure 3: Results from the user study. EraDiff demonstrates enhanced performance, as indicated by its higher mean scores in both elimination and coherence evaluations.
  • Figure 4: Visualization of EraDiff's performance across a diverse array of in-the-wild scenarios: animated imagery, e-commerce content, oil paintings, and glasses-free 3D visuals.
  • Figure 5: Visual examples for the ablation study comparing baseline, baseline with CRO, baseline with SRA, and baseline with both CRO and SRA, displayed left to right.
  • ...and 8 more figures
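
The Figure 1 caption describes a mix-up strategy that synthesizes a series of dynamic images simulating the gradual fading of a pasted object. A minimal sketch of that idea follows, using toy grayscale images stored as nested lists. The linear fade schedule, the restriction of blending to the masked pixels, and the helper names `fade_weight`, `mix_up`, and `dynamic_images` are assumptions for illustration. The paper's actual schedule and blending rule are not given in this excerpt.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image with values in [0, 1]


def fade_weight(step: int, num_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 means object fully visible, 0.0 means gone."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return 1.0 - step / (num_steps - 1)


def mix_up(with_object: Image, background: Image, mask: Image, weight: float) -> Image:
    """Blend the pasted object toward the clean background, inside the mask only."""
    return [
        [
            m * (weight * o + (1.0 - weight) * b) + (1.0 - m) * b
            for o, b, m in zip(row_o, row_b, row_m)
        ]
        for row_o, row_b, row_m in zip(with_object, background, mask)
    ]


def dynamic_images(
    with_object: Image, background: Image, mask: Image, num_steps: int
) -> List[Image]:
    """Return num_steps images in which the object fades from present to absent."""
    return [
        mix_up(with_object, background, mask, fade_weight(step, num_steps))
        for step in range(num_steps)
    ]


# Toy example: a bright "object" pixel pasted at the top-left of a flat background.
clean = [[0.2, 0.2], [0.2, 0.2]]
pasted = [[1.0, 0.2], [0.2, 0.2]]
object_mask = [[1, 0], [0, 0]]
sequence = dynamic_images(pasted, clean, object_mask, num_steps=3)
```

The first image in `sequence` equals the pasted image and the last equals the clean background, with intermediate images in between. Pixels outside the mask are never modified.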