You Only Erase Once: Erasing Anything without Bringing Unexpected Content

Yixing Zhu, Qing Zhang, Wenju Xu, Wei-Shi Zheng

Abstract

We present YOEO, an approach for object erasure. Unlike recent diffusion-based methods, which struggle to erase target objects without generating unexpected content within the masked regions due to a lack of sufficient paired training data and of explicit constraints on content generation, our method produces high-quality object erasure results free of unwanted objects or artifacts while faithfully preserving context coherence with the surrounding content. We achieve this by training an object erasure diffusion model on unpaired data consisting only of large-scale real-world images, under the supervision of a sundries detector and a context coherence loss, both built upon an entity segmentation model. To enable more efficient training and inference, a diffusion distillation strategy is employed to obtain a few-step erasure diffusion model. Extensive experiments show that our method outperforms state-of-the-art object erasure methods. Code will be available at https://zyxunh.github.io/YOEO-ProjectPage/.

Paper Structure

This paper contains 13 sections, 10 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Comparison of object erasure. Unlike the compared methods, which tend to generate unwanted objects or artifacts within the masked regions, our method cleanly erases the target object in a single pass, without introducing any unexpected content, while maintaining overall harmony and context consistency.
  • Figure 2: Overview of the framework. It distills the student denoiser with datasets $\mathcal{D}_1$ and $\mathcal{D}_2$. $\mathcal{D}_1$ is constructed by randomly overlaying inpainting masks on background regions, so the original image can serve as the corresponding ground truth. $\mathcal{D}_2$ uses the objects in the image as inpainting masks, aiming to erase the selected objects. Erasure Diffusion Distillation is the basic distillation framework, introducing the distillation-related losses $\mathcal{L}_{LPIPS}$, $\mathcal{L}_{DMD}$, and $\mathcal{L}_{GAN}$. Entity-Coherent Erasure employs an entity segmentor to predict the entity segments of the erased image. The sundry entities filtered by Sundries Detection (Fig. 3) drive the sundries suppression loss $\mathcal{L}_{SS}$ during training, suppressing the generation of unwanted sundries. In addition, we compute the cosine similarity between segment features inside the generated region (in mask) and those outside it (out mask) as the entity feature coherence loss $\mathcal{L}_{EFC}$, encouraging the model to generate context-consistent content.
  • Figure 3: Illustration of our sundries detection. To identify unintended objects (sundries) in the erased output, we first perform entity segmentation using a pretrained entity segmentation model. For each detected entity, we compute its IoS with respect to the inpainting mask. Entities whose IoS exceeds a threshold $\lambda$ are classified as newly generated sundries.
  • Figure 4: Qualitative comparison with state-of-the-art methods on the COCO dataset.
  • Figure 5: Qualitative ablation study of our method. EFC and SS refer to the entity feature coherence loss and the sundries suppression loss, respectively. The baseline is the student model distilled with $\mathcal{D}_2$ but without the EFC or SS losses.
  • ...and 3 more figures
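
The sundries detection rule described in Figure 3 (segment the erased output, then flag any entity whose IoS with the inpainting mask exceeds a threshold $\lambda$) can be sketched as follows. This is a minimal illustration assuming boolean entity masks from some pretrained entity segmentation model; the function name, argument layout, and the default threshold value are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def detect_sundries(entity_masks, inpaint_mask, ios_threshold=0.5):
    """Flag entities that were newly generated inside the inpainting mask.

    entity_masks  : list of HxW boolean arrays, one per segmented entity
                    (assumed to come from a pretrained entity segmentation model)
    inpaint_mask  : HxW boolean array marking the erased (masked) region
    ios_threshold : threshold lambda on the IoS score (illustrative default)

    Returns the indices of entities classified as sundries.
    """
    sundries = []
    for i, entity in enumerate(entity_masks):
        area = entity.sum()
        if area == 0:
            continue
        # IoS = |entity ∩ inpaint_mask| / |entity|: the fraction of the
        # entity that lies inside the erased region.
        ios = np.logical_and(entity, inpaint_mask).sum() / area
        if ios > ios_threshold:
            sundries.append(i)
    return sundries
```

An entity lying almost entirely inside the erased region scores an IoS near 1 and is flagged as a newly generated sundry, while pre-existing entities that merely border the mask score low and are kept.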