Outline-Guided Object Inpainting with Diffusion Models

Markus Pobitzer, Filip Janicki, Mattia Rigotti, Cristiano Malossi

TL;DR

This work augments small instance segmentation datasets with a diffusion-based inpainting model. Guided by the object outline, the model fills the masked area with an object of the desired class, producing variations of the annotated object instances that preserve the provided mask annotations.

Abstract

Instance segmentation datasets play a crucial role in training accurate and robust computer vision models. However, obtaining accurate mask annotations to produce high-quality segmentation datasets is a costly and labor-intensive process. In this work, we show how this issue can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to effectively obtain a sizeable annotated dataset. We achieve this by creating variations of the available annotated object instances in a way that preserves the provided mask annotations, thereby resulting in new image-mask pairs to be added to the set of annotated images. Specifically, we generate new images using a diffusion-based inpainting model to fill the masked area with a desired object class by guiding the diffusion through the object outline. We show that the object outline provides a simple yet reliable and convenient training-free guidance signal for the underlying inpainting model. This signal is often sufficient to fill the mask with an object of the correct class without further text guidance and to preserve the correspondence between generated images and the mask annotations with high precision. Our experimental results reveal that our method successfully generates realistic variations of object instances, preserving their shape characteristics while introducing diversity within the augmented area. We also show that the proposed method can naturally be combined with text guidance and other image augmentation techniques.
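The idea of producing new image-mask pairs while keeping the annotation fixed can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that pixels outside the inpainting region are copied from the original image, and the names `composite`, `make_pair`, and the toy list-of-lists image format are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image (hypothetical representation)
Mask = List[List[int]]   # 1 = pixel is selected, 0 = pixel is not


def composite(original: Image, generated: Image, inpaint_region: Mask) -> Image:
    """Use generated pixels inside the inpainting region, original pixels elsewhere."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, inpaint_region)
    ]


def make_pair(original: Image, generated: Image,
              inpaint_region: Mask, annotation: Mask) -> Tuple[Image, Mask]:
    """Return a new (image, mask) pair; the annotation mask is reused unchanged."""
    return composite(original, generated, inpaint_region), annotation


# Toy data: a 2x2 object (value 50) on a background (value 10).
original = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
annotation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Assumed inpainting region: a strict subset of the annotation, so part of
# the object boundary stays visible to the model.
inpaint_region = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Stand-in for the inpainting model's output (assumption: a constant image).
generated = [[99] * 4 for _ in range(4)]

new_image, new_mask = make_pair(original, generated, inpaint_region, annotation)
```

Because only pixels inside the inpainting region change, the original annotation can be paired with the new image without relabeling.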

Paper Structure

This paper contains 12 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of our proposed method. Given an image, a mask for an object, and the associated class, we create multiple variations of the object through inpainting. To do so, we erode the original mask so that the outline of the object guides the inpainting model. As a result, the model creates a variation of the original object and is less likely to inpaint the background. Additionally, we can provide text guidance, such as the associated class. The method is not limited to any specific type of object and works well in challenging scenes. It can easily be combined with existing image augmentations.
  • Figure 2: Inpaintings without and with text guidance where only the outline of the object is provided. The first column shows the original images, and the second column is the actual input, where the masked area to inpaint is highlighted in green. The background was replaced with noise to remove possible object connections with the scene and focus only on the outline. The last two columns show inpainting results without a prompt and with the object class as the prompt.
  • Figure 3: Example A) shows a failure case in which we tried to remove the cars using the original COCO masks depicted in B). The text prompt "photograph of a beautiful empty scene, highest quality settings" was used, following Rombach et al. (2022) for background inpainting. The negative prompt contained "car". In the inpaintings C), D), and E), a car has been inpainted to the right of the horse. We argue that this happens because the mask does not completely cover the car, so the model uses the visible outline of the object as guidance.
  • Figure 4: Impact of the erosion size. We start with the original mask (erosion 0) and then gradually increase the erosion kernel in steps of 6 pixels. The top row shows the image with the eroded mask applied. The bottom row shows the corresponding generated outputs. If the erosion size is too small, the inpainting model does not completely follow the outline of the airplane.
  • Figure 5: Comparison of different prompts with different erosion sizes on an image with class label "bus". The second row uses the background prompt "photograph of a beautiful empty scene, highest quality settings", but the output still resembles a bus. This illustrates the text misalignment issue mentioned in Xie et al. (2023), but in a more general sense. The third row is generated with no prompt, and the model still coherently inpaints a bus. This is also the case for the original mask, since it did not cover the whole bus. The fourth row uses our standard prompt, and the fifth row adds the negative prompt.
  • ...and 1 more figure
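The mask erosion described in Figures 1 and 4 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the square window, the treatment of pixels beyond the image border as background, and the names `erode` and `area` are all assumptions. In practice a library routine such as OpenCV's `cv2.erode` or SciPy's `scipy.ndimage.binary_erosion` would normally be used instead.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def erode(mask: Mask, kernel: int) -> Mask:
    """Binary erosion with a square window of radius kernel // 2.

    A pixel stays set only if every pixel in the window around it is set.
    Pixels beyond the image border count as background (an assumption).
    A kernel of 0 returns a copy of the input mask.
    """
    r = kernel // 2
    h, w = len(mask), len(mask[0])

    def is_set(y: int, x: int) -> bool:
        return 0 <= y < h and 0 <= x < w and mask[y][x] == 1

    return [
        [
            int(all(is_set(y + dy, x + dx)
                    for dy in range(-r, r + 1)
                    for dx in range(-r, r + 1)))
            for x in range(w)
        ]
        for y in range(h)
    ]


def area(mask: Mask) -> int:
    """Number of set pixels in the mask."""
    return sum(map(sum, mask))


# Toy 20x20 image with a 16x16 square object at rows and columns 2..17.
size = 20
mask = [[int(2 <= y <= 17 and 2 <= x <= 17) for x in range(size)]
        for y in range(size)]

# Sweep the erosion kernel in steps of 6 pixels, starting at 0 (original mask).
kernels = list(range(0, 24, 6))
areas = [area(erode(mask, k)) for k in kernels]
# Outline band: annotated pixels that remain visible to the inpainting model.
bands = [area(mask) - a for a in areas]

for k, a, b in zip(kernels, areas, bands):
    print(f"kernel={k:2d}  inpaint_area={a:3d}  outline_band={b:3d}")
```

The sweep shows the trade-off behind the erosion size: a larger kernel leaves a wider visible outline band as guidance but shrinks the area in which the model can introduce variation, and a kernel that is too large for the object removes the inpainting region entirely.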