Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo

TL;DR

The paper tackles the challenge of generating realistic objects and coherent semantic layouts in large-hole guided image completion. It introduces semantic discriminators that leverage pretrained visual features and object-level discriminators operating on aligned object crops, integrated with a CM-GAN generator. The approach achieves state-of-the-art results on segmentation-, edge-, and panoptic-guided inpainting on Places2 and COCO-Stuff, and enables a fully automatic pipeline for standard inpainting that predicts panoptic layouts inside missing regions. This yields flexible image editing capabilities, faster inference than diffusion-based methods, and substantially improved realism of complex scenes.

Abstract

Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation and panoptically-guided manipulation on Places2 datasets. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.
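The abstract states that the object-level discriminators take aligned instances as inputs. As a rough illustration of what preparing such inputs could look like, the sketch below crops each instance by its bounding box and resizes it to a fixed square. It is not the authors' implementation: the integer instance map with 0 as background, the nearest-neighbor resizing, the 64-pixel crop size, and all function names are assumptions made for this example.

```python
import numpy as np


def instance_bbox(mask):
    """Return (top, bottom, left, right) of the True region; bottom/right are exclusive."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def resize_nearest(crop, size):
    """Resize an H x W x C crop to size x size with nearest-neighbor sampling."""
    h, w = crop.shape[:2]
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    return crop[ys][:, xs]


def aligned_object_crops(image, instance_map, size=64, background_id=0):
    """Hypothetical helper: one fixed-size crop per instance id in instance_map.

    image:        H x W x C array
    instance_map: H x W integer array, one id per instance (assumed layout)
    """
    crops = {}
    for inst_id in np.unique(instance_map):
        if inst_id == background_id:
            continue  # assumption: background_id marks pixels with no instance
        top, bottom, left, right = instance_bbox(instance_map == inst_id)
        crops[int(inst_id)] = resize_nearest(image[top:bottom, left:right], size)
    return crops
```

In a training loop, crops like these from real and generated images would be what a per-object discriminator scores; how the paper actually aligns and batches them is not specified in this excerpt.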

Paper Structure (23 sections, 5 equations, 17 figures, 8 tables)

Figures (17)

  • Figure 1: We propose a new structure-guided image completion model that leverages our proposed semantic discriminators and object-level discriminators for photo-realistic image completion. Our trained guided completion model enables multiple image editing applications, including local image manipulation (left), person anonymization (middle), layout manipulation (right) and instance removal (right).
  • Figure 2: Left: Our model can take an edge, segmentation, or panoptic map as the condition for guided image completion, and it combines the vanilla StyleGAN discriminator [stylegan2] with the proposed semantic discriminators at both the image level and the object level to enforce semantic and object coherency. The object-level discriminators take resized object crops as input to enforce the realism of object instances. Right: The semantic discriminators leverage the semantic knowledge of the pretrained CLIP [clip] model to enforce the realism of the generated semantics.
  • Figure 3: The image-level semantic discriminator and the object-level discriminators progressively improve the photorealism of the generated image (e.g., face and body) on a guided inpainting task in comparison to the baseline trained with only the StyleGAN discriminator [stylegan2].
  • Figure 4: Qualitative comparisons on the guided inpainting task on Places2-person. We compare our model against LaMa* [lama], CoModGAN* [comodgan], CM-GAN* [cmgan] and ControlNet* [controlnet], where * denotes models re-trained with the additional panoptic instance segmentation condition. Best viewed zoomed in on screen.
  • Figure 5: Qualitative comparisons on the guided inpainting task on Places2-object. We compare our model against CoModGAN* [comodgan] and CM-GAN* [cmgan], where * denotes models re-trained with the additional panoptic instance segmentation condition for guided inpainting. Best viewed zoomed in on screen.
  • ...and 12 more figures