Editing in Style: Uncovering the Local Semantics of GANs

Edo Collins, Raja Bala, Bob Price, Sabine Süsstrunk

TL;DR

This work shows that StyleGAN learns spatially disentangled semantic objects and parts in its latent space, enabling local, semantically aware edits without external supervision. It introduces an ROI-guided style-transfer mechanism that transfers appearance from a reference image by conditioning style interpolation on a diagonal query matrix, drawing on a catalog of semantic clusters produced via spherical k-means. Quantitative and qualitative evaluations on generators trained on FFHQ and LSUN-Bedrooms, as well as on StyleGAN2 outputs, show localized edits that preserve photorealism and are more local than those of naive blending methods. The approach offers a practical route to versatile image editing, with a potential extension to real images through latent-space embedding.
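
The style interpolation mentioned above can be illustrated with a minimal sketch. It assumes one plausible reading of the mechanism: the edited style is the target style plus a diagonal matrix Q applied to the difference between reference and target styles, with diagonal entries in [0, 1]. The function name `local_style_transfer`, the toy vectors, and the choice of query values are hypothetical and are not taken from the paper.

```python
import numpy as np


def local_style_transfer(style_target, style_reference, query):
    """Per-channel interpolation s = s_t + Q (s_r - s_t), with Q = diag(query).

    Assumption: each entry of `query` lies in [0, 1]; 0 keeps the target
    channel untouched and 1 copies the reference channel entirely.
    """
    style_target = np.asarray(style_target, dtype=float)
    style_reference = np.asarray(style_reference, dtype=float)
    query = np.clip(np.asarray(query, dtype=float), 0.0, 1.0)
    # Multiplying by a diagonal matrix equals an element-wise product
    # with its diagonal, so Q never needs to be materialized.
    return style_target + query * (style_reference - style_target)


# Hypothetical 4-channel style vectors. Channels 1 and 3 are assumed to be
# the ones relevant to the region of interest.
s_target = np.array([0.0, 1.0, 2.0, 3.0])
s_reference = np.array([4.0, 5.0, 6.0, 7.0])
q = np.array([0.0, 1.0, 0.0, 0.5])

edited = local_style_transfer(s_target, s_reference, q)
print(edited)
```

Channels with a zero query entry are left exactly as in the target, which is what would keep such an edit local if the selected channels mainly affect one semantic region.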

Abstract

While the quality of GAN image synthesis has improved tremendously in recent years, our ability to control and condition the output is still limited. Focusing on StyleGAN, we introduce a simple and effective method for making local, semantically aware edits to a target output image. This is accomplished by borrowing elements from a source image, also a GAN output, via a novel manipulation of style vectors. Our method neither requires supervision from an external model nor involves complex spatial morphing operations. Instead, it relies on the emergent disentanglement of semantic objects that is learned by StyleGAN during its training. Semantic editing is demonstrated on GANs producing human faces, indoor scenes, cats, and cars. We measure the locality and photorealism of the edits produced by our method, and find that it accomplishes both.

Paper Structure

This paper contains 19 sections, 5 equations, 19 figures, 1 table.

Figures (19)

  • Figure 1: Applying k-means to the hidden layer activations of the StyleGAN generator reveals a decomposition of the generated output into semantic objects and object-parts.
  • Figure 2: Our method localizes the edit made to the target image (top left) by conditioning the style transfer from the reference (top row) on a specific object of interest (left column). This gives users fine control over the appearance of objects in the synthesized images. Best viewed enlarged on screen.
  • Figure 3: Unlike previous blending methods [perez2003poisson, suzuki2018spatially], our method does not require images to be aligned or of similar scale. In this case, the style of, e.g., the bed is successfully transferred from reference to target in spite of drastic changes in viewpoint.
  • Figure 4: Even the well-aligned FFHQ-generated faces prove challenging for existing blending methods, as they do not consider differences in pose and scale, and lack any notion of semantics or photorealism. In contrast, our method makes use of the correlations GANs learn from real data to maintain a natural appearance, while exploiting feature disentanglement to localize the change effectively.
  • Figure 5: Our method applied to StyleGAN2 outputs. Photorealism is preserved while allowing fine control over highly localized regions.
  • ...and 14 more figures
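
The clustering described in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes the spherical variant of k-means named in the TL;DR: per-pixel activation vectors are normalized to unit length and assigned to centroids by cosine similarity. Random data stands in for real generator activations, and the function name, array shapes, and cluster count are hypothetical.

```python
import numpy as np


def spherical_kmeans(features, k, n_iter=20, seed=0):
    """Cluster the rows of `features` by cosine similarity.

    Returns (labels, centroids); every centroid has unit norm.
    """
    rng = np.random.default_rng(seed)
    # Project every feature vector onto the unit sphere.
    x = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    # Initialize centroids from k distinct data points (indexing copies them).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # For unit vectors, the dot product is the cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                total = members.sum(axis=0)
                # Re-normalize so the centroid stays on the sphere.
                centroids[j] = total / (np.linalg.norm(total) + 1e-12)
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids


# Random stand-in for hidden-layer activations: (images, channels, height, width).
rng = np.random.default_rng(0)
activations = rng.standard_normal((2, 8, 4, 4))
n, c, h, w = activations.shape

# Treat every spatial location as one C-dimensional sample.
pixels = activations.transpose(0, 2, 3, 1).reshape(-1, c)
labels, centroids = spherical_kmeans(pixels, k=3)

# Reshape the labels back into one cluster map per image.
label_maps = labels.reshape(n, h, w)
print(label_maps.shape)
```

With real activations, each label map would play the role of the decomposition shown in Figure 1; with random inputs, as here, the clusters carry no semantic meaning.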