Image2StyleGAN++: How to Edit the Embedded Images?

Rameen Abdal, Yipeng Qin, Peter Wonka

TL;DR

Image2StyleGAN++ improves the embedding of real images into StyleGAN by adding a Noise space optimization step that recovers high-frequency details, raising reconstruction PSNR from about 20 dB to as much as 45 dB. It extends the $W^+$ latent embedding with local masks and layer-specific constraints, which allows partial or approximate embeddings and editable regions. By combining embedding with activation-tensor manipulations (spatial, channel-wise, and averaging operations), the framework supports high-quality local edits alongside global semantic edits. Applications include image reconstruction, inpainting, crossover, local scribble edits, local style transfer, and attribute-level feature transfer. Compared with embedding in $W^+$ alone, the method reports higher reconstruction quality and finer local control, with video editing noted as a possible extension in future work.

Abstract

We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the $W^+$ latent space embedding. Our noise optimization can restore high-frequency features in images and thus significantly improves the quality of reconstructed images, e.g. a big increase of PSNR from 20 dB to 45 dB. Second, we extend the global $W^+$ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high-quality local edits along with global semantic edits on images. Such edits motivate various high-quality image editing applications, e.g. image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute level feature transfer. Examples of the edited images are shown across the paper for visual inspection.
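
To put the quoted PSNR figures in perspective, the following is a minimal sketch of the standard PSNR definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes 8-bit pixel values (so `max_value=255`), which the abstract does not state, and the function names `psnr` and `rmse_for_psnr` are illustrative, not taken from the paper.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value=255 assumes 8-bit pixels; this is an assumption, not from the paper.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_for_psnr(psnr_db, max_value=255.0):
    """Root-mean-square pixel error that corresponds to a given PSNR in dB."""
    return max_value / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    for level in (20.0, 45.0):
        print(f"{level:.0f} dB -> RMSE of about {rmse_for_psnr(level):.2f} gray levels")
```

Under the 8-bit assumption, 20 dB corresponds to an average error of roughly 25.5 gray levels per pixel, while 45 dB corresponds to roughly 1.4 gray levels, which is why the gain is associated with restored high-frequency detail.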

Paper Structure

This paper contains 25 sections, 5 equations, 19 figures, 3 tables, 6 algorithms.

Figures (19)

  • Figure 1: (a) and (b): input images; (c): the "two-face" generated by naively copying the left half from (a) and the right half from (b); (d): the "two-face" generated by our Image2StyleGAN++ framework.
  • Figure 2: Joint optimization. (a): target image; (b): image embedded by jointly optimizing $w$ and $n$ using perceptual and pixel-wise MSE loss; (c): image embedded by jointly optimizing $w$ and $n$ using the pixel-wise MSE loss only; (d): the result of the previous column with $n$ resampled; (e): image embedded by jointly optimizing $w$ and $n$ using perceptual and pixel-wise MSE loss for $w$ and pixel-wise MSE loss for $n$.
  • Figure 3:
  • Figure 4: First column: original image; Second column: image embedded in $W^{+}$ Space (PSNR 19 to 22 dB); Third column: image embedded in $W^{+}$ and Noise space (PSNR 39 to 45 dB).
  • Figure 5: First and second column: input image; Third column: image generated by naively copying the left half from the first image and the right half from the second image; Fourth column: image generated by our extended embedding algorithm.
  • ...and 14 more figures
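
The naive baseline described in the captions of Figures 1 and 5 (copying the left half of one image and the right half of another) can be sketched as follows. This is only an illustration of that baseline, not of the Image2StyleGAN++ method; the function name `naive_two_face` and the representation of an image as a list of pixel rows are assumptions made for the example.

```python
def naive_two_face(left_source, right_source):
    """Copy the left half of one image and the right half of another.

    Images are assumed to be lists of rows, each row a list of pixel values
    (a hypothetical representation chosen for this sketch).
    """
    if len(left_source) != len(right_source):
        raise ValueError("images must have the same height")
    result = []
    for row_a, row_b in zip(left_source, right_source):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        mid = len(row_a) // 2  # column where the two halves meet
        result.append(row_a[:mid] + row_b[mid:])
    return result


if __name__ == "__main__":
    image_a = [[1, 1, 1, 1], [1, 1, 1, 1]]
    image_b = [[2, 2, 2, 2], [2, 2, 2, 2]]
    print(naive_two_face(image_a, image_b))
```

Because the two halves are joined without any blending, the result has a hard seam at the middle column, which is the artifact the captions contrast with the framework's output.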