Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space

Jonas Wulff, Antonio Torralba

TL;DR

The paper tackles the instability of inverting images into StyleGAN's latent spaces, and of interpolating between the resulting latents, by placing a Gaussian prior on a transformed latent space. A simple Leaky ReLU transformation maps the latents to an approximately Gaussian distribution, which is fully described by a mean and a covariance. Using this analytic prior to regularize inversion improves inversion quality and the smoothness of interpolations for both $\mathcal{W}$ and $\mathcal{W}^{+}$. The authors further use the Gaussian model to analyze generator artifacts via PCA and to reduce them by logarithmically compressing the dominant components, an alternative to truncation that preserves diversity. Overall, the work provides a principled framework for stable inversion and artifact mitigation in high-fidelity GANs, with practical impact on editing and dataset generation.

Abstract

Modern Generative Adversarial Networks are capable of creating artificial, photorealistic images from latent vectors living in a low-dimensional learned latent space. It has been shown that a wide range of images can be projected into this space, including images outside of the domain that the generator was trained on. However, while in this case the generator reproduces the pixels and textures of the images, the reconstructed latent vectors are unstable and small perturbations result in significant image distortions. In this work, we propose to explicitly model the data distribution in latent space. We show that, under a simple nonlinear operation, the data distribution can be modeled as Gaussian and therefore expressed using sufficient statistics. This yields a simple Gaussian prior, which we use to regularize the projection of images into the latent space. The resulting projections lie in smoother and better behaved regions of the latent space, as shown using interpolation performance for both real and generated images. Furthermore, the Gaussian model of the distribution in latent space allows us to investigate the origins of artifacts in the generator output, and provides a method for reducing these artifacts while maintaining diversity of the generated images.
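The idea of Gaussianizing the latents and using the fitted mean and covariance as a prior can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the latents are synthetic stand-ins rather than outputs of a real mapping network, the negative slope of 5.0 is assumed to undo a final Leaky ReLU with slope 0.2, and the helper names (`leaky_relu`, `fit_gaussian`, `mahalanobis_sq`) are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope):
    """Elementwise Leaky ReLU: x where x >= 0, slope * x otherwise."""
    return np.where(x >= 0, x, slope * x)

def fit_gaussian(v):
    """Mean and covariance of samples stored row-wise in v (shape (n, d))."""
    return v.mean(axis=0), np.cov(v, rowvar=False)

def mahalanobis_sq(v, mean, cov_inv):
    """Squared Mahalanobis distance of each row of v to the Gaussian."""
    diff = v - mean
    return np.einsum("...i,ij,...j->...", diff, cov_inv, diff)

rng = np.random.default_rng(0)

# Synthetic stand-in for latents w: Gaussian samples pushed through a
# Leaky ReLU with slope 0.2 (assumption; a real w would come from the
# generator's mapping network).
g = rng.normal(size=(5000, 8))
w = leaky_relu(g, 0.2)

# Gaussianize: a Leaky ReLU with the reciprocal slope (assumed 5.0)
# inverts the one above.
v = leaky_relu(w, 5.0)

# The prior is described by the sample mean and covariance alone.
mean, cov = fit_gaussian(v)
cov_inv = np.linalg.inv(cov)

# Prior penalty per latent; during inversion a term like this would be
# added to the image reconstruction loss with some weight.
prior = mahalanobis_sq(v, mean, cov_inv)
```

In an actual inversion loop, the penalty would be evaluated on the latent being optimized and weighted against the reconstruction loss; the weight is a tuning choice not specified here.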

Paper Structure

This paper contains 16 sections, 6 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Statistics on $\mathcal{W}$ and $\mathcal{V}$. Marginal (a) and pairwise (b) distributions of $w$ are highly irregular. After mapping into $\mathcal{V}$, the marginal (c) and pairwise (d) distributions show that the data can be well modeled as a high dimensional Gaussian. All plots are centered at $0$ and show the same range.
  • Figure 2: Average image and latent reconstruction errors. Adding the prior (dashed lines) helps both when reconstructing to $\mathcal{W}$ (top, red) and $\mathcal{W}^{+}$ (bottom, blue). Note that we match the model to the data, i.e. the images in (a) were generated using a single style, and the images in (b) were generated using different styles on different scales.
  • Figure 3: Example interpolations between inversions of real images to $\mathcal{W}^{+}$. Without a prior (top row in each example), the latents often fall into poorer areas of the latent space, causing distorted appearances in the intermediate images. Using a prior (bottom rows) encourages the latent to lie in good regions, causing the intermediate images to be more realistic. The left and right columns show the start and end frames, respectively. Please see the appendix for additional examples.
  • Figure 4: Comparison of different correction methods. Raw samples (a) are commonly corrected using truncation (b), which removes artifacts, but also reduces diversity, as can be seen from the sharper average image and lower per-pixel standard deviations in (b). Our method (c) reduces artifacts, while maintaining a high degree of diversity.
  • Figure 5: Principal components of images with and without artifacts. Images with artifacts exhibit significantly larger magnitudes in the low principal components than good images; the dashed lines indicate one standard deviation.
  • ...and 7 more figures
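
The contrast drawn in Figures 4 and 5, between truncation and compressing only the dominant principal components, can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the compression function sign(x) * log(1 + |x|), the number of compressed components, the synthetic data, and the function names are hypothetical and not taken from the paper. Truncation is written in its usual form, pulling every latent toward the mean by a factor psi.

```python
import numpy as np

def truncate(v, mean, psi=0.7):
    """Standard truncation: shrink every direction toward the mean."""
    return mean + psi * (v - mean)

def log_compress_pca(v, mean, cov, n_components=2):
    """Compress only the leading principal components (hypothetical form).

    Coefficients are expressed in units of standard deviations, the first
    n_components are passed through sign(x) * log(1 + |x|), and the result
    is mapped back to the original space.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    std = np.sqrt(np.maximum(eigvals, 1e-12))
    coeffs = (v - mean) @ eigvecs / std            # whitened PCA coefficients
    head = coeffs[:, :n_components]
    coeffs[:, :n_components] = np.sign(head) * np.log1p(np.abs(head))
    return mean + (coeffs * std) @ eigvecs.T

rng = np.random.default_rng(1)
# Synthetic anisotropic Gaussian samples standing in for latents.
scales = np.array([4.0, 3.0, 1.0, 1.0, 0.5, 0.5])
v = rng.normal(size=(2000, 6)) * scales
mean, cov = v.mean(axis=0), np.cov(v, rowvar=False)

v_trunc = truncate(v, mean)
v_comp = log_compress_pca(v, mean, cov, n_components=2)

# Truncation scales the variance of every direction by psi**2, whereas the
# compression leaves the non-dominant directions untouched.
var_trunc = v_trunc.var(axis=0)
var_comp = v_comp.var(axis=0)
```

The design point this is meant to convey is that truncation reduces variation in all directions at once, while a per-component compression only tames large excursions along a few directions and leaves the rest of the distribution intact.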