Generative Visual Manipulation on the Natural Image Manifold

Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros

TL;DR

The paper introduces a framework for realistic image editing that learns the natural image manifold with a GAN and constrains user edits to stay on that manifold. It projects a real image into the GAN latent space, applies gradient-based, constraint-driven edits there, and transfers the resulting changes back to the high-resolution original via dense motion and color flow. The approach supports three capabilities: realistic photo manipulation, generative transformation between images, and image generation from user scribbles, all with near-real-time interaction. Experiments show improved reconstruction and realism. The authors acknowledge limitations related to resolution and dataset specificity, and expect these to ease as generative models advance.

Abstract

Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result. Unless the user has considerable artistic skill, it is easy to "fall off" the manifold of natural images while editing. In this paper, we propose to learn the natural image manifold directly from data using a generative adversarial neural network. We then define a class of image editing operations, and constrain their output to lie on that learned manifold at all times. The model automatically adjusts the output keeping all edits as realistic as possible. All our manipulations are expressed in terms of constrained optimization and are applied in near-real time. We evaluate our algorithm on the task of realistic photo manipulation of shape and color. The presented method can further be used for changing one image to look like the other, as well as generating novel imagery from scratch based on user's scribbles.
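
As a rough illustration of what "constrained optimization on the learned manifold" can look like, the sketch below edits a latent vector by gradient descent so that the generated output satisfies a user constraint while staying close to the starting point. Everything here is an assumption made for illustration: the linear map `W` is a hypothetical stand-in for a trained GAN generator, and `edit_latent`, `smooth`, `lr`, and `steps` are illustrative names and values, not the paper's implementation.

```python
# Hypothetical toy generator G(z) = W z, standing in for a trained GAN.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def generate(z):
    """Map a 2-D latent vector to a 3-pixel 'image'."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in W]

def edit_latent(z0, pixel, target, smooth=0.1, lr=0.1, steps=500):
    """Minimize (G(z)[pixel] - target)^2 + smooth * ||z - z0||^2 by gradient descent."""
    z = list(z0)
    for _ in range(steps):
        residual = generate(z)[pixel] - target
        grad = [2.0 * residual * W[pixel][k] + 2.0 * smooth * (z[k] - z0[k])
                for k in range(len(z))]
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z

z0 = [0.5, -0.2]                            # latent code of the unedited image
z1 = edit_latent(z0, pixel=2, target=1.0)   # user asks pixel 2 to take value 1.0
edited = generate(z1)                       # the output is always G(z), so it stays on the toy manifold
```

Because the result is always produced by the generator, the edit cannot leave the generator's range. The smoothness term trades constraint satisfaction against distance from the original latent code, so the constraint is met only approximately.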

Paper Structure

This paper contains 16 sections, 6 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: We use generative adversarial networks (GAN) [goodfellow2014generative, radford2015unsupervised] to perform image editing on the natural image manifold. We first project an original photo (a) onto a low-dimensional latent vector representation (b) by regenerating it using GAN. We then modify the color and shape of the generated image (d) using various brush tools (c) (for example, dragging the top of the shoe). Finally, we apply the same amount of geometric and color changes to the original photo to achieve the final result (e). See interactive image editing demo on https://www.youtube.com/watch?v=9c4z6YsBGQ0.
  • Figure 2: GAN as a manifold approximation. (a) Randomly generated examples from a GAN, trained on the shirts dataset; (b) random jittering: each row shows a random sample from a GAN (the first one at the left), and its variants produced by adding Gaussian noise to $z$ in the latent space; (c) interpolation: each row shows two randomly generated images (first and last), and their smooth interpolations in the latent space.
  • Figure 3: Projecting real photos onto the image manifold using GAN. Top row: original photos (from handbag dataset); 2nd row: reconstruction using optimization-based method; 3rd row: reconstruction via learned deep encoder $P$; bottom row: reconstruction using the hybrid method (ours). We show the reconstruction loss below each image.
  • Figure 4: Updating latent vector given user edits. (a) Evolving user constraint $v_g$ (black color strokes) at each update step; (b) intermediate results at each update step ($G(z_0)$ at leftmost, and $G(z_1)$ at rightmost); (c) a smooth linear interpolation in latent space between $G(z_0)$ and $G(z_1)$.
  • Figure 5: Edit transfer via Motion+Color Flow. Following user edits on the left shoe $G(z_0)$ we obtain an interpolation sequence in the generated latent space $G(z)$ (top right). We then compute the motion and color flows (right middle and bottom) between neighboring images in $G(z)$. These flows are concatenated and, as a validation, can be applied on $G(z_0)$ to obtain a close reconstruction of $G(z)$ (left middle). The bottom left row shows how the edit is transferred to the original shoe using the same concatenated flow, to obtain a sequence of edited shoes.
  • ...and 3 more figures
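
The hybrid projection described in the Figure 3 caption (an encoder prediction refined by optimization) can be sketched on a toy problem. This is a minimal sketch under loud assumptions: the linear map `W` is a hypothetical stand-in for a trained generator, `encode` is a deliberately crude stand-in for the learned encoder $P$, and the step size and iteration count are arbitrary. In this linear toy the reconstruction loss is convex, so it only shows the mechanics of "initialize with the encoder, then refine"; it does not reproduce the comparison in the figure.

```python
# Hypothetical toy generator G(z) = W z, standing in for a trained GAN.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def generate(z):
    """Map a 2-D latent vector to a 3-pixel 'image'."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in W]

def recon_loss(z, x):
    """Squared L2 distance between G(z) and the target image x."""
    return sum((g - xi) ** 2 for g, xi in zip(generate(z), x))

def encode(x):
    """Crude stand-in for a learned encoder P: read z off the first two pixels."""
    return [x[0], x[1]]

def refine(z, x, lr=0.1, steps=50):
    """Gradient descent on the reconstruction loss, starting from z."""
    z = list(z)
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generate(z), x)]
        grad = [2.0 * sum(r * row[k] for r, row in zip(residual, W))
                for k in range(len(z))]
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return z

x = [0.9, 0.1, 1.6]         # a "photo" lying slightly off the toy manifold
z_enc = encode(x)           # encoder-only projection
z_hyb = refine(z_enc, x)    # hybrid: encoder initialization, then optimization

print(recon_loss(z_enc, x), recon_loss(z_hyb, x))
```

In this toy setup the encoder-only loss is about 0.36 and the refined loss is about 0.12, the least-squares optimum for this `W` and `x`.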