Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, Christian Theobalt

TL;DR

This work tackles the challenge of flexible, precise, and general controllability in GAN-based image synthesis. It introduces DragGAN, an interactive framework that drags user-defined handle points toward target points using motion supervision and GAN-feature-based point tracking, enabling fine-grained edits across diverse categories. The method operates directly on the GAN image manifold, supports an optional mask that restricts edits to a movable region, and runs interactively without any additional tracking network. Qualitative and quantitative comparisons show advantages over prior approaches in both image manipulation and point tracking, and real images can be edited via GAN inversion, which makes the method relevant to content creation and editing workflows.

Abstract

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig. 1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.
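
To make the first component more concrete, the following is a minimal NumPy sketch of a feature-based motion supervision term. It only evaluates a loss value on a given feature map; it is not the authors' implementation. The function names (`bilinear_sample`, `shifted_patch_loss`), the square patch, the L1 distance, the unit-length step, and the default `radius` are all assumptions made for illustration, and the optional mask term is left out.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Sample feat (C, H, W) at float coordinates with bilinear interpolation."""
    _, h, w = feat.shape
    ys = np.clip(ys, 0.0, h - 1.0)
    xs = np.clip(xs, 0.0, w - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy  # shape (C, N)


def shifted_patch_loss(feat, handle, target, radius=3):
    """L1 gap between a patch around `handle` and the same patch shifted
    by one unit step toward `target` (hypothetical helper, assumed form)."""
    handle = np.asarray(handle, dtype=float)  # (row, col)
    target = np.asarray(target, dtype=float)
    offset = target - handle
    dist = np.linalg.norm(offset)
    if dist < 1e-8:
        return 0.0  # handle already sits on the target
    step = offset / dist  # unit direction from handle to target

    h, w = feat.shape[1:]
    dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    ys = handle[0] + dy.ravel()
    xs = handle[1] + dx.ravel()
    inside = (ys >= 0) & (ys <= h - 1) & (xs >= 0) & (xs <= w - 1)
    ys, xs = ys[inside], xs[inside]

    # In an autodiff framework the reference patch would presumably be
    # treated as a constant so that only the shifted patch receives gradient.
    reference = bilinear_sample(feat, ys, xs)
    shifted = bilinear_sample(feat, ys + step[0], xs + step[1])
    return float(np.abs(shifted - reference).sum())
```

Minimizing such a term with respect to the latent code would encourage the content around the handle point to appear one small step closer to the target, which is why the step is normalized rather than taken as the full handle-to-target offset.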

Paper Structure

This paper contains 31 sections, 2 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Overview of our pipeline. Given a GAN-generated image, the user only needs to set several handle points (red dots), target points (blue dots), and optionally a mask denoting the movable region during editing (brighter area). Our approach iteratively performs motion supervision and point tracking. The motion supervision step drives the handle points (red dots) to move towards the target points (blue dots) and the point tracking step updates the handle points to track the object in the image. This process continues until the handle points reach their corresponding target points.
  • Figure 2: Method. Our motion supervision is achieved via a shifted patch loss on the feature maps of the generator. We perform point tracking on the same feature space via the nearest neighbor search.
  • Figure 3: Qualitative comparison of our approach to UserControllableLT [Endo 2022] on the task of moving handle points (red dots) to target points (blue dots). Our approach achieves more natural and superior results on various datasets. More examples are provided in a further qualitative comparison figure.
  • Figure 4: Real image manipulation. Given a real image, we apply GAN inversion to map it to the latent space of StyleGAN, then edit the pose, hair, shape, and expression, respectively.
  • Figure 5: Qualitative tracking comparison of our approach to RAFT [Teed and Deng 2020], PIPs [Harley et al. 2022], and without tracking. Our approach tracks the handle point more accurately than baselines, thus producing more precise editing.
  • ...and 10 more figures
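
The point tracking step named in the Figure 2 caption (nearest neighbor search in the generator's feature space) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name `track_handle`, the square search window, the L1 distance, and the default `radius` are hypothetical choices, and `ref_feature` is assumed to be the feature vector recorded at the handle point before editing started.

```python
import numpy as np


def track_handle(feat, handle, ref_feature, radius=6):
    """Return the (row, col) inside a square window around `handle` whose
    feature vector is closest (L1) to `ref_feature`.

    feat:        feature map of shape (C, H, W) after an optimization step
    handle:      current integer (row, col) estimate of the handle point
    ref_feature: feature vector of shape (C,) recorded at the original handle
    """
    h, w = feat.shape[1:]
    py, px = handle
    # Clamp the search window to the feature map bounds.
    y_lo, y_hi = max(py - radius, 0), min(py + radius, h - 1)
    x_lo, x_hi = max(px - radius, 0), min(px + radius, w - 1)
    window = feat[:, y_lo:y_hi + 1, x_lo:x_hi + 1]
    # L1 distance between every window location and the reference feature.
    dist = np.abs(window - ref_feature[:, None, None]).sum(axis=0)
    iy, ix = np.unravel_index(np.argmin(dist), dist.shape)
    return int(y_lo + iy), int(x_lo + ix)
```

Because the search reuses the same feature map that drives motion supervision, no separate tracking network is involved; restricting it to a local window keeps each update cheap and avoids jumping to a similar-looking feature elsewhere in the image.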