Edit One for All: Interactive Batch Image Editing

Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee

TL;DR

Interactive batch image editing transfers a user-specified edit from one example image to a set of unseen images through the StyleGAN2 latent space. The method learns a globally consistent latent direction $\Delta^{*}_w$ and a per-image editing strength $\alpha_i$, so that all edited outputs converge to the same final state regardless of where they start. This makes batch edits fast and consistent. Visual quality is on par with single-image editing baselines, while manual annotation and editing time are substantially reduced, and the method generalizes across domains such as faces, animals, and bodies. The approach frames the problem geometrically in latent space via a semantic hyperplane, and the authors point to diffusion models as a possible extension in future work.
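
To make the geometric picture concrete, here is a minimal sketch of how a shared direction and a per-image strength could bring different starting latents to the same final state. It is an illustration under stated assumptions, not the paper's implementation: it assumes the final state is a hyperplane whose normal is the shared direction and which passes through the edited example latent, and it picks each $\alpha_i$ in closed form so that the edited latent lands on that hyperplane. The function names (`edit_strength`, `apply_edit`) and the toy 3-dimensional latents are hypothetical; real StyleGAN2 latents are much higher-dimensional, and the paper optimizes the direction rather than fixing it by hand.

```python
# Hypothetical sketch, NOT the paper's code: latents are plain lists of floats.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def edit_strength(w, direction, w_target):
    """Strength alpha such that w + alpha * direction lies on the hyperplane
    that passes through w_target and has `direction` as its normal (assumed model)."""
    diff = [t - x for t, x in zip(w_target, w)]
    return dot(direction, diff) / dot(direction, direction)

def apply_edit(w, direction, alpha):
    """Move latent w along the shared direction by strength alpha."""
    return [x + alpha * d for x, d in zip(w, direction)]

# Assumed toy values: a shared edit direction and the edited example latent.
direction = [1.0, 0.0, 0.0]
w_example_edited = [2.0, 0.5, -1.0]

# Test latents that start in different states along the edit direction.
test_latents = [[-1.0, 0.3, 0.2], [0.5, -0.7, 1.1], [3.0, 0.0, 0.0]]

alphas = [edit_strength(w, direction, w_example_edited) for w in test_latents]
edited = [apply_edit(w, direction, a) for w, a in zip(test_latents, alphas)]

# After editing, every latent has the same coordinate along the direction,
# i.e. the same final state under this simplified model.
projections = [dot(direction, w) for w in edited]
```

In this toy setup the strengths differ per image (one of them is negative, because that latent starts beyond the target state), while the projection onto the direction is identical for all edited latents. Components orthogonal to the direction are left unchanged, which is the simplified stand-in here for each image keeping its own identity.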

Abstract

In recent years, image editing has advanced remarkably. With increased human control, it is now possible to edit an image in a plethora of ways; from specifying in text what we want to change, to straight up dragging the contents of the image in an interactive point-based manner. However, most of the focus has remained on editing single images at a time. Whether and how we can simultaneously edit large batches of images has remained understudied. With the goal of minimizing human supervision in the editing process, this paper presents a novel method for interactive batch image editing using StyleGAN as the medium. Given an edit specified by users in an example image (e.g., make the face frontal), our method can automatically transfer that edit to other test images, so that regardless of their initial state (pose), they all arrive at the same final state (e.g., all facing front). Extensive experiments demonstrate that edits performed using our method have similar visual quality to existing single-image-editing methods, while having more visual consistency and saving significant time and human effort.

Paper Structure

This paper contains 23 sections, 5 equations, 20 figures, 2 tables.

Figures (20)

  • Figure 1: Interactive Batch Image Editing. Given a single user-edited image, the goal is to automatically transfer that edit to new unseen images, so that all edited images end up with the same final state as the user-edited image.
  • Figure 2: Single Image Editing vs. Batch Image Editing. (a) Prior work (e.g., DragGAN, DragonDiffusion, UserControllableLT) focuses on single image editing. (b) We focus on batch image editing, where the user's edit on a single image is automatically transferred to new images, so that they all arrive at the same final state regardless of their initial starting state. In this way, we can speed up editing and reduce human effort.
  • Figure 3: Different editing strategies. (a) Setting. (b) Naive Approach: The editing direction effective for an example may not generalize well to test images. (c) Optimizing Editing Direction: We optimize for a globally consistent direction that is effective for both example and test images. (d) Adjusting Editing Strength: Ensuring consistent final states requires adjusting the editing strength for each test image.
  • Figure 4: Qualitative comparisons with dragging baselines. For our method, a green bounding box indicates automatic transfer from the example marked with a red bounding box in the first row (i.e., no point annotation needed!).
  • Figure 5: Qualitative comparisons with text-guided baselines. Our method transfers the edit from the example (red) to the other test images (green).
  • ...and 15 more figures