Repositioning the Subject within Image

Yikai Wang, Chenjie Cao, Ke Fan, Qiaole Dong, Yifan Li, Xiangyang Xue, Yanwei Fu

TL;DR

This research reveals that the fundamental sub-tasks of subject repositioning can be effectively reformulated as a unified, prompt-guided inpainting task; this reformulation forms the basis of the SEgment-gEnerate-and-bLEnd (SEELE) framework.

Abstract

Current image manipulation primarily centers on static manipulation, such as replacing specific regions within an image or altering its overall style. In this paper, we introduce a new dynamic manipulation task: subject repositioning. This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity. Our research reveals that the fundamental sub-tasks of subject repositioning, namely filling the void left by the repositioned subject, reconstructing obscured portions of the subject, and blending the subject to be consistent with surrounding areas, can be effectively reformulated as a unified, prompt-guided inpainting task. Consequently, we can employ a single diffusion generative model to address these sub-tasks using different task prompts learned through our proposed task inversion technique. Additionally, we integrate pre-processing and post-processing techniques to further enhance the quality of subject repositioning. These elements together form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Results of SEELE on ReS demonstrate its efficacy. Code and the ReS dataset are available at https://yikai-wang.github.io/seele/.
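The first sub-task, filling the void left by the moved subject, reduces to simple mask arithmetic before any generation happens. The sketch below illustrates this under stated assumptions: given a binary subject mask and an integer pixel shift, the void is the old mask minus the moved mask, and that region is what a prompt-guided inpainting model would be asked to fill. The helper names `shift_mask` and `reposition_masks`, the integer shift, and the drop-at-border behavior are hypothetical choices for illustration, not SEELE's released code.

```python
import numpy as np


def shift_mask(mask: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a 2-D boolean mask by (dy, dx) pixels.

    Hypothetical helper for illustration only (not SEELE's code).
    Pixels shifted past the image border are dropped.
    """
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the subject leaves the frame entirely
    # Overlapping window between the source and the shifted destination.
    src_y = slice(max(0, -dy), h - max(0, dy))
    dst_y = slice(max(0, dy), h - max(0, -dy))
    src_x = slice(max(0, -dx), w - max(0, dx))
    dst_x = slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out


def reposition_masks(mask: np.ndarray, dy: int, dx: int):
    """Return (moved_mask, void_mask) for moving a subject by (dy, dx).

    void_mask marks pixels the subject covered before the move but not after;
    under the inpainting reformulation, this is the region whose background
    must be synthesized.
    """
    mask = mask.astype(bool)
    moved = shift_mask(mask, dy, dx)
    void = mask & ~moved
    return moved, void


# Usage: a 2x2 subject moved one pixel to the right.
subject = np.zeros((6, 6), dtype=bool)
subject[1:3, 1:3] = True
moved, void = reposition_masks(subject, dy=0, dx=1)
print(moved.astype(int))
print(void.astype(int))
```

Pixels shared by the old and new positions are excluded from the void, because the moved subject still covers them. This sketch stops at the mask; it does not model occlusion or the generation step itself.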

Paper Structure (15 sections, 2 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: We compare subject repositioning using our SEELE with Google's Magic Editor. SEELE effectively addresses tasks such as subject removal, completion, and harmonization through a unified prompt-guided inpainting process, powered by a single diffusion model. Comprehensive results are shown in Figure 5.
  • Figure 2: SEELE for subject repositioning (SubRep) includes i) pre-processing: identifying the subject via user-provided conditions and preserving occlusion relationships between subjects; ii) manipulation: filling gaps left in the image and correcting obscured subjects with user-specified incomplete masks; iii) post-processing: addressing disparities between the repositioned subject and its new surroundings. SEELE addresses all generative sub-tasks in SubRep with a single diffusion model. In this example, only local harmonization is used in post-processing; shadow generation results are shown in the ablation figure.
  • Figure 3: (a) User inputs at each stage of SubRep. (b) Examples from the ReS dataset. We provide paired images with full-subject and visible-part mask annotations, as well as the moving direction. The moving direction is marked in blue; the user-specified masks of the visible part and of the completed subject are marked in orange.
  • Figure 4: (a) Comparison between task inversion and other techniques. Task inversion does not require text inputs, addresses different objectives, and serves different tasks, which distinguishes it from other approaches. The embeddings $v_*$ and $v_{*i}$ are learnable and are represented as $\bm{z}$ in the loss equation. (b) We generate masks representing particular tasks to train task inversion, so that a single diffusion model can address different tasks.
  • Figure 5: Subject repositioning on $1024^2$ images. SEELE works well in diverse scenarios, enables flexible repositioning, and produces high-fidelity repositioned images. A larger version appears in the high-resolution results figure.
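Figure 4 describes task inversion: the diffusion model stays frozen and only a task embedding, written $\bm{z}$, is learned for each task. The NumPy sketch below illustrates that idea under strong simplifying assumptions. A frozen linear map `eps_hat = W @ x_t + U @ z` stands in for the diffusion model's noise predictor, the objective is plain mean squared noise-prediction error rather than the paper's loss, and the two task names and their synthetic targets are hypothetical. Only `z` receives gradient updates; `W` and `U` never change.

```python
import numpy as np


def predict_noise(W, U, x_t, z):
    """Toy frozen noise predictor (stand-in, not the paper's model): W x_t + U z per row."""
    return x_t @ W.T + U @ z  # broadcasting adds U z to every row of the batch


def learn_task_embedding(W, U, x_t, eps, steps=200, lr=0.25):
    """Gradient descent on the task embedding z only; W and U stay frozen."""
    z = np.zeros(U.shape[1])
    for _ in range(steps):
        residual = predict_noise(W, U, x_t, z) - eps  # shape (N, d_x)
        # Gradient of mean_i ||residual_i||^2 with respect to z.
        grad = 2.0 * U.T @ residual.mean(axis=0)
        z -= lr * grad
    return z


def mse(W, U, x_t, eps, z):
    r = predict_noise(W, U, x_t, z) - eps
    return float(np.mean(np.sum(r ** 2, axis=1)))


rng = np.random.default_rng(0)
d_x, d_z, n = 8, 4, 256
W = rng.normal(size=(d_x, d_x))                   # frozen "denoiser" weights
U, _ = np.linalg.qr(rng.normal(size=(d_x, d_z)))  # frozen prompt projection, orthonormal columns
W_before = W.copy()

# Two hypothetical tasks, each with its own ground-truth prompt vector.
tasks = {"removal": rng.normal(size=d_z), "completion": rng.normal(size=d_z)}
learned = {}
for name, z_true in tasks.items():
    x_t = rng.normal(size=(n, d_x))
    eps = predict_noise(W, U, x_t, z_true) + 0.01 * rng.normal(size=(n, d_x))
    learned[name] = learn_task_embedding(W, U, x_t, eps)
    print(name, mse(W, U, x_t, eps, np.zeros(d_z)), "->", mse(W, U, x_t, eps, learned[name]))
```

Because the frozen weights are shared, adding a task in this toy costs only one new vector, which mirrors the caption's point that a single model can serve several tasks. The orthonormal `U` is chosen only so that plain gradient descent converges quickly.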